HLV_KG

This repository provides a tool to turn DWUG EN: Diachronic Word Usage Graphs for English (Schlechtweg et al. 2024) into a knowledge graph to visualize and query the variation of the annotations in the dataset.

Usage

Environment

The required packages are listed in environment.yml. Create a conda environment with the following command:

conda env create -f environment.yml

Structure

dwug_en

This folder contains the dataset DWUG EN: Diachronic Word Usage Graphs for English (Schlechtweg et al. 2024). Please click on the link to see the documentation of the dataset.

graphs

This folder contains the Turtle files of the created knowledge graphs.

dwug_en.ttl

The full knowledge graph of the dataset DWUG EN: Diachronic Word Usage Graphs for English (Schlechtweg et al. 2024).

test_dwug_en.ttl

A small sample of three words from the dataset DWUG EN: Diachronic Word Usage Graphs for English (Schlechtweg et al. 2024) in Turtle format for testing purposes.

query_results

This folder contains the results of the executed SPARQL SELECT queries.

annotator

This folder contains the annotator queries.

variation

This folder contains the variation queries.

resources

This folder contains the assigned positions and colors of the nodes as JSON files. Please note that this folder is not pushed to GitHub due to file size limitations.

full_graph_pos.json

This file contains the positions of the nodes for the full knowledge graph.

color_dict_{color_mode}.json

This file contains the colors of the nodes for the full knowledge graph. The colors depend on the color_mode. If the mode is distinct, the colors encode the number of distinct categories. If the mode is range, the colors encode the range of the distinct categories.

visualizations

This folder contains the created visualizations.

annotator

This folder contains the created visualizations per annotator.

full

This folder contains the created visualizations for the full graph. The positions of the nodes are re-scaled. Each ring of nodes represents one category.

first ring (= the closest to the center): category 0 = Undecidable = No annotation possible
second ring: category 1 = Unrelated = Homonymy
third ring: category 2 = Distantly Related = Polysemy
fourth ring: category 3 = Closely Related = Context Variance
fifth ring: category 4 = Identical = Identity
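The ring layout above can be expressed as a small lookup table. The following is a minimal sketch, not the code used in visualize_kg.py: the names CATEGORIES, ring_index and describe are hypothetical and only restate the mapping listed above.

```python
# Hypothetical lookup table; the names are illustrative and not taken from visualize_kg.py.
# Each category maps to (label, linguistic interpretation) as listed in the README.
CATEGORIES = {
    0: ("Undecidable", "No annotation possible"),
    1: ("Unrelated", "Homonymy"),
    2: ("Distantly Related", "Polysemy"),
    3: ("Closely Related", "Context Variance"),
    4: ("Identical", "Identity"),
}


def ring_index(category: int) -> int:
    """Return the 1-based ring for a category (ring 1 is the closest to the center)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return category + 1


def describe(category: int) -> str:
    """Build a human-readable description of the ring a category is drawn on."""
    label, interpretation = CATEGORIES[category]
    return f"ring {ring_index(category)}: category {category} = {label} = {interpretation}"


if __name__ == "__main__":
    for category in sorted(CATEGORIES):
        print(describe(category))
```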

instance

This folder contains the created visualizations for one word pair. It resembles the structure of the knowledge graph.

create_kg.py

This script creates the RDF graph from the CSV files in the dataset DWUG EN: Diachronic Word Usage Graphs for English (Schlechtweg et al. 2024).

The knowledge graph has the following nodes and relations:
dataset: The dataset node which collects meta information about the dataset and connects all words to each other.
word: The word nodes which collect information about the token occurrence. Each word is connected to its reference sentence and the annotation it occurs in.
sentence: The sentence nodes which collect information about the occurrence. Each sentence is connected to a word.
annotation: The annotation nodes which collect information about the annotation. Each annotation is connected to two annotated words and its annotator.
annotator: The annotator nodes which collect information about the annotators. Each annotator is connected to the annotations they have annotated.

The knowledge graph relies mainly on the classes and properties of the NIF 2.0 Core Ontology, which has been built for NLP tools, resources and annotations. The RDA namespace is used for properties and classes that are missing from NIF (e.g. annotators). The dataset node is defined as a Dataset object of schema.org.
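To make the node types and their connections concrete, here is a minimal sketch that models a tiny graph as plain (subject, predicate, object) triples. All identifiers and predicate names (hasWord, referenceSentence, annotates, annotatedBy) are assumptions made for illustration; the actual graph uses terms from NIF, RDA and schema.org as chosen in create_kg.py.

```python
# Hypothetical identifiers and predicate names, for illustration only.
# The real graph uses NIF, RDA and schema.org terms defined in create_kg.py.
triples = [
    ("dataset:dwug_en", "hasWord", "word:w1"),
    ("dataset:dwug_en", "hasWord", "word:w2"),
    ("word:w1", "referenceSentence", "sentence:s1"),
    ("word:w2", "referenceSentence", "sentence:s2"),
    ("annotation:a1", "annotates", "word:w1"),
    ("annotation:a1", "annotates", "word:w2"),
    ("annotation:a1", "annotatedBy", "annotator:x"),
]


def objects(subject: str, predicate: str) -> list[str]:
    """Return all objects linked to a subject by a predicate, in insertion order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> list[str]:
    """Return all subjects that point to an object via a predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]


if __name__ == "__main__":
    # The two words judged in annotation a1, and who judged them.
    print(objects("annotation:a1", "annotates"))
    print(objects("annotation:a1", "annotatedBy"))
    # All annotations made by annotator x.
    print(subjects("annotatedBy", "annotator:x"))
```

In this toy layout an annotation node links exactly two word nodes and one annotator node, which mirrors the description of the annotation node above.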

explore_data.py

This script contains functions that query the original dataset, e.g. extracting the words from the dataset that are annotated by all annotators.

main.py

This script creates the data stored in the folders graphs, query_results, and visualizations. It can be seen as an example pipeline for the provided scripts.

query_kg.py

This script contains the SPARQL queries to query the RDF graph. Available queries are:

category_stats: How often has a label been annotated?
annotations_per_annotator: Which annotations has an annotator done?
num_labels: How many distinct labels does an annotation have and how much do they differ from each other? This query is used to create the annotator and full graph visualizations.
filter_variation: This query is a more refined version of num_labels, because one can decide how high the range of the number of distinct labels should be.
get_pos_tags: Which POS-tags are used in the dataset?
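The idea behind the label statistics and the variation filter can be sketched in plain Python. This is not the SPARQL used in query_kg.py: the function names, the toy judgments, and the assumption that labels are grouped per judged pair and that variation is measured as the highest label minus the lowest label are all illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical toy judgments: (pair id, annotator id, label 0-4).
judgments = [
    ("pair_1", "annotator_a", 4),
    ("pair_1", "annotator_b", 4),
    ("pair_2", "annotator_a", 1),
    ("pair_2", "annotator_b", 3),
    ("pair_2", "annotator_c", 4),
]


def label_counts(rows):
    """Count how often each label was assigned (the idea behind category_stats)."""
    return Counter(label for _pair, _annotator, label in rows)


def label_spread(rows):
    """Map each pair to (number of distinct labels, highest label - lowest label)."""
    labels = defaultdict(set)
    for pair, _annotator, label in rows:
        labels[pair].add(label)
    return {pair: (len(vals), max(vals) - min(vals)) for pair, vals in labels.items()}


def pairs_with_min_range(rows, min_range):
    """Keep only pairs whose label range is at least min_range."""
    return {
        pair: stats
        for pair, stats in label_spread(rows).items()
        if stats[1] >= min_range
    }


if __name__ == "__main__":
    print(label_counts(judgments))
    print(label_spread(judgments))
    print(pairs_with_min_range(judgments, 2))
```

Separating the spread computation from the threshold filter keeps the second function a thin refinement of the first, in the same way filter_variation is described as a refinement of num_labels.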

visualize_kg.py

This script creates the visualizations of the RDF-graph on three different levels. Available visualizations are:

instance: A visualization of the annotations of a word pair via RDF Grapher
annotator: A visualization of the annotations of one annotator via NetworkX
full: A visualization of all annotations in the graph via NetworkX

References

Schlechtweg, D., Dubossarsky, H., Hengchen, S., McGillivray, B., and Tahmasebi, N. 2024. DWUG EN: Diachronic Word Usage Graphs for English (3.0.0).
