Skip to content

Repository for research project "Text tree edit distance: a language model-based metric for text hierarchy comparison".

Notifications You must be signed in to change notification settings

intsystems/text-tree-distance

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Text Tree Edit Distance: A Language Model-Based Metric for Text Hierarchies.

Text trees as a data structure occur in numerous machine learning tasks like hierarchical summarization and automatic mind map generation. One of the main methods of quality evaluation in these tasks is comparison with reference hierarchies created by experts. The method used so far to compare text hierarchies, as shown in this work, poorly accounts for their structure and text semantics relative to phrasing. To address this issue, we propose a new metric on the set of text trees — text tree edit distance (TTED), based on tree edit distance with semantic distance between texts measured using a large language model. To evaluate how the metric reflects different aspects of text tree difference, we introduce special quality coefficients that reflect the sensitivity of a metric to paraphrasing relative to structural and semantic differences of text trees. Using these coefficients, we conduct extensive testing of the proposed metric and its modifications compared to a baseline used in previous works to compare text hierarchies, which shows that TTED indeed captures significant differences between text trees more accurately than the previously used method. We also provide a practical implementation of TTED for further usage.

About

Repository for research project "Text tree edit distance: a language model-based metric for text hierarchy comparison".

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published