
nmrenyi/double-triangle-annotation


Double Triangle Annotation

Goal

The goal of this double triangle annotation is to provide precise annotation by combining the strengths of LLMs (or other automatic labelling models) and humans. To reduce human effort while preserving the quality of LLM labelling, we designed this double triangle annotation paradigm.

Structure

First Layer Triangle

There are two machine annotators and one human jury in this part. The two machine annotators, which can be LLMs or traditional OCR, provide labels for our data. We then automatically compare the results of the two annotators. The human jury checks every data point on which the two annotators disagree, and reviews only a small random sample of the data points on which they agree.
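The routing step of the first layer triangle can be sketched as below. This is a minimal illustration, not the repository's code: it assumes labels are plain strings compared by exact match (real OCR output may need normalization first), and the names `route_first_layer`, `Routing`, and `sample_rate` are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Hashable, Mapping


@dataclass
class Routing:
    auto_accepted: dict = field(default_factory=dict)  # item id -> agreed label, not reviewed
    to_human: list = field(default_factory=list)       # item ids the human jury H must check


def route_first_layer(m1: Mapping[Hashable, str], m2: Mapping[Hashable, str],
                      sample_rate: float = 0.05, seed: int = 0) -> Routing:
    """Compare two machine annotators (hypothetical helper) and decide what H sees.

    Iterates over the items labelled by M1; an item that M2 did not label
    counts as a disagreement.
    """
    rng = random.Random(seed)  # seeded so the spot-check sample is reproducible
    out = Routing()
    for item_id, label1 in m1.items():
        label2 = m2.get(item_id)
        if label1 != label2:
            out.to_human.append(item_id)         # disagreement: always reviewed
        elif rng.random() < sample_rate:
            out.to_human.append(item_id)         # agreement, but spot-checked
        else:
            out.auto_accepted[item_id] = label1  # agreement, accepted without review
    return out
```

For example, with `sample_rate=0.0`, the labels `{"a": "cat", "b": "dog"}` and `{"a": "cat", "b": "dig"}` send only `"b"` to the human jury and auto-accept `"a"`.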

The rationale behind this design is that when the two models are both strong and independent, they should label most data points correctly, and when they do make mistakes, the probability that they make the same mistake (i.e., give the same wrong label) on the same data point is fairly low. This reduces human labelling effort and cost, and it also reduces human labelling errors caused by fatigue: the human only needs to check the data points with different labels plus a small sample of those with identical labels, which in total should be a manageable amount for a human annotator.
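To make the rationale concrete, the sketch below estimates the human workload and the residual error of one system. It rests on assumptions not stated in the original: each item has exactly one correct label, the two annotators err independently with accuracies `acc1` and `acc2`, `collision` is the (hypothetical) probability that two wrong labels coincide, and the human jury catches every error in the items it reviews.

```python
def first_layer_estimate(acc1: float, acc2: float, sample_rate: float,
                         collision: float = 1.0) -> tuple:
    """Return (fraction of items the human reviews, fraction left with an undetected error).

    collision=1.0 is the pessimistic case for undetected errors: every pair
    of simultaneous mistakes is assumed to produce the same wrong label.
    """
    both_wrong = (1 - acc1) * (1 - acc2)  # independence: errors co-occur only by chance
    same_wrong = both_wrong * collision   # both wrong AND agreeing on the wrong label
    agree = acc1 * acc2 + same_wrong      # with one correct label, two correct answers always agree
    disagree = 1 - agree                  # every disagreement goes to the human jury
    human_fraction = disagree + sample_rate * agree
    undetected_error = same_wrong * (1 - sample_rate)  # agreed errors that escape the spot check
    return human_fraction, undetected_error
```

With two 90%-accurate annotators and a 10% spot-check rate, `first_layer_estimate(0.9, 0.9, 0.1)` gives roughly `(0.262, 0.009)`: the human reviews about a quarter of the items, and even in the pessimistic case under 1% of items end up with an undetected error.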

Since the two machine annotators (M1 and M2) provide separate, independent labels for the human jury (H) to check, the three form a triangle. For convenience in later discussion, we call this triangle, consisting of two machine annotators and one human jury, a system (S).

Second Layer Triangle

Labels produced by a system are still subject to the subjectivity of its human jury H. To control for the subjectivity of a single jury, we introduce a second triangular structure, consisting of two systems and one final reviewer (R).

First, two systems (S1, S2) independently provide high-quality labels (L1, L2). L1 and L2 are then submitted to the final reviewer (R), who examines the differences between L1 and L2, resolves them manually, and produces the final label, or "golden truth" (G).
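The second layer merge can be sketched as follows, assuming L1 and L2 are dictionaries from item id to label and that the final reviewer R is modelled as a callback invoked only on items where the systems differ. `build_golden_truth` and the `Resolver` signature are hypothetical names for illustration; in practice R is a human working through these differences.

```python
from typing import Callable, Dict, Hashable, Mapping, Optional

# Hypothetical reviewer interface: (item id, label from S1, label from S2) -> final label.
Resolver = Callable[[Hashable, Optional[str], Optional[str]], str]


def build_golden_truth(l1: Mapping[Hashable, str], l2: Mapping[Hashable, str],
                       resolve: Resolver) -> Dict[Hashable, str]:
    """Merge two systems' labels; only differing items are sent to the reviewer R."""
    golden: Dict[Hashable, str] = {}
    # Cover every item labelled by either system, keeping L1's order first.
    item_ids = list(l1) + [k for k in l2 if k not in l1]
    for item_id in item_ids:
        a, b = l1.get(item_id), l2.get(item_id)
        if a == b:
            golden[item_id] = a                     # systems agree: accept as-is
        else:
            golden[item_id] = resolve(item_id, a, b)  # R resolves the difference
    return golden
```

Sending only the differences to R keeps the reviewer's workload proportional to the disagreement between S1 and S2 rather than to the size of the dataset.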

In the second layer triangle, a human still serves as the final reviewer, so human subjectivity is not completely eliminated. However, with this second layer mechanism, we expect to obtain more precise labels than with the first layer triangle alone.

Caution

We applied a filter here to make human correction easier: we keep only items with at most 70 fields to be corrected, IAA_Field > 0.7, and IAA_Character > 0.6. However, this filtering may cause the evaluation set to contain more short lists than the originally sampled set.
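The filter can be written as a single predicate. The thresholds come from the text above (inclusive for the field count, strict for both IAA scores); the `Candidate` record is a hypothetical container, and IAA_Field and IAA_Character are assumed to be precomputed, since how they are calculated is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    item_id: str
    fields_to_correct: int  # number of fields the human would need to correct
    iaa_field: float        # IAA_Field, assumed precomputed
    iaa_character: float    # IAA_Character, assumed precomputed


def keep_for_correction(c: Candidate, max_fields: int = 70,
                        min_iaa_field: float = 0.7, min_iaa_char: float = 0.6) -> bool:
    """<= 70 fields to correct, IAA_Field > 0.7, IAA_Character > 0.6."""
    return (c.fields_to_correct <= max_fields
            and c.iaa_field > min_iaa_field
            and c.iaa_character > min_iaa_char)


def filter_candidates(cands: Iterable[Candidate]) -> List[Candidate]:
    return [c for c in cands if keep_for_correction(c)]
```

Because long lists tend to have more fields to correct, the `max_fields` cut is the part most likely to skew the evaluation set toward short lists.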

Implementation

First Layer Triangle

In the first layer triangle, we need to choose the machine annotators. Within each system, we want two strong and independent models, and the models should also differ between systems so that the systems themselves stay independent.

For the machine annotators, there are two options:

  1. Use a strong multimodal LLM (e.g., GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, Grok 4, DeepSeek OCR, Qwen3-Max). Check which languages the model supports.
  2. Use a traditional or commercial OCR engine (e.g., Google OCR, Microsoft OCR), followed by a strong text-only LLM that corrects the OCR output.
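Option 2 is a two-stage pipeline. The sketch below composes it from two placeholder callables so that the result has the same image-to-text shape as a single multimodal LLM (option 1), letting either kind serve as M1 or M2. `OcrEngine`, `TextCorrector`, and `make_ocr_llm_annotator` are hypothetical; no real OCR or LLM client API is assumed.

```python
from typing import Callable

OcrEngine = Callable[[bytes], str]    # hypothetical: image bytes -> raw OCR text
TextCorrector = Callable[[str], str]  # hypothetical: raw text -> LLM-corrected text
Annotator = Callable[[bytes], str]    # same shape as a multimodal LLM annotator


def make_ocr_llm_annotator(ocr: OcrEngine, correct: TextCorrector) -> Annotator:
    """Compose option 2: run OCR first, then let a text-only LLM fix OCR errors."""
    def annotate(image: bytes) -> str:
        raw_text = ocr(image)     # stage 1: character recognition
        return correct(raw_text)  # stage 2: text-only correction pass
    return annotate
```

Keeping both options behind the same callable signature means the routing and comparison logic of the first layer does not need to know which kind of annotator produced a label.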

In our real-world case, we use these two pairs:

  • Claude Sonnet 4.5 vs Qwen 3 VL 235B
  • Llama 4 Maverick vs Grok 4 0709
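The selection rule (two different models per system, and no model shared between systems) can be checked mechanically before annotation starts. This is a minimal sketch with a hypothetical `check_independence` helper; distinct model names are only a proxy for independence, since models trained on similar data can still make correlated mistakes.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (M1, M2) for one system


def check_independence(systems: List[Pair]) -> None:
    """Raise ValueError if a system repeats a model or two systems share one."""
    owner: Dict[str, int] = {}  # model name -> index of the system using it
    for idx, (m1, m2) in enumerate(systems):
        if m1 == m2:
            raise ValueError(f"system {idx} uses {m1!r} for both annotators")
        for model in (m1, m2):
            if model in owner:
                raise ValueError(f"{model!r} is shared by systems {owner[model]} and {idx}")
            owner[model] = idx


# The two pairs used in our real-world case pass the check.
SYSTEMS = [("Claude Sonnet 4.5", "Qwen 3 VL 235B"),
           ("Llama 4 Maverick", "Grok 4 0709")]
check_independence(SYSTEMS)
```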

Second Layer Triangle

Choose an experienced domain expert as the final reviewer. In our case, the final reviewer is the researcher himself.

More Information

For more detailed information, see the full report:

About

Precise and efficient data annotation paradigm with LLM and human collaboration
