Course project for Formale Semantik (University of Heidelberg). We investigate Named Entity Recognition & Classification (NERC) under increasing label granularity (from coarse-grained up to ultra-fine entity typing).
OntoNotes: The 90% Solution (Hovy et al., NAACL 2006)
Fine-grained entity recognition (FIGER) (Ling and Weld, AAAI 2012)
Ultra-Fine Entity Typing (Choi et al., ACL 2018)
Note:
Since the three datasets differ significantly in structure, we first analyze each dataset individually.
For Ultra-Fine, we split the dataset into:
- Ultra-Fine Crowdsourced (ds_fine_crowd)
- Ultra-Fine Distantly Supervised (ds_fine_ds)
Due to their substantial differences, we effectively work with four datasets in total.
| Dataset | Task | Granularity | Multi-Label |
|---|---|---|---|
| OntoNotes | Classical NER | Coarse | No |
| FIGER | Fine-Grained Entity Typing | Fine | Yes |
| Ultra-Fine | Ultra-Fine Entity Typing | Very Fine | Yes |
The datasets vary significantly in size.
Ultra-Fine and FIGER are substantially larger than OntoNotes, while ds_fine_crowd is much smaller than ds_fine_ds.
| Dataset | Unique Labels | Multi-Label |
|---|---|---|
| OntoNotes | 4 | No |
| FIGER | ~100 | Yes |
| Ultra-Fine | 10k+ | Yes |
ds_fine_crowd has the highest number of labels per mention, closely followed by FIGER.
OntoNotes is strictly single-label, while ds_fine_ds also has relatively few labels per entity.
Both FIGER and OntoNotes contain a portion of mentions without labels.
In FIGER, all entities are single tokens, whereas in the other datasets, a significant portion of entities consists of multiple words.
OntoNotes is a benchmark dataset for classical Named Entity Recognition (NER).
| Metric | Value |
|---|---|
| Entities | 35,089 |
| Unique Labels | 4 |
| Multi-word Entities | 12,917 (36.81%) |
| Avg Labels/Entity | 1.00 |
| Max Labels/Entity | 1 |
- PER
- LOC
- ORG
- MISC
OntoNotes contains only four labels and is strictly single-label, making it the dataset with the lowest granularity.
FIGER extends classical NER to fine-grained entity typing. It is also the largest dataset used in this project.
| Metric | Value |
|---|---|
| Entities | 4,047,079 |
| Unique Labels | 91 |
| Multi-word Entities | 0 (0.00%) |
| Avg Labels/Entity | 4.62 |
| Max Labels/Entity | 25 |
Instead of broad categories like PERSON, FIGER introduces hierarchical labels such as:
/person/actor, /person/politician, /location/city, /organization/company
Most entities have between 1 and 5 labels, although rare cases reach up to 25 labels.
The Ultra-Fine dataset pushes entity typing further by allowing very specific semantic descriptions.
ds_fine_ds:
| Metric | Value |
|---|---|
| Entities | 3,152,711 |
| Unique Labels | 4,261 |
| Multi-word Entities | 1,749,718 (55.50%) |
| Avg Labels/Entity | 2.18 |
| Max Labels/Entity | 11 |
ds_fine_crowd:
| Metric | Value |
|---|---|
| Entities | 5,994 |
| Unique Labels | 2,519 |
| Multi-word Entities | 3,000 (50.05%) |
| Avg Labels/Entity | 5.39 |
| Max Labels/Entity | 19 |
- ds_fine_ds is significantly larger than ds_fine_crowd
- ds_fine_ds contains more total labels but fewer labels per entity on average
Labels are often natural language descriptions rather than fixed ontology entries.
Examples:
- person
- musician
- politician
- father
- skyscraper
Interestingly, location is the top label in ds_fine_ds, while it is far less common in ds_fine_crowd.
T5 (Text-to-Text Transfer Transformer) is a transformer-based model developed by Google that converts all NLP tasks into a text-to-text format.
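As a concrete illustration of the text-to-text framing, an entity-typing input for T5 can be built by marking the mention span in the sentence. The task prefix and the `<m> … </m>` markers below are illustrative choices, not something T5 or this project prescribes:

```python
def build_t5_input(sentence: str, mention: str) -> str:
    """Mark the first occurrence of the entity span and prepend a task
    prefix, producing a T5-style source string (illustrative sketch)."""
    marked = sentence.replace(mention, f"<m> {mention} </m>", 1)
    return f"type entity: {marked}"

build_t5_input("Muddy Waters recorded in Chicago.", "Muddy Waters")
# -> "type entity: <m> Muddy Waters </m> recorded in Chicago."
```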
- Span dependency: the model must correctly identify entity spans before classification.
- Multi-label generation: entities often have multiple labels → requires generating label sets.
- Hierarchical labels: labels require structured understanding.
- Class imbalance: frequent labels dominate training.
- Extremely large label space: thousands of labels → difficult generalization.
- Open-vocabulary labels: labels are natural language.
- Long-tail distribution: many rare labels.
- Inefficient formulation: too few labels for NLI to be efficient.
- Ignored label dependencies: the label hierarchy is not modeled.
- Label ambiguity: semantic overlap between labels.
Convert labels into a consistent format, e.g. /person/actor → actor, film_actor → actor:
- Convert to lowercase
- Remove special characters (/, _)
- Extract one label per entity span
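A minimal sketch of this normalization, assuming the last path/underscore segment is kept as the label (the function name is illustrative):

```python
import re

def normalize_label(label: str) -> str:
    """Normalize a raw label to a single lowercase word: lowercase,
    drop the special characters '/' and '_', and keep only the final
    segment (/person/actor -> actor, film_actor -> actor)."""
    segments = [s for s in re.split(r"[/_]", label.lower()) if s]
    return segments[-1] if segments else label.lower()

normalize_label("/person/actor")  # -> "actor"
normalize_label("film_actor")     # -> "actor"
```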
Map labels to natural language:
- PERSON → person
- ORG → organization
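This mapping can be sketched as a lookup table; the PER/LOC expansions follow the OntoNotes label list above, while the MISC expansion is an assumption:

```python
# Mapping from OntoNotes tags to natural-language labels (sketch).
TAG_TO_WORD = {
    "PER": "person",
    "LOC": "location",
    "ORG": "organization",
    "MISC": "miscellaneous",  # assumed expansion
}

def verbalize(tag: str) -> str:
    """Return the natural-language form of a tag, falling back to
    lowercasing unknown tags."""
    return TAG_TO_WORD.get(tag, tag.lower())
```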
Example:
Muddy Waters → ['/person/musician', '/person/actor', '/person/artist']
- Split hierarchical labels: '/person/actor' → person, actor
- Optionally limit hierarchy depth
Example:
They → ['expert', 'scholar', 'scientist', 'person']
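The splitting step can be sketched as follows; the depth-limit behavior is an assumed interpretation of "limit hierarchy depth":

```python
def split_hierarchical(label, max_depth=None):
    """Expand a hierarchical FIGER-style label into its path components,
    optionally truncating to the first max_depth levels (sketch)."""
    parts = [p for p in label.split("/") if p]
    if max_depth is not None:
        parts = parts[:max_depth]
    return parts

split_hierarchical("/person/actor")     # -> ["person", "actor"]
split_hierarchical("/person/actor", 1)  # -> ["person"]
```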
- Frequency filtering: remove rare labels (<10 occurrences)
- Top-k selection: keep the most relevant labels per mention
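The two pruning steps can be sketched together; the <10 threshold comes from the list above, while `top_k = 5` and frequency-based ranking are assumed settings:

```python
from collections import Counter

def filter_labels(mention_labels, min_count=10, top_k=5):
    """Drop labels occurring fewer than min_count times in the corpus,
    then keep at most top_k labels per mention, ranked by corpus
    frequency (hedged sketch of the pruning steps)."""
    counts = Counter(l for labels in mention_labels for l in labels)
    filtered = []
    for labels in mention_labels:
        kept = [l for l in labels if counts[l] >= min_count]
        kept.sort(key=lambda l: -counts[l])
        filtered.append(kept[:top_k])
    return filtered
```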
Format output as:
entity → label1, label2, label3
Use:
- sorted labels
- a limit on the number of generated labels
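A sketch of the target formatting, assuming a cap of three labels as in the example format above:

```python
def format_target(entity, labels, max_labels=3):
    """Build the generation target 'entity → label1, label2, label3'
    from sorted, deduplicated labels, capped at max_labels (sketch;
    the cap of 3 mirrors the example format)."""
    kept = sorted(set(labels))[:max_labels]
    return f"{entity} → {', '.join(kept)}"

format_target("Muddy Waters", ["musician", "artist", "actor", "person"])
# -> "Muddy Waters → actor, artist, musician"
```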
Status: Work in progress (this repo will evolve as experiments and structure solidify!).