Mismatch in Predicted Labels and ignored_labels=['O'] Not Working #151

@valentinaosetrov

Description

For my token classification task, the labels predicted by the explainer differ from those returned by the pipeline. Some tokens that should receive an entity label are instead predicted as 'O' by the explainer. Even after setting ignored_labels=['O'], these mispredicted tokens still appear in the visualization and are still displayed as 'O'; only the tokens that are genuinely 'O' get excluded.
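Until ignored_labels behaves as expected, one possible workaround is to filter the explainer's output by predicted label before visualizing. This is only a sketch of that filtering step (it hides the mispredicted 'O' tokens, but does not fix the underlying label mismatch), and the triples below are hypothetical stand-ins; the exact structure of word_attributions varies by transformers-interpret version, so adapt the unpacking to what your version actually returns:

```python
# Hypothetical per-token output: (token, predicted_label, attribution_score).
# Replace with the actual structure your transformers-interpret version returns.
word_attributions = [
    ("colis", "OrderAndDelivery", 0.42),
    ("livré", "OrderAndDelivery", 0.31),
    ("je", "O", 0.02),
]

# Drop every token the explainer labelled 'O' before building a visualization.
filtered = [
    (tok, label, score)
    for tok, label, score in word_attributions
    if label != "O"
]
print(filtered)
# → [('colis', 'OrderAndDelivery', 0.42), ('livré', 'OrderAndDelivery', 0.31)]
```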

Pipeline prediction:

[{'end': 50,
  'entity_group': 'OrderAndDelivery',
  'score': 0.91371477,
  'start': 4,
  'word': 'colis a été marqué comme livré alors que je ne'},
 {'end': 62,
  'entity_group': 'OrderAndDelivery',
  'score': 0.6080048,
  'start': 56,
  'word': 'jamais'}]

Explainer:

[screenshot: explainer visualization, with the mispredicted tokens still shown as 'O']

Has anyone else experienced this issue or found a solution?
Below is the code:

from pprint import pprint

from transformers import (
    AutoConfig,
    AutoModelForTokenClassification,
    AutoTokenizer,
    pipeline,
)
from transformers_interpret import TokenClassificationExplainer

config = AutoConfig.from_pretrained('models/ner_model_camembert_v7')
max_length = 120
model = AutoModelForTokenClassification.from_pretrained('models/ner_model_camembert_v7', config=config)
# Note: truncation/padding/max_length are call-time arguments for the tokenizer;
# passing them to from_pretrained does not configure how texts are tokenized later.
tokenizer = AutoTokenizer.from_pretrained('models/ner_model_camembert_v7', config=config, truncation=True, return_offsets_mapping=True, padding="max_length", max_length=max_length)

model.eval()

ner_explainer = TokenClassificationExplainer(
    model,
    tokenizer
)

sample_text = "Mon colis a été marqué comme livré alors que je ne l ai jamais reçu"

word_attributions = ner_explainer(sample_text, ignored_labels=['O'])

pipe = pipeline("token-classification", model=model, aggregation_strategy="simple", tokenizer=tokenizer)
output_model = pipe(sample_text)
pprint(output_model)

ner_explainer.visualize()
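To narrow down where the two predictions disagree, it can help to diff the per-token label sequences directly. A minimal helper (pure Python; the token and label lists below are illustrative stand-ins, not actual output from my model):

```python
def diff_labels(tokens, explainer_labels, pipeline_labels):
    """Return (token, explainer_label, pipeline_label) for every disagreement."""
    return [
        (tok, a, b)
        for tok, a, b in zip(tokens, explainer_labels, pipeline_labels)
        if a != b
    ]

# Illustrative sequences: only "colis" is labelled differently.
tokens = ["Mon", "colis", "a", "été", "marqué"]
explainer_labels = ["O", "O", "OrderAndDelivery", "OrderAndDelivery", "OrderAndDelivery"]
pipeline_labels = ["O", "OrderAndDelivery", "OrderAndDelivery", "OrderAndDelivery", "OrderAndDelivery"]
print(diff_labels(tokens, explainer_labels, pipeline_labels))
# → [('colis', 'O', 'OrderAndDelivery')]
```

Running this on the real per-token predictions from both paths should show exactly which tokens the explainer flips to 'O'.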
