Huge Difference in prediction confidence score #5

@prasadautomationtesting

I noticed a large difference in confidence scores between the base and large models for the same input: roughly 0.99 from the base model versus 0.76 from the large model, for the same hallucinated span. I understand that model size can affect this, but why is the gap so large?

Base:

from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1"
)

contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million.",]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)

Output:
Predictions: [{'start': 31, 'end': 71, 'confidence': 0.9891987442970276, 'text': ' The population of France is 69 million.'}]

Large:

from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-large-modernbert-en-v1"
)

contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million.",]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)

Output:
Predictions: [{'start': 31, 'end': 71, 'confidence': 0.7649378180503845, 'text': ' The population of France is 69 million.'}]
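
To make the gap easier to see, the two outputs above can be lined up span by span. This is only a rough sketch: `compare_spans` is a hypothetical helper I wrote for illustration, not part of lettucedetect, and the span dictionaries are copied by hand from the outputs above so the snippet runs without loading either model.

```python
# Span predictions copied verbatim from the two outputs above.
base_spans = [
    {"start": 31, "end": 71, "confidence": 0.9891987442970276,
     "text": " The population of France is 69 million."},
]
large_spans = [
    {"start": 31, "end": 71, "confidence": 0.7649378180503845,
     "text": " The population of France is 69 million."},
]


def compare_spans(base, large):
    """Pair spans with identical (start, end) offsets and compute the confidence gap.

    Hypothetical helper for illustration; spans found by only one model are skipped.
    """
    large_by_pos = {(s["start"], s["end"]): s for s in large}
    rows = []
    for span in base:
        key = (span["start"], span["end"])
        match = large_by_pos.get(key)
        if match is None:
            continue  # no span with the same offsets in the other model's output
        rows.append({
            "start": span["start"],
            "end": span["end"],
            "text": span["text"],
            "base": span["confidence"],
            "large": match["confidence"],
            "gap": span["confidence"] - match["confidence"],
        })
    return rows


for row in compare_spans(base_spans, large_spans):
    print(f"[{row['start']}:{row['end']}] "
          f"base={row['base']:.3f} large={row['large']:.3f} gap={row['gap']:.3f}")
```

Both models flag the same span (characters 31 to 71); only the confidence differs, by about 0.22.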
