Implementation and Study of Black-Box Hallucination Detection Techniques

A lightweight comparative analysis of 3 modern Black-Box Hallucination Detection methods for language models, including SAC3, SelfCheckGPT, and Semantic Entropy.

🔍 Overview

As large language models become increasingly widespread, detecting hallucinations (outputs that are factually incorrect, nonsensical, or inconsistent with training data) has become a critical challenge. This project presents a focused study on black-box hallucination detection methods that do not require access to model activations or training data.

We explore and compare:

  • SAC3: Semantic-Aware Cross-Check Consistency
  • SelfCheckGPT: Self-consistency + factuality via prompt-based self-questioning
  • Semantic Entropy: Measuring uncertainty via vector dispersion

The papers referenced in these implementations are listed in the References section below.


📦 Methods Implemented

| Method | Type | Requires Internal Access? | Highlights |
| --- | --- | --- | --- |
| SAC3 | Sampling-based | No | Aggregates multiple generations for consistency |
| SelfCheckGPT | Prompt-based QA | No | Uses the model to ask & answer factuality questions |
| Semantic Entropy | Embedding-based | No | Uses dispersion in embedding space to assess confidence |

Each method was implemented with modular design, enabling future plug-and-play experimentation on different LLMs or prompts.


SelfCheckGPT

[Figure: SelfCheckGPT concept]
SelfCheckGPT asks the model to generate an answer, then follows up with a series of self-posed questions related to that answer. The model itself is used to evaluate the factual consistency of its response by checking whether the answers to those questions align with the original claim. If inconsistencies arise during self-questioning, a hallucination is inferred.
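
To make the workflow concrete, here is a minimal sketch of that self-questioning loop. The `generate` callable is a hypothetical stand-in for whatever LLM call `selfCheckGPT.py` actually makes, and the scoring rule is illustrative rather than the repo's exact implementation.

```python
from typing import Callable

def self_check(question: str, generate: Callable[[str], str], n_followups: int = 3) -> float:
    """Return an inconsistency score in [0, 1]; higher suggests hallucination."""
    # 1. Get the model's original answer.
    answer = generate(question)

    # 2. Ask the model to pose follow-up questions about its own answer.
    followups = [
        generate(f"Write one factual question whose answer is contained in: {answer}")
        for _ in range(n_followups)
    ]

    # 3. Answer each follow-up and let the model judge consistency
    #    with the original claim.
    inconsistent = 0
    for q in followups:
        followup_answer = generate(q)
        verdict = generate(
            f"Original answer: {answer}\n"
            f"Follow-up answer: {followup_answer}\n"
            "Are these two statements consistent? Reply yes or no."
        )
        if verdict.strip().lower().startswith("no"):
            inconsistent += 1

    # More self-inconsistency -> more likely the original answer is hallucinated.
    return inconsistent / n_followups
```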

SAC3

[Figure: SAC3 concept]

SAC3 detects hallucinations by prompting the language model multiple times with variations of the same query and analyzing the consistency of its responses. If the outputs vary significantly in factual content or contradict one another, the model is likely hallucinating. This method assumes that truthful answers will remain stable across sampled generations.
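
The following sketch illustrates that cross-check under the same assumptions as above: `generate` is a hypothetical LLM wrapper, and `SAC3.py` may structure its prompts and scoring differently.

```python
from itertools import combinations
from typing import Callable

def sac3_score(question: str, generate: Callable[[str], str], n_variants: int = 4) -> float:
    """Return a disagreement rate in [0, 1]; higher suggests hallucination."""
    # Answer the original question plus several semantically equivalent rephrasings.
    variants = [question] + [
        generate(f"Paraphrase this question without changing its meaning: {question}")
        for _ in range(n_variants - 1)
    ]
    answers = [generate(v) for v in variants]

    # Cross-check every pair of answers for factual agreement.
    disagreements, pairs = 0, 0
    for a, b in combinations(answers, 2):
        pairs += 1
        verdict = generate(
            f"Answer A: {a}\nAnswer B: {b}\n"
            "Do these answers state the same fact? Reply yes or no."
        )
        if verdict.strip().lower().startswith("no"):
            disagreements += 1

    # Answers that are unstable across paraphrases are taken as a hallucination signal.
    return disagreements / pairs if pairs else 0.0
```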

Semantic Entropy

[Figure: Semantic Entropy concept]

Semantic Entropy estimates the uncertainty of a model’s output by measuring the dispersion of embedding vectors from multiple generations. High entropy indicates that the model is unsure about its response, which often correlates with hallucinations. It’s a purely black-box method that doesn't require labeled data or fine-tuning.
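
A minimal sketch of that dispersion estimate is shown below, assuming a sentence-transformers embedder is available; the model name and the `generate` wrapper are assumptions, and `semanticEntropy.py` may compute its score differently.

```python
import numpy as np
from typing import Callable
from sentence_transformers import SentenceTransformer  # assumed dependency

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def semantic_dispersion(question: str, generate: Callable[[str], str], n_samples: int = 5) -> float:
    """Return the mean distance of sampled answers from their centroid embedding."""
    # Sample several answers to the same question.
    answers = [generate(question) for _ in range(n_samples)]

    # Embed each answer and measure spread around the mean embedding.
    embeddings = embedder.encode(answers)            # shape: (n_samples, dim)
    centroid = embeddings.mean(axis=0)
    distances = np.linalg.norm(embeddings - centroid, axis=1)

    # Larger spread = higher uncertainty, which the method reads as a hallucination signal.
    return float(distances.mean())
```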


📊 Results

We evaluated all methods on the following datasets:

  • 🔹 TruthfulQA
  • 🔹 HallucinationEval (subset)
  • 🔹 Synthetic prompts

Evaluation metrics (see the sketch after this list):

  • ✅ Precision@k
  • 📉 False Positive Rate
  • 🔁 Agreement Rate between detectors
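
For reference, illustrative implementations of these three metrics are sketched below; the variable names (`scores`, `preds`, `labels`) are assumptions, and this is not the evaluation code used in the repo.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring examples that are truly hallucinations."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(np.asarray(labels)[top_k]))

def false_positive_rate(preds, labels):
    """Fraction of non-hallucinated examples that a detector flags as hallucinations."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    negatives = labels == 0
    return float(preds[negatives].mean()) if negatives.any() else 0.0

def agreement_rate(preds_a, preds_b):
    """Fraction of examples on which two detectors give the same binary verdict."""
    return float(np.mean(np.asarray(preds_a) == np.asarray(preds_b)))
```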

🛠️ Usage

To run the SelfCheckGPT method:

python selfCheckGPT.py

Then enter a sample question to test.

To run the SAC3 method:

python SAC3.py

Then enter a sample question and the expected answer.

To run the Semantic Entropy method:

python semanticEntropy.py

Then enter a sample question and the expected answer.

Requirements

pip install -r requirements.txt

References

  • Farquhar, S., Kossen, J., Kuhn, L., et al. "Detecting Hallucinations in Large Language Models Using Semantic Entropy." Nature 630, 625–630 (2024). https://doi.org/10.1038/s41586-024-07421-0
  • Manakul, P., et al. "SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models." arXiv:2303.08896 (2023). https://arxiv.org/abs/2303.08896
  • Zhang, J., et al. "SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-Aware Cross-Check Consistency." arXiv:2311.01740 (2023). https://arxiv.org/abs/2311.01740
