Implementation and Study of Black-Box Hallucination Detection Techniques

A lightweight comparative analysis of 3 modern Black-Box Hallucination Detection methods for language models, including SAC3, SelfCheckGPT, and Semantic Entropy.

🔍 Overview

As large language models become increasingly widespread, detecting hallucinations (outputs that are factually incorrect, nonsensical, or inconsistent with training data) has become a critical challenge. This project presents a focused study on black-box hallucination detection methods that do not require access to model activations or training data.

We explore and compare:

  • SAC3: Semantic-Aware Cross-Check Consistency
  • SelfCheckGPT: Self-consistency + factuality via prompt-based self-questioning
  • Semantic Entropy: Measuring uncertainty via vector dispersion

The papers referenced in these implementations are listed in the References section below.


📦 Methods Implemented

| Method | Type | Requires Internal Access? | Highlights |
| --- | --- | --- | --- |
| SAC3 | Sampling-based | No | Aggregates multiple generations for consistency |
| SelfCheckGPT | Prompt-based QA | No | Uses the model to ask & answer factuality questions |
| Semantic Entropy | Embedding-based | No | Uses dispersion in embedding space to assess confidence |

Each method was implemented with modular design, enabling future plug-and-play experimentation on different LLMs or prompts.


SelfCheckGPT

[Figure: SelfCheckGPT concept]
SelfCheckGPT asks the model to generate an answer, then follows up with a series of self-posed questions related to that answer. The model itself is used to evaluate the factual consistency of its response by checking whether the answers to those questions align with the original claim. If inconsistencies arise during self-questioning, a hallucination is inferred.
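
To make the workflow concrete, here is a minimal sketch of that self-questioning loop. The `generate` callable is a hypothetical stand-in for whatever LLM call `selfCheckGPT.py` actually makes, and the scoring rule is illustrative rather than the repo's exact implementation.

```python
from typing import Callable

def self_check(question: str, generate: Callable[[str], str], n_followups: int = 3) -> float:
    """Return an inconsistency score in [0, 1]; higher suggests hallucination."""
    # 1. Get the model's original answer.
    answer = generate(question)

    # 2. Ask the model to pose follow-up questions about its own answer.
    followups = [
        generate(f"Write one factual question whose answer is contained in: {answer}")
        for _ in range(n_followups)
    ]

    # 3. Answer each follow-up and let the model judge consistency
    #    with the original claim.
    inconsistent = 0
    for q in followups:
        followup_answer = generate(q)
        verdict = generate(
            f"Original answer: {answer}\n"
            f"Follow-up answer: {followup_answer}\n"
            "Are these two statements consistent? Reply yes or no."
        )
        if verdict.strip().lower().startswith("no"):
            inconsistent += 1

    # More self-inconsistency -> more likely the original answer is hallucinated.
    return inconsistent / n_followups
```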

SAC3

[Figure: SAC3 concept]

SAC3 detects hallucinations by prompting the language model multiple times with variations of the same query and analyzing the consistency of its responses. If the outputs vary significantly in factual content or contradict one another, the model is likely hallucinating. This method assumes that truthful answers will remain stable across sampled generations.
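
The following sketch illustrates that cross-check under the same assumptions as above: `generate` is a hypothetical LLM wrapper, and `SAC3.py` may structure its prompts and scoring differently.

```python
from itertools import combinations
from typing import Callable

def sac3_score(question: str, generate: Callable[[str], str], n_variants: int = 4) -> float:
    """Return a disagreement rate in [0, 1]; higher suggests hallucination."""
    # Answer the original question plus several semantically equivalent rephrasings.
    variants = [question] + [
        generate(f"Paraphrase this question without changing its meaning: {question}")
        for _ in range(n_variants - 1)
    ]
    answers = [generate(v) for v in variants]

    # Cross-check every pair of answers for factual agreement.
    disagreements, pairs = 0, 0
    for a, b in combinations(answers, 2):
        pairs += 1
        verdict = generate(
            f"Answer A: {a}\nAnswer B: {b}\n"
            "Do these answers state the same fact? Reply yes or no."
        )
        if verdict.strip().lower().startswith("no"):
            disagreements += 1

    # Answers that are unstable across paraphrases are taken as a hallucination signal.
    return disagreements / pairs if pairs else 0.0
```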

Semantic Entropy

[Figure: Semantic Entropy concept]

Semantic Entropy estimates the uncertainty of a model’s output by measuring the dispersion of embedding vectors from multiple generations. High entropy indicates that the model is unsure about its response, which often correlates with hallucinations. It’s a purely black-box method that doesn't require labeled data or fine-tuning.
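
A minimal sketch of that dispersion estimate is shown below, assuming a sentence-transformers embedder is available; the model name and the `generate` wrapper are assumptions, and `semanticEntropy.py` may compute its score differently.

```python
import numpy as np
from typing import Callable
from sentence_transformers import SentenceTransformer  # assumed dependency

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def semantic_dispersion(question: str, generate: Callable[[str], str], n_samples: int = 5) -> float:
    """Return the mean distance of sampled answers from their centroid embedding."""
    # Sample several answers to the same question.
    answers = [generate(question) for _ in range(n_samples)]

    # Embed each answer and measure spread around the mean embedding.
    embeddings = embedder.encode(answers)            # shape: (n_samples, dim)
    centroid = embeddings.mean(axis=0)
    distances = np.linalg.norm(embeddings - centroid, axis=1)

    # Larger spread = higher uncertainty, which the method reads as a hallucination signal.
    return float(distances.mean())
```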


📊 Results

We evaluated all methods on the following datasets:

  • 🔹 TruthfulQA
  • 🔹 HallucinationEval (subset)
  • 🔹 Synthetic prompts

Evaluation metrics (see the sketch after this list):

  • ✅ Precision@k
  • 📉 False Positive Rate
  • 🔁 Agreement Rate between detectors
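
For reference, illustrative implementations of these three metrics are sketched below; the variable names (`scores`, `preds`, `labels`) are assumptions, and this is not the evaluation code used in the repo.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring examples that are truly hallucinations."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(np.asarray(labels)[top_k]))

def false_positive_rate(preds, labels):
    """Fraction of non-hallucinated examples that a detector flags as hallucinations."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    negatives = labels == 0
    return float(preds[negatives].mean()) if negatives.any() else 0.0

def agreement_rate(preds_a, preds_b):
    """Fraction of examples on which two detectors give the same binary verdict."""
    return float(np.mean(np.asarray(preds_a) == np.asarray(preds_b)))
```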

🛠️ Usage

To run the SelfCheckGPT method:

python selfCheckGPT.py

Then enter a sample question to test.

To run the SAC3 method:

python SAC3.py

Then enter a sample question and the expected answer.

To run the Semantic Entropy method:

python semanticEntropy.py

Then enter a sample question and the expected answer.

Requirements

pip install -r requirements.txt

References

  • Farquhar, S., Kossen, J., Kuhn, L., et al. "Detecting Hallucinations in Large Language Models Using Semantic Entropy." Nature 630, 625–630 (2024). https://doi.org/10.1038/s41586-024-07421-0
  • Manakul, P., et al. "SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models." arXiv:2303.08896 (2023). https://arxiv.org/abs/2303.08896
  • Zhang, J., et al. "SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-Aware Cross-Check Consistency." arXiv:2311.01740 (2023). https://arxiv.org/abs/2311.01740
