A lightweight comparative analysis of three modern black-box hallucination detection methods for language models: SAC3, SelfCheckGPT, and Semantic Entropy.
As large language models become increasingly widespread, detecting hallucinations (outputs that are factually incorrect, nonsensical, or inconsistent with training data) has become a critical challenge. This project presents a focused study on black-box hallucination detection methods that do not require access to model activations or training data.
We explore and compare:
- SAC3: Semantic-Aware Cross-check Consistency (consistency checks across rephrased versions of the same query)
- SelfCheckGPT: Self-consistency and factuality checking via prompt-based self-questioning
- Semantic Entropy: Uncertainty estimation via the dispersion of answer embeddings
The papers referenced in these implementations:
- SAC3: SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
- SelfCheckGPT: SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
- Semantic Entropy: Detecting hallucinations in large language models using semantic entropy
| Method | Type | Requires Internal Access? | Highlights |
|---|---|---|---|
| SAC3 | Sampling-based | ❌ | Aggregates multiple generations for consistency |
| SelfCheckGPT | Prompt-based QA | ❌ | Uses model to ask & answer factuality questions |
| Semantic Entropy | Embedding-based | ❌ | Uses dispersion in embedding space to assess confidence |
Each method was implemented with a modular design, enabling future plug-and-play experimentation with different LLMs or prompts.
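One way such a plug-and-play interface might look, purely as an illustration (the class and function names below are hypothetical and are not taken from the scripts in this repo):

```python
from typing import Optional, Protocol

class HallucinationDetector(Protocol):
    """Interface a detector module could expose (hypothetical, for illustration)."""

    def score(self, question: str, answer: Optional[str] = None) -> float:
        """Return a hallucination score in [0, 1]; higher means more likely hallucinated."""
        ...

def flag(detector: HallucinationDetector, question: str, threshold: float = 0.5) -> bool:
    # Any object implementing .score() can be swapped in without changing this caller.
    return detector.score(question) > threshold
```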

SelfCheckGPT asks the model to generate an answer and then follows up with a series of self-posed questions about that answer. The model itself evaluates the factual consistency of its response by checking whether the answers to those questions align with the original claim. If inconsistencies arise during self-questioning, hallucination is inferred.
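A minimal sketch of that self-questioning loop, assuming a generic `ask_llm(prompt) -> str` callable that stands in for whatever LLM API the script wraps; the prompts and the yes/no consistency check are illustrative rather than the exact logic in `selfCheckGPT.py`:

```python
from typing import Callable

def selfcheck_score(question: str, ask_llm: Callable[[str], str], n_questions: int = 3) -> float:
    """Return a 0-1 hallucination score: the fraction of self-posed
    verification questions whose answers contradict the original answer."""
    answer = ask_llm(question)

    # Ask the model to write short verification questions about its own answer.
    raw = ask_llm(
        f"Write {n_questions} short factual questions that would verify this answer:\n{answer}"
    )
    followups = [q.strip() for q in raw.splitlines() if q.strip()][:n_questions]

    inconsistent = 0
    for q in followups:
        followup_answer = ask_llm(q)
        verdict = ask_llm(
            "Original answer: " + answer + "\n"
            "Follow-up answer: " + followup_answer + "\n"
            "Are these two answers factually consistent? Reply yes or no."
        )
        if verdict.strip().lower().startswith("no"):
            inconsistent += 1

    # Higher score = more self-contradiction = more likely hallucination.
    return inconsistent / max(len(followups), 1)
```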
SAC3 detects hallucinations by prompting the language model multiple times with variations of the same query and analyzing the consistency of its responses. If the outputs vary significantly in factual content or contradict one another, the model is likely hallucinating. This method assumes that truthful answers will remain stable across sampled generations.
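A rough sketch of this cross-check under the same `ask_llm` assumption; a real implementation might judge agreement with an NLI model instead of another LLM call:

```python
from typing import Callable

def sac3_score(question: str, ask_llm: Callable[[str], str], n_variants: int = 4) -> float:
    """Return a 0-1 hallucination score based on disagreement across
    semantically equivalent rephrasings of the question (illustrative)."""
    # Rephrase the question several times without changing its meaning.
    variants = [question] + [
        ask_llm(f"Rephrase this question without changing its meaning: {question}")
        for _ in range(n_variants)
    ]
    answers = [ask_llm(v) for v in variants]

    # Judge pairwise agreement between the sampled answers.
    agree, total = 0, 0
    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            verdict = ask_llm(
                f"Do these two answers state the same fact?\nA: {answers[i]}\nB: {answers[j]}\n"
                "Reply yes or no."
            )
            agree += verdict.strip().lower().startswith("yes")
            total += 1

    # Low agreement across rephrasings suggests hallucination.
    return 1.0 - agree / total
```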
Semantic Entropy estimates the uncertainty of a model’s output by measuring the dispersion of embedding vectors from multiple generations. High entropy indicates that the model is unsure about its response, which often correlates with hallucinations. It’s a purely black-box method that doesn't require labeled data or fine-tuning.
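A minimal dispersion estimate, assuming an `embed(text) -> np.ndarray` callable such as a sentence-embedding encoder; the cosine-distance-to-centroid measure is one plausible reading of "vector dispersion", not necessarily what `semanticEntropy.py` does:

```python
import numpy as np
from typing import Callable

def dispersion_score(
    question: str,
    ask_llm: Callable[[str], str],
    embed: Callable[[str], np.ndarray],
    n_samples: int = 5,
) -> float:
    """Sample several answers, embed them, and measure how spread out they are."""
    answers = [ask_llm(question) for _ in range(n_samples)]
    vectors = np.stack([embed(a) for a in answers])

    # Dispersion = mean cosine distance of each answer embedding from the centroid.
    centroid = vectors.mean(axis=0)
    cosine = vectors @ centroid / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(centroid) + 1e-9
    )
    # Higher dispersion = less semantic agreement = more likely hallucination.
    return float(1.0 - cosine.mean())
```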
We evaluated all methods on the following datasets:
- 🔹 TruthfulQA
- 🔹 HallucinationEval (subset)
- 🔹 Synthetic prompts
Evaluation metrics (a minimal scoring sketch follows this list):
- ✅ Precision@k
- 📉 False Positive Rate
- 🔁 Agreement Rate between detectors
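A sketch of how these metrics could be computed from per-example detector scores and binary hallucination labels (the definitions are assumed from standard usage, not taken from the evaluation scripts):

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring examples that are truly hallucinations (label 1)."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(np.asarray(labels)[top_k]))

def false_positive_rate(preds, labels):
    """Share of non-hallucinated examples (label 0) flagged as hallucinations."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    fp = np.sum((preds == 1) & (labels == 0))
    tn = np.sum((preds == 0) & (labels == 0))
    return float(fp / (fp + tn)) if (fp + tn) > 0 else 0.0

def agreement_rate(preds_a, preds_b):
    """Fraction of examples on which two detectors return the same verdict."""
    return float(np.mean(np.asarray(preds_a) == np.asarray(preds_b)))
```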
How to use the SelfCheckGPT method?
Run `python selfCheckGPT.py`, then enter a sample question to test on.
How to use the SAC3 method?
Run `python SAC3.py`, then enter a sample question and a desired answer.
How to use the Semantic Entropy method?
Run `python semanticEntropy.py`, then enter a sample question and a desired answer.
Before running any of the scripts, install the dependencies with `pip install -r requirements.txt`.

References
- Farquhar, S., Kossen, J., Kuhn, L., et al. "Detecting Hallucinations in Large Language Models Using Semantic Entropy." Nature 630, 625–630 (2024). https://doi.org/10.1038/s41586-024-07421-0
- Manakul, P., et al. "SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models." arXiv:2303.08896 (2023). https://arxiv.org/abs/2303.08896
- Zhang, J., et al. "SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-Aware Cross-Check Consistency." arXiv:2311.01740 (2023). https://arxiv.org/abs/2311.01740