Is your feature request related to a problem? Please describe.
Yes. When building production RAG pipelines in Haystack, the pipeline typically ends with a `Generator` component (like `OpenAIGenerator`). While Haystack does a great job retrieving context, there is no native, deterministic post-processing component to audit the generated answer for hallucinations before returning it to the user.
Currently, if I want to guarantee that every sentence in the generated output is grounded in the retrieved `Document`s, I have to build a custom external loop to re-evaluate the output against the source context, which breaks the clean, linear flow of a Haystack `Pipeline`. Additionally, the standard `DocumentSplitter` component often severs context by splitting paragraphs away from their Markdown headers strictly based on character/word counts.
Describe the solution you'd like
I would love to see two new components added to the Haystack ecosystem:
- **`FactChecker` (or `VerificationEvaluator`) component:**
  A new pipeline component designed to sit immediately after a `Generator`. It takes two inputs: the generated `replies` (strings) and the `documents` (`List[Document]`). It uses a secondary LLM strictly as a judge to cross-reference the reply against the documents, effectively scoring it or stripping out unsupported claims. It would output a `VerifiedReply` and a `ConfidenceScore`.
- **`HeaderAwareDocumentSplitter` component:**
  An enhancement or alternative to `DocumentSplitter` that parses the Markdown/HTML AST. It refuses to split a chunk if doing so separates a heading (`#`, `##`) from its immediate paragraph, ensuring that retrieved chunks retain their structural context.
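To make the `FactChecker` idea concrete, here is a minimal sketch of the verification logic in plain Python. Everything here is hypothetical: `Document` is a stand-in for `haystack.Document`, and a trivial word-overlap score stands in for the LLM-as-judge call, purely so the filtering/scoring flow is visible.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    """Minimal stand-in for haystack.Document, for illustration only."""
    content: str


def naive_support_score(sentence: str, documents: list[Document]) -> float:
    """Placeholder for the LLM-as-judge call: fraction of the sentence's
    content words (longer than 3 chars) found in any retrieved document."""
    words = {w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3}
    if not words:
        return 1.0
    corpus = " ".join(d.content.lower() for d in documents)
    return sum(1 for w in words if w in corpus) / len(words)


def verify_reply(reply: str, documents: list[Document],
                 threshold: float = 0.5) -> tuple[str, float]:
    """Keep only sentences whose support score clears the threshold;
    return the filtered reply plus an overall confidence score."""
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    scores = [naive_support_score(s, documents) for s in sentences]
    kept = [s for s, sc in zip(sentences, scores) if sc >= threshold]
    confidence = sum(scores) / len(scores) if scores else 0.0
    return " ".join(kept), confidence
```

In the real component, `naive_support_score` would be replaced by a judge-LLM prompt, and the function would be wrapped in a `@component`-decorated class so it slots directly after the `Generator` in a `Pipeline`. For example, a grounded sentence survives while an unsupported one is stripped:

```python
docs = [Document("Paris is the capital of France.")]
reply = "Paris is the capital of France. The moon is made of cheese."
verified, confidence = verify_reply(reply, docs)
# verified == "Paris is the capital of France.", confidence == 0.5
```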
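Likewise, a rough sketch of the `HeaderAwareDocumentSplitter` behavior, using a simple blank-line block split instead of a full Markdown AST (function name and `max_words` parameter are my own invention): any heading is glued to the block that follows it before chunks are packed, so a heading can never end up as the last line of one chunk with its paragraph in the next.

```python
import re


def header_aware_split(markdown: str, max_words: int = 120) -> list[str]:
    """Split Markdown into chunks, but never detach a heading from the
    paragraph that immediately follows it."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", markdown) if b.strip()]

    # Glue each heading to the next block so they travel together.
    units: list[str] = []
    for block in blocks:
        if units and re.match(r"#{1,6}\s", units[-1].splitlines()[-1]):
            units[-1] = units[-1] + "\n\n" + block
        else:
            units.append(block)

    # Greedily pack units into chunks of up to max_words words; an
    # oversized heading+paragraph unit still stays in one chunk.
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for unit in units:
        n = len(unit.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(unit)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

With a tiny word budget, each section heading still lands in the same chunk as its paragraph:

```python
chunks = header_aware_split(
    "# Setup\n\nInstall the package.\n\n# Usage\n\nRun the command.",
    max_words=5,
)
# every chunk starts with its own heading
```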
Describe alternatives you've considered
I have considered using Haystack's `AnswerBuilder` with `reference_pattern`, but that simply matches regex citations; it doesn't deterministically verify whether the LLM hallucinated the claim in the first place.
I have also considered offline evaluation frameworks (like DeepEval or Ragas), but those are meant for testing datasets during development. I need a runtime component that actively blocks or warns users about unverified answers within the live production pipeline. Currently, I am forced to write a custom component to handle this.
Additional context
I recently built a custom extraction pipeline in raw Python solving this exact problem, utilizing a multi-model approach (a fast 8B model for routing, and a 70B model for the final verification stage).
I am highly motivated to bring this pattern to Haystack. I am willing to write the PR, unit tests, and documentation for either the `FactChecker` component or the `HeaderAwareDocumentSplitter` if the core maintainers believe this aligns with Haystack's vision for production-ready RAG.