Is your feature request related to a problem? Please describe.
Yes. When building production RAG pipelines in Haystack, the pipeline typically ends with a `Generator` component (like `OpenAIGenerator`). While Haystack does a great job retrieving context, there is no native, deterministic post-processing component to audit the generated answer for hallucinations before returning it to the user.
Currently, if I want to guarantee that every sentence in the generated output is grounded in the retrieved `Document`s, I have to build a custom external loop to re-evaluate the output against the source context, which breaks the clean, linear flow of a Haystack `Pipeline`. Additionally, the standard `DocumentSplitter` component often severs context by splitting paragraphs away from their Markdown headers strictly based on character/word counts.
Describe the solution you'd like
I would love to see two new components added to the Haystack ecosystem:
- **`FactChecker` (or `VerificationEvaluator`) component:**
  A new pipeline component designed to sit immediately after a `Generator`. It takes two inputs: the generated `replies` (strings) and the `documents` (`List[Document]`). It uses a secondary LLM strictly as a judge to cross-reference the reply against the documents, effectively scoring it or stripping out unsupported claims. It would output a `VerifiedReply` and a `ConfidenceScore`.
- **`HeaderAwareDocumentSplitter` component:**
  An enhancement or alternative to `DocumentSplitter` that parses the Markdown/HTML AST. It refuses to split a chunk if doing so separates a heading (`#`, `##`) from its immediate paragraph, ensuring that retrieved chunks retain their structural context.
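To make the `FactChecker` idea concrete, here is a minimal sketch of the verification logic in plain Python. Everything here is hypothetical: `Document` is a stand-in for `haystack.Document`, and a trivial word-overlap score stands in for the LLM-as-judge call, purely so the filtering/scoring flow is visible.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    """Minimal stand-in for haystack.Document, for illustration only."""
    content: str


def naive_support_score(sentence: str, documents: list[Document]) -> float:
    """Placeholder for the LLM-as-judge call: fraction of the sentence's
    content words (longer than 3 chars) found in any retrieved document."""
    words = {w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3}
    if not words:
        return 1.0
    corpus = " ".join(d.content.lower() for d in documents)
    return sum(1 for w in words if w in corpus) / len(words)


def verify_reply(reply: str, documents: list[Document],
                 threshold: float = 0.5) -> tuple[str, float]:
    """Keep only sentences whose support score clears the threshold;
    return the filtered reply plus an overall confidence score."""
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    scores = [naive_support_score(s, documents) for s in sentences]
    kept = [s for s, sc in zip(sentences, scores) if sc >= threshold]
    confidence = sum(scores) / len(scores) if scores else 0.0
    return " ".join(kept), confidence
```

In the real component, `naive_support_score` would be replaced by a judge-LLM prompt, and the function would be wrapped in a `@component`-decorated class so it slots directly after the `Generator` in a `Pipeline`. For example, a grounded sentence survives while an unsupported one is stripped:

```python
docs = [Document("Paris is the capital of France.")]
reply = "Paris is the capital of France. The moon is made of cheese."
verified, confidence = verify_reply(reply, docs)
# verified == "Paris is the capital of France.", confidence == 0.5
```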
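Likewise, a rough sketch of the `HeaderAwareDocumentSplitter` behavior, using a simple blank-line block split instead of a full Markdown AST (function name and `max_words` parameter are my own invention): any heading is glued to the block that follows it before chunks are packed, so a heading can never end up as the last line of one chunk with its paragraph in the next.

```python
import re


def header_aware_split(markdown: str, max_words: int = 120) -> list[str]:
    """Split Markdown into chunks, but never detach a heading from the
    paragraph that immediately follows it."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", markdown) if b.strip()]

    # Glue each heading to the next block so they travel together.
    units: list[str] = []
    for block in blocks:
        if units and re.match(r"#{1,6}\s", units[-1].splitlines()[-1]):
            units[-1] = units[-1] + "\n\n" + block
        else:
            units.append(block)

    # Greedily pack units into chunks of up to max_words words; an
    # oversized heading+paragraph unit still stays in one chunk.
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for unit in units:
        n = len(unit.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(unit)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

With a tiny word budget, each section heading still lands in the same chunk as its paragraph:

```python
chunks = header_aware_split(
    "# Setup\n\nInstall the package.\n\n# Usage\n\nRun the command.",
    max_words=5,
)
# every chunk starts with its own heading
```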
Describe alternatives you've considered
I have considered using Haystack's `AnswerBuilder` with `reference_pattern`, but that simply matches regex citations; it doesn't deterministically verify whether the LLM hallucinated the claim in the first place.
I have also considered offline evaluation frameworks (like DeepEval or Ragas), but those are meant for testing datasets during development. I need a runtime component that actively blocks or warns users about unverified answers within the live production pipeline. Currently, I am forced to write a custom component to handle this.
Additional context
I recently built a custom extraction pipeline in raw Python solving this exact problem, utilizing a multi-model approach (a fast 8B model for routing, and a 70B model for the final verification stage).
I am highly motivated to bring this pattern to Haystack. I am willing to write the PR, unit tests, and documentation for either the `FactChecker` component or the `HeaderAwareDocumentSplitter` if the core maintainers believe this aligns with Haystack's vision for production-ready RAG.