ReaRAG

Knowledge-guided Reasoning with Iterative Retrieval Augmented Generation

📋 Table of Contents

Overview
Features
Architecture
Installation
Usage
Extending ReaRAG
Differences from the Original Paper
License
Citation

Overview

ReaRAG is a factuality-enhanced reasoning model that combines strong reasoning capabilities with retrieval augmentation. It follows the Thought-Action-Observation paradigm:

Thought: The model generates reasoning steps.
Action: The model decides whether to search for more information or to finish and provide an answer.
Observation: If a search action is chosen, external knowledge is retrieved to guide further reasoning.

This implementation focuses on the algorithm and methodology proposed in the paper, allowing users to use any base LLM for reasoning.
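
The Thought-Action-Observation loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's code: the JSON step format, the run_loop function, and the scripted_model stand-in are hypothetical and exist only so the example runs without an API key. The returned keys mirror the answer and reasoning_trace fields shown in the Basic Example.

```python
import json
from typing import Callable, Dict, List


def run_loop(question: str,
             generate: Callable[[str], str],
             retrieve: Callable[[str], str],
             max_iterations: int = 5) -> Dict[str, object]:
    """Iterate Thought -> Action -> Observation until a finish action or the cap."""
    trace: List[dict] = []
    for _ in range(max_iterations):
        # Ask the model for the next step, given the question and the trace so far.
        prompt = json.dumps({"question": question, "trace": trace})
        step = json.loads(generate(prompt))  # assumed step format: thought + action
        action = step["action"]
        if action["type"] == "finish":
            trace.append(step)
            return {"answer": action["answer"], "reasoning_trace": trace}
        # Search action: retrieve external knowledge and store it as the observation.
        step["observation"] = retrieve(action["query"])
        trace.append(step)
    # Iteration cap reached without a finish action.
    return {"answer": None, "reasoning_trace": trace}


def scripted_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: search once, then finish."""
    state = json.loads(prompt)
    if not state["trace"]:
        return json.dumps({
            "thought": "I need to look this up.",
            "action": {"type": "search", "query": "capital of France"},
        })
    observation = state["trace"][-1]["observation"]
    return json.dumps({
        "thought": "The observation answers the question.",
        "action": {"type": "finish", "answer": observation},
    })


result = run_loop("What is the capital of France?", scripted_model, lambda query: "Paris")
print(result["answer"])
```

The max_iterations cap corresponds to the max_iterations setting passed to ReaRAG in the Basic Example; how the real engine behaves when the cap is reached is not specified here, so returning None is only a placeholder choice.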

Features

Modular Design - Flexible integration with various LLM providers (OpenAI, Anthropic, Hugging Face, etc.)
Multiple Retrieval Sources - Support for web search, vector databases, and simple in-memory retrieval
Configurable Parameters - Adjust maximum iterations and other settings to your needs
Detailed Reasoning Trace - Full visibility into the reasoning process for explainability
Extensible Architecture - Easy to add new LLM providers and retrievers

Architecture

The implementation consists of these main components:

Core ReaRAG Engine: Implements the iterative reasoning algorithm
LLM Provider Interface: Abstract interface for different LLM providers
Retriever Interface: Abstract interface for different retrieval sources
Prompt Templates: Templates for generating prompts for the LLM
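
To make the component boundaries concrete, here is a rough sketch of what the two interfaces and a prompt template could look like. The method names generate, get_model_name, and retrieve come from the Extending ReaRAG section; the type hints, the PROMPT_TEMPLATE text, and the build_prompt helper are assumptions for illustration and may differ from the actual package.

```python
from abc import ABC, abstractmethod
from typing import List


class LLMProvider(ABC):
    """Abstract interface every LLM backend is expected to implement."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for the given prompt."""

    @abstractmethod
    def get_model_name(self) -> str:
        """Return an identifier for the underlying model."""


class Retriever(ABC):
    """Abstract interface every retrieval source is expected to implement."""

    @abstractmethod
    def retrieve(self, query: str) -> str:
        """Return external knowledge relevant to the query."""


# Hypothetical template; the repository's real prompts may be worded differently.
PROMPT_TEMPLATE = (
    "Question: {question}\n"
    "Previous steps:\n{history}\n"
    "Reply with a Thought and an Action (search or finish)."
)


def build_prompt(question: str, history: List[str]) -> str:
    """Fill the template with the question and the steps taken so far."""
    return PROMPT_TEMPLATE.format(
        question=question,
        history="\n".join(history) or "(none)",
    )


prompt = build_prompt("Who wrote Hamlet?", [])
print(prompt)
```

Keeping the engine dependent only on these abstract classes is what lets providers and retrievers be swapped without touching the reasoning loop.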

Installation

# Clone the repository
git clone https://github.com/llmsresearch/rearag.git
cd rearag

# Install dependencies
pip install -r requirements.txt

Usage

Basic Example

from rearag.core.rearag import ReaRAG
from rearag.interfaces.llm_provider import OpenAIProvider
from rearag.interfaces.retriever import SimpleRetriever

# Initialize LLM provider and retriever
llm_provider = OpenAIProvider(model_name="gpt-4-turbo-preview", api_key="YOUR_API_KEY")
retriever = SimpleRetriever(knowledge_base={"who is the prime minister of the uk": "Rishi Sunak is the Prime Minister of the United Kingdom."})

# Initialize ReaRAG
rearag = ReaRAG(
    llm_provider=llm_provider,
    retriever=retriever,
    max_iterations=5,
    verbose=True
)

# Answer a question
result = rearag.answer_question("Who is the current Prime Minister of the UK?")
print(f"Answer: {result['answer']}")

# The reasoning trace is also available
print(result['reasoning_trace'])

Using the Example Script

The repository includes an example script (example.py) that demonstrates how to use ReaRAG with different LLM providers and retrievers:

# Use with default settings (OpenAI LLM and simple retriever)
python example.py --question "What is the capital of France?"

# Use with Anthropic Claude and web search retriever
python example.py --llm anthropic --retriever web --question "Who won the last World Cup?"

# Interactive mode
python example.py --verbose
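
For orientation, the flags used above could be parsed roughly as follows. This is a sketch, not the contents of example.py: the defaults are inferred from the "default settings" comment, and treating a missing --question as interactive mode is an assumption based on the last command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Run ReaRAG on a question.")
    parser.add_argument("--llm", default="openai",
                        help="LLM provider name, e.g. openai or anthropic")
    parser.add_argument("--retriever", default="simple",
                        help="retriever name, e.g. simple or web")
    parser.add_argument("--question", default=None,
                        help="question to answer; omit for interactive mode (assumed)")
    parser.add_argument("--verbose", action="store_true",
                        help="print the reasoning steps as they happen")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--llm", "anthropic", "--retriever", "web",
     "--question", "Who won the last World Cup?"]
)
mode = "single question" if args.question else "interactive"
print(args.llm, args.retriever, mode)
```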

Extending ReaRAG

Adding a New LLM Provider

Create a new class that inherits from LLMProvider
Implement the generate and get_model_name methods
Register the provider in the initialize_llm_provider function in example.py

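# Import path assumed: LLMProvider is expected to live next to OpenAIProvider
from rearag.interfaces.llm_provider import LLMProvider

class CustomLLMProvider(LLMProvider):
    def __init__(self, model_name, **kwargs):
        self.model_name = model_name
        # Initialize your custom LLM client here
        self.client = None

    def generate(self, prompt):
        # Call your client with the prompt and return the generated text
        raise NotImplementedError("Implement your generation logic")

    def get_model_name(self):
        return self.model_name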
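
As a concrete instance of the steps above, the sketch below implements a provider that replays pre-written replies, which can be handy for testing the reasoning loop offline. ScriptedLLMProvider is a hypothetical name, and the LLMProvider class defined here is only a stand-in so the example runs on its own; in the repository you would inherit from the real base class instead.

```python
from typing import Iterable, List


class LLMProvider:
    """Stand-in for the repository's base class, so this sketch is self-contained."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError

    def get_model_name(self) -> str:
        raise NotImplementedError


class ScriptedLLMProvider(LLMProvider):
    """Returns pre-written replies in order and records the prompts it receives."""

    def __init__(self, replies: Iterable[str], model_name: str = "scripted"):
        self.model_name = model_name
        self._replies: List[str] = list(replies)
        self.prompts: List[str] = []  # every prompt seen, for later inspection

    def generate(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            raise RuntimeError("no scripted replies left")
        return self._replies.pop(0)

    def get_model_name(self) -> str:
        return self.model_name


provider = ScriptedLLMProvider(["first reply", "second reply"])
print(provider.generate("hello"))
print(provider.get_model_name())
```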

Adding a New Retriever

Create a new class that inherits from Retriever
Implement the retrieve method
Register the retriever in the initialize_retriever function in example.py

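# Import path assumed: Retriever is expected to live next to SimpleRetriever
from rearag.interfaces.retriever import Retriever

class CustomRetriever(Retriever):
    def __init__(self, **kwargs):
        # Initialize your custom retriever (index, client, etc.) here
        self.options = kwargs

    def retrieve(self, query):
        # Look up the query in your knowledge source and return the results
        raise NotImplementedError("Implement your retrieval logic")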
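
For a runnable illustration, the sketch below ranks in-memory documents by how many words they share with the query. KeywordOverlapRetriever and its top_k parameter are hypothetical, the Retriever class is a stand-in so the example is self-contained, and returning a list of strings is an assumption; match whatever return type the real Retriever interface expects.

```python
import re
from typing import List, Set


class Retriever:
    """Stand-in for the repository's base class, so this sketch is self-contained."""

    def retrieve(self, query: str) -> List[str]:
        raise NotImplementedError


class KeywordOverlapRetriever(Retriever):
    """Ranks documents by the number of lowercase word tokens shared with the query."""

    def __init__(self, documents: List[str], top_k: int = 1):
        self.documents = documents
        self.top_k = top_k

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def retrieve(self, query: str) -> List[str]:
        query_tokens = self._tokens(query)
        scored = [(len(query_tokens & self._tokens(doc)), doc) for doc in self.documents]
        # Highest overlap first; documents with no shared tokens are dropped.
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in ranked[: self.top_k] if score > 0]


retriever = KeywordOverlapRetriever([
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
])
print(retriever.retrieve("What is the capital of France?"))
```

Word overlap is a deliberately crude scoring rule chosen to keep the example dependency-free; a vector database or web search backend would replace the body of retrieve.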

Differences from the Original Paper

This implementation covers the core methodology of the paper and omits the following:

The fine-tuning process on knowledge-guided reasoning chain data
The specific evaluation benchmarks used in the paper (MuSiQue, HotpotQA, IIRC, NQ)
The data construction and filtering procedures

The implementation follows the inference-time algorithm described in the paper, allowing users to plug in any base LLM.

License

MIT License

Citation

If you use this code for your research, please cite the original paper:

@misc{lee2025rearag,
      title={ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation},
      author={Zhicheng Lee and Shulin Cao and Jinxin Liu and Jiajie Zhang and Weichuan Liu and Xiaoyin Che and Lei Hou and Juanzi Li},
      year={2025},
      eprint={2503.21729},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
