@kalaspuffar kalaspuffar commented Aug 18, 2025

This PR follows the suggestion in danny-avila/LibreChat#9102.

It adds a /rerank endpoint that uses open source models to rerank documents. The endpoint takes a query to rank against and the documents to rank. An optional k sets how many results to return, and a configuration sets the model and keys needed to run the operation.

All available configuration options are listed at https://github.com/AnswerDotAI/rerankers, which this endpoint is a thin wrapper over.

Test call

curl -s http://localhost:8000/rerank \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_JWT_TOKEN' \
  -d '{
    "query": "I love you",
    "docs": ["I hate you", "I really like you"],
    "k": 5
  }'

Expected response:

[{"text":"I really like you","score":-1.537894606590271},{"text":"I hate you","score":-4.30911111831665}]
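
The same call can be sketched from Python using only the standard library. This is a minimal sketch, not part of the PR: the helper names `build_rerank_request` and `parse_rerank_response` are hypothetical, and the base URL and token are placeholders taken from the curl example above.

```python
import json
import urllib.request


def build_rerank_request(base_url, token, query, docs, k=4):
    """Build (but do not send) a POST request for the /rerank endpoint."""
    payload = json.dumps({"query": query, "docs": docs, "k": k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/rerank",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def parse_rerank_response(body):
    """Return (text, score) pairs from the JSON list the endpoint responds with."""
    return [(item["text"], item["score"]) for item in json.loads(body)]


# Sending is left to the caller, for example:
# with urllib.request.urlopen(build_rerank_request(...)) as resp:
#     ranked = parse_rerank_response(resp.read())
```

In the expected response above, results arrive sorted from most to least relevant, so the first pair returned by `parse_rerank_response` is the best match.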

I realized that sending the model with each call is not the right approach: the model needs to be loaded once to improve performance. You can now configure it in the environment for the rag_api repository.

SIMPLE_RERANKER_MODEL_NAME = "mixedbread-ai/mxbai-rerank-large-v1"
SIMPLE_RERANKER_MODEL_TYPE = "cross-encoder"
#SIMPLE_RERANKER_MODEL_NAME = "ms-marco-MiniLM-L-12-v2"
#SIMPLE_RERANKER_MODEL_NAME = "flashrank"
#SIMPLE_RERANKER_MODEL_TYPE = "colbert"
SIMPLE_RERANKER_LANG = ""
SIMPLE_RERANKER_API_PROVIDER = ""
SIMPLE_RERANKER_API_KEY = ""
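
One way these variables could be read is sketched below. It is an illustration under stated assumptions, not the PR's code: the `RerankerSettings` class and `load_reranker_settings` function are hypothetical names, empty strings are treated as "not set", and only the model name is assumed to be required.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RerankerSettings:
    model_name: str
    model_type: Optional[str] = None
    lang: Optional[str] = None
    api_provider: Optional[str] = None
    api_key: Optional[str] = None


def load_reranker_settings(env: Mapping[str, str] = os.environ) -> RerankerSettings:
    """Read the SIMPLE_RERANKER_* variables, mapping empty strings to None."""

    def optional(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    model_name = optional("SIMPLE_RERANKER_MODEL_NAME")
    if model_name is None:
        # Assumption: a reranker cannot be built without a model name.
        raise ValueError("SIMPLE_RERANKER_MODEL_NAME must be set")

    return RerankerSettings(
        model_name=model_name,
        model_type=optional("SIMPLE_RERANKER_MODEL_TYPE"),
        lang=optional("SIMPLE_RERANKER_LANG"),
        api_provider=optional("SIMPLE_RERANKER_API_PROVIDER"),
        api_key=optional("SIMPLE_RERANKER_API_KEY"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.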

@kalaspuffar
Author

The force push was due to Black linting.

All done! ✨ 🍰 ✨
1 file reformatted, 1 file left unchanged.

Copilot AI left a comment

Pull request overview

This PR adds a new /rerank endpoint to enable document reranking using open source models via the rerankers library. The implementation allows users to submit a query and a list of documents to be reranked based on relevance, with optional control over the number of top results returned.

Key Changes:

  • Added rerankers library dependencies with transformers and flashrank support
  • Implemented /rerank endpoint that accepts queries and documents for reranking
  • Configured Docker Compose with NVIDIA runtime and HuggingFace cache volume for model support

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| requirements.txt | Added rerankers library with transformers and flashrank extras for document reranking functionality |
| docker-compose.yaml | Added NVIDIA runtime support and HuggingFace cache volume mount to support GPU-accelerated model inference |
| app/routes/document_routes.py | Implemented reranker instance initialization and /rerank endpoint handler with document processing logic |
| app/models.py | Added QueryMultipleDocs Pydantic model to define request schema for the rerank endpoint |

Comment on lines +47 to +50
class QueryMultipleDocs(BaseModel):
    query: str
    docs: List[str]
    k: int = 4

Copilot AI Nov 28, 2025

The QueryMultipleDocs model only accepts List[str] for the docs field, but the endpoint implementation at lines 717-726 also handles dictionary objects with "text", "doc_id", and "metadata" fields. The model definition should use Union[str, dict] to accurately reflect the accepted input types:

from typing import List, Union

class QueryMultipleDocs(BaseModel):
    query: str
    docs: List[Union[str, dict]]
    k: int = 4
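
To illustrate what accepting both input shapes implies, here is a standard-library sketch of the normalization the endpoint performs. The function name `normalize_docs` is hypothetical, and plain dicts stand in for the library's document objects; the fallback rules (index as default `doc_id`, empty dict for missing metadata) mirror the handler code in this PR.

```python
from typing import Any, Dict, List, Union

DocInput = Union[str, Dict[str, Any]]


def normalize_docs(docs: List[DocInput]) -> List[Dict[str, Any]]:
    """Convert mixed str/dict inputs into uniform {text, doc_id, metadata} dicts."""
    normalized = []
    for index, doc in enumerate(docs):
        if isinstance(doc, str):
            # Bare strings get their list position as the document id.
            normalized.append({"text": doc, "doc_id": index, "metadata": {}})
        else:
            normalized.append(
                {
                    "text": doc.get("text", ""),
                    "doc_id": doc.get("doc_id", index),
                    "metadata": doc.get("metadata") or {},
                }
            )
    return normalized
```

Note that a dict without a "text" key silently becomes an empty document under these rules, which may or may not be the desired behavior.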

Copilot uses AI. Check for mistakes.
Comment on lines +712 to +743
@router.post("/rerank")
async def rerank_documents_by_query(request: Request, body: QueryMultipleDocs):
    try:
        docs = []
        for i, d in enumerate(body.docs):
            if isinstance(d, str):
                docs.append(ReRankDocument(text=d, doc_id=i))
            else:
                docs.append(
                    ReRankDocument(
                        text=d.get("text", ""),
                        doc_id=d.get("doc_id", i),
                        metadata=d.get("metadata", {}) or {},
                    )
                )

        top_k = body.k

        results = reRankerInstance.rank(query=body.query, docs=docs)
        items = results.top_k(top_k) if top_k else results

        return [
            {"text": getattr(r.document, "text", None), "score": r.score} for r in items
        ]
    except Exception as e:
        logger.error(
            "Error in reranking documents | Query: %s | Error: %s | Traceback: %s",
            body.query,
            str(e),
            traceback.format_exc(),
        )
        raise HTTPException(status_code=500, detail=str(e))

Copilot AI Nov 28, 2025

The new /rerank endpoint lacks test coverage. Consider adding tests similar to other endpoints in the codebase (e.g., test_query_embeddings_by_file_id) to verify:

  • Successful reranking with string documents
  • Reranking with dict documents containing metadata
  • Handling of the k parameter (top_k filtering)
  • Error handling for invalid inputs
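
As a rough idea of what such tests could look like, the sketch below exercises ordering and top-k behavior against a stand-in scorer, so it runs without loading a model. Everything here is hypothetical: `overlap_score` and `rerank` are illustrative helpers, not functions from the PR, and real tests would go through the project's own test client and fixtures.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def overlap_score(query: str, doc: str) -> float:
    """Stand-in scorer: number of lowercase words shared by query and doc."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def rerank(
    query: str, docs: List[str], k: int, scorer: Scorer = overlap_score
) -> List[Tuple[str, float]]:
    """Score every doc, sort by descending score, keep the top k (all if k is falsy)."""
    scored = sorted(
        ((doc, scorer(query, doc)) for doc in docs),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:k] if k else scored


def test_orders_by_score():
    ranked = rerank("red apple", ["green pear", "red apple pie", "red car"], k=3)
    assert [text for text, _ in ranked] == ["red apple pie", "red car", "green pear"]


def test_k_limits_results():
    assert len(rerank("red apple", ["a", "b", "c"], k=2)) == 2


def test_k_larger_than_docs_returns_all():
    assert len(rerank("red apple", ["a", "b"], k=5)) == 2
```

Swapping the scorer for a mock of the real reranker would let the same assertions cover the endpoint's `k` handling.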

Comment on lines +45 to +49
reRankerInstance = Reranker(
    model_name=os.getenv("SIMPLE_RERANKER_MODEL_NAME"),
    model_type=os.getenv("SIMPLE_RERANKER_MODEL_TYPE"),
    lang=os.getenv("SIMPLE_RERANKER_LANG"),
)

Copilot AI Nov 28, 2025

The new reranker environment variables (SIMPLE_RERANKER_MODEL_NAME, SIMPLE_RERANKER_MODEL_TYPE, SIMPLE_RERANKER_LANG) are not documented in the README.md. These should be added to the "Environment Variables" section to help users configure the reranker feature properly.

        raise HTTPException(status_code=500, detail=str(e))

@router.post("/rerank")
async def rerank_documents_by_query(request: Request, body: QueryMultipleDocs):

Copilot AI Nov 28, 2025

The /rerank endpoint is missing a docstring to describe its purpose, parameters, and return value. Other endpoints in the codebase (e.g., /text at line 746) include docstrings. Consider adding one:

@router.post("/rerank")
async def rerank_documents_by_query(request: Request, body: QueryMultipleDocs):
    """
    Rerank documents based on relevance to a query using a reranking model.
    
    Args:
        request: The FastAPI request object
        body: Contains query string, list of documents, and optional k value
        
    Returns:
        List of ranked documents with their scores
    """
Suggested change
- async def rerank_documents_by_query(request: Request, body: QueryMultipleDocs):
+ async def rerank_documents_by_query(request: Request, body: QueryMultipleDocs):
+     """
+     Rerank documents based on relevance to a query using a reranking model.
+
+     Args:
+         request: The FastAPI request object.
+         body: QueryMultipleDocs object containing the query string, list of documents, and optional k value.
+
+     Returns:
+         List of dictionaries, each containing the document text and its relevance score.
+     """

      - DB_PORT=5432
    ports:
      - "8000:8000"
    runtime: nvidia

Copilot AI Nov 28, 2025

The runtime: nvidia configuration is added unconditionally, but this will cause failures on systems without NVIDIA GPUs or the NVIDIA Container Runtime. Consider:

  1. Making this conditional based on an environment variable
  2. Documenting that GPU support is required for reranking
  3. Or using a docker-compose override file for GPU-specific configurations

Example:

runtime: ${DOCKER_RUNTIME:-runc}  # defaults to runc, override with nvidia when needed
Suggested change
- runtime: nvidia
+ runtime: ${DOCKER_RUNTIME:-runc}


@router.post("/rerank")
async def rerank_documents_by_query(request: Request, body: QueryMultipleDocs):
    try:

Copilot AI Nov 28, 2025

The endpoint doesn't validate that the docs list is non-empty before processing. If an empty list is provided, the reranker may behave unexpectedly. Consider adding validation:

if not body.docs:
    raise HTTPException(status_code=400, detail="docs list cannot be empty")
Suggested change
-     try:
+     if not body.docs:
+         raise HTTPException(status_code=400, detail="docs list cannot be empty")
+     try:
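
Placement matters for this check: the handler's `except Exception` block also catches FastAPI's `HTTPException`, so a 400 raised inside the `try` is re-raised as a 500, while one raised before the `try` reaches the client unchanged. The sketch below demonstrates the difference with a stand-in exception class; `HTTPError`, the two handler functions, and `status_of` are hypothetical names used only for illustration.

```python
class HTTPError(Exception):
    """Stand-in for an HTTP exception type; it only carries a status code."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def check_inside_try(docs):
    try:
        if not docs:
            raise HTTPError(400, "docs list cannot be empty")
        return 200
    except Exception as e:
        # The broad handler also catches HTTPError, masking the 400.
        raise HTTPError(500, str(e))


def check_before_try(docs):
    if not docs:
        raise HTTPError(400, "docs list cannot be empty")
    try:
        return 200
    except Exception as e:
        raise HTTPError(500, str(e))


def status_of(handler, docs):
    """Return the status code a client would observe for this handler."""
    try:
        return handler(docs)
    except HTTPError as e:
        return e.status_code
```

An alternative with the same effect is to keep the check inside the `try` and add an `except` clause that re-raises the HTTP exception type unchanged ahead of the broad handler.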

from app.utils.health import is_health_ok

router = APIRouter()
reRankerInstance = Reranker(

Copilot AI Nov 28, 2025

The variable name reRankerInstance uses unconventional camelCase naming. Python convention (PEP 8) recommends using lowercase with underscores for variable names. Consider renaming to reranker_instance or reranker for better consistency with the codebase (e.g., vector_store, logger).

Suggested change
- reRankerInstance = Reranker(
+ reranker_instance = Reranker(

Comment on lines +45 to 51
reRankerInstance = Reranker(
    model_name=os.getenv("SIMPLE_RERANKER_MODEL_NAME"),
    model_type=os.getenv("SIMPLE_RERANKER_MODEL_TYPE"),
    lang=os.getenv("SIMPLE_RERANKER_LANG"),
)

Copilot AI Nov 28, 2025

The Reranker instance is initialized at module import time from environment variables that may be unset, in which case os.getenv returns None. This could cause initialization failures or runtime errors when the module loads. Consider:

  1. Adding validation to ensure required environment variables are set
  2. Providing sensible defaults for optional parameters
  3. Or deferring initialization until first use with lazy loading

For example:

model_name = os.getenv("SIMPLE_RERANKER_MODEL_NAME")
if not model_name:
    raise ValueError("SIMPLE_RERANKER_MODEL_NAME environment variable must be set")
Suggested change
- reRankerInstance = Reranker(
-     model_name=os.getenv("SIMPLE_RERANKER_MODEL_NAME"),
-     model_type=os.getenv("SIMPLE_RERANKER_MODEL_TYPE"),
-     lang=os.getenv("SIMPLE_RERANKER_LANG"),
- )
+ @lru_cache(maxsize=1)
+ def get_reranker_instance():
+     model_name = os.getenv("SIMPLE_RERANKER_MODEL_NAME")
+     model_type = os.getenv("SIMPLE_RERANKER_MODEL_TYPE")
+     lang = os.getenv("SIMPLE_RERANKER_LANG")
+     if not model_name:
+         raise RuntimeError("SIMPLE_RERANKER_MODEL_NAME environment variable must be set")
+     if not model_type:
+         raise RuntimeError("SIMPLE_RERANKER_MODEL_TYPE environment variable must be set")
+     if not lang:
+         raise RuntimeError("SIMPLE_RERANKER_LANG environment variable must be set")
+     return Reranker(
+         model_name=model_name,
+         model_type=model_type,
+         lang=lang,
+     )
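
A runnable variant of the lazy-loading idea is sketched below. It is a sketch under assumptions rather than a drop-in replacement: `FakeReranker` is a hypothetical stand-in so the example runs without the rerankers library, and because the PR's sample environment leaves SIMPLE_RERANKER_LANG empty, this version treats the model type and language as optional and requires only the model name.

```python
import os
from functools import lru_cache


class FakeReranker:
    """Hypothetical stand-in for the real reranker class; counts constructions."""

    instances_created = 0

    def __init__(self, model_name, model_type=None, lang=None):
        FakeReranker.instances_created += 1
        self.model_name = model_name
        self.model_type = model_type
        self.lang = lang


@lru_cache(maxsize=1)
def get_reranker():
    """Build the reranker on first use and reuse the same instance afterwards."""
    model_name = os.getenv("SIMPLE_RERANKER_MODEL_NAME")
    if not model_name:
        raise RuntimeError("SIMPLE_RERANKER_MODEL_NAME environment variable must be set")
    return FakeReranker(
        model_name=model_name,
        # Assumption: empty strings mean "not configured", so pass None instead.
        model_type=os.getenv("SIMPLE_RERANKER_MODEL_TYPE") or None,
        lang=os.getenv("SIMPLE_RERANKER_LANG") or None,
    )
```

Because `lru_cache` does not cache exceptions, a call that fails on a missing variable can succeed later once the environment is fixed, and `get_reranker.cache_clear()` forces a rebuild in tests.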
