feat: add LlamaIndex integration with hybrid retriever support (#217) #274

Open
ArshadJafri wants to merge 1 commit into yichuan-w:main from ArshadJafri:feature/llamaindex-integration

@ArshadJafri
Contributor

Description

This PR implements the LlamaIndex integration for LEANN requested in #217 and addresses the integration blockers discovered previously.

What was built

We introduced two BaseRetriever subclasses:

  1. LeannRetriever: A seamless integration for pure semantic/vector search.
  2. LeannHybridRetriever: A hybrid retriever that uses LEANN's built-in BM25 capabilities directly.

Key Fixes Included

  • Parameter Safety: Maps the user-facing keyword bm25_weight (e.g., 0.3) to LEANN's underlying vector-weight parameter gemma.
  • Tuple Normalization Fix (Support LEANN in llamaindex #217): Intercepts SearchResult objects and normalizes their metadata into plain dictionaries before converting them to LlamaIndex NodeWithScore objects. This prevents crashes when users wrap the retriever in LlamaIndex's QueryFusionRetriever.
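
To illustrate the metadata normalization described above, here is a minimal standard-library sketch. It is not the code in this PR: ExampleResult, normalize_metadata, and to_node_payload are hypothetical names, and the assumed input shapes (None, a mapping, or an iterable of key/value pairs) are a guess at what a result's metadata may look like. The sketch stops at a plain dict rather than building a real NodeWithScore.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass
class ExampleResult:
    """Hypothetical stand-in for a LEANN search result (not the real class)."""

    id: str
    text: str
    score: float
    metadata: Any = None


def normalize_metadata(raw: Any) -> dict:
    """Return a plain dict with string keys, whatever shape `raw` has."""
    if raw is None:
        return {}
    if isinstance(raw, Mapping):
        return {str(key): value for key, value in raw.items()}
    try:
        # Assumed shape: an iterable of (key, value) pairs.
        return {str(key): value for key, value in raw}
    except (TypeError, ValueError):
        # Unknown shape: keep the value under one key instead of dropping it.
        return {"raw_metadata": raw}


def to_node_payload(result: ExampleResult) -> dict:
    """Build the fields a downstream node object would need, as a plain dict."""
    return {
        "id": result.id,
        "text": result.text,
        "score": float(result.score),
        "metadata": normalize_metadata(result.metadata),
    }
```

Normalizing before constructing the node means every downstream consumer sees the same dictionary shape, regardless of how the metadata was stored in the index.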

Testing

  • ✅ Tested the arithmetic of the bm25_weight -> gemma translation.
  • ✅ Added a runnable local example, whose output can be checked by eye, at examples/llamaindex_hybrid_example.py.
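
The weight translation tested above can be sketched as follows. This is an illustration under two assumptions: gemma is the vector share of the hybrid score, and the two weights sum to 1, so bm25_weight=0.3 corresponds to gemma=0.7. The function name bm25_weight_to_gemma is hypothetical and the range check is our addition, not necessarily what the PR does.

```python
import math


def bm25_weight_to_gemma(bm25_weight: float) -> float:
    """Translate a keyword (BM25) weight into the vector weight `gemma`.

    Assumes the two weights are complementary: gemma = 1 - bm25_weight.
    """
    # Reject values outside [0, 1]; NaN also fails this comparison.
    if not 0.0 <= bm25_weight <= 1.0:
        raise ValueError(f"bm25_weight must be in [0, 1], got {bm25_weight!r}")
    return 1.0 - bm25_weight


# Usage: a 30% keyword weight leaves 70% for the vector score.
assert math.isclose(bm25_weight_to_gemma(0.3), 0.7)
```

Keeping the translation in one small function makes the convention easy to test in isolation and keeps the user-facing keyword separate from LEANN's internal parameter name.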

@ASuresh0524
Collaborator

LGTM @ArshadJafri

Thanks for this integration — the implementation looks good.

Verified:

  • bm25_weight → gemma mapping is correct (bm25_weight=0.3 → 70% vector, 30% keyword)
  • Metadata normalization in _results_to_nodes fixes the QueryFusionRetriever crash from Support LEANN in llamaindex #217
  • API usage: LeannSearcher.search() and gemma are used correctly
  • Dependencies are already covered via llama-index-core in leann-core

Minor suggestions (before merging):

  • Example: consider using a temp directory in examples/llamaindex_hybrid_example.py and cleaning up the index file afterward.
  • Docs: A short note in the docs about this integration would help discoverability.
