feat: Add Text Embeddings Inference (TEI) provider support#60

Merged
jayscambler merged 5 commits into main from
jay/cfos-46-add-text-embeddings-inference-tei-support-to-embeddings
Jul 2, 2025

Conversation

@jayscambler
Contributor

Summary

Adds support for Hugging Face's Text Embeddings Inference (TEI) server as an embedding provider in ContextFrame, providing high-performance, self-hosted embeddings for 100+ open-source models.

Changes

  • ✨ New TEIProvider class implementing the EmbeddingProvider interface
  • 🔧 Updated factory function to support provider_type="tei"
  • 📚 Added comprehensive documentation in embedding providers guide
  • 🧪 Unit tests with mocks for TEI functionality
  • 💡 Complete example demonstrating TEI usage patterns
  • 📦 Added httpx as optional dependency for lightweight HTTP client

Features

  • Local and Remote Support: Works with TEI servers running locally or remotely
  • Authentication: Bearer token support for secured instances
  • Health Checks: Built-in server health monitoring
  • Error Handling: Automatic retries with exponential backoff
  • Configuration: Flexible timeout, truncation, and normalization options
  • Minimal Dependencies: Only requires httpx (25KB) for HTTP communication
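
The retry behavior listed above can be sketched roughly as follows. This is an illustrative implementation, not the provider's actual code; the `max_retries` and `base_delay` parameters (and the generic `request` callable) are assumptions for the sketch:

```python
import time


def backoff_delays(max_retries: int, base_delay: float) -> list:
    """Delay before each retry: base_delay doubled on every attempt."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(request, max_retries: int = 3, base_delay: float = 0.5):
    """Call request() (e.g. a closure around an HTTP POST to the TEI
    server), retrying on failure with exponential backoff and
    re-raising after the final attempt."""
    delays = backoff_delays(max_retries, base_delay)
    for attempt in range(max_retries + 1):
        try:
            return request()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(delays[attempt])
```

In the real provider the `request` closure would presumably wrap an `httpx` call against the configured `api_base`.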

Example Usage

from contextframe.embed import create_embedder

# Local TEI server
embedder = create_embedder(
    model="BAAI/bge-large-en-v1.5",
    provider_type="tei",
    api_base="http://localhost:8080"
)

# Embed documents
results = embedder.embed_batch(["Document 1", "Document 2"])

Benefits

  • 🚀 Performance: Flash Attention, ONNX optimization, dynamic batching
  • 🔐 Privacy: Self-hosted solution for sensitive data
  • 🎯 Flexibility: Supports any Sentence Transformer or BERT-based model
  • 📊 Production Ready: Built-in metrics, monitoring, health checks
  • 💻 Hardware Support: GPU acceleration (CUDA) and CPU optimizations

Testing

The implementation includes comprehensive unit tests using mocks. Note: the current test suite has NumPy compatibility issues unrelated to this PR; they will be addressed separately.
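
A mock-based unit test along these lines avoids needing a live TEI server. The request/response shape below follows TEI's `/embed` API (a JSON body with `inputs`, returning one vector per input), but `embed_batch` here is a minimal stand-in, not the provider's actual method, and the exact tests in this PR may differ:

```python
from unittest.mock import MagicMock


def embed_batch(client, texts):
    """Minimal stand-in for the provider's batch call: POST the texts
    to TEI's /embed endpoint and return one vector per input."""
    response = client.post("/embed", json={"inputs": texts, "truncate": True})
    response.raise_for_status()
    return response.json()


def test_embed_batch_posts_inputs_and_returns_vectors():
    # Mock the HTTP client so no server is required.
    client = MagicMock()
    client.post.return_value.json.return_value = [[0.1, 0.2], [0.3, 0.4]]

    vectors = embed_batch(client, ["Document 1", "Document 2"])

    # The mock records the call, so we can assert on the request shape.
    client.post.assert_called_once_with(
        "/embed", json={"inputs": ["Document 1", "Document 2"], "truncate": True}
    )
    assert vectors == [[0.1, 0.2], [0.3, 0.4]]
```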

Related

Docker Setup

# GPU deployment
docker run --gpus all -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-embeddings-inference:1.7 \
  --model-id BAAI/bge-large-en-v1.5

# CPU deployment
docker run -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 \
  --model-id BAAI/bge-large-en-v1.5
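
The setup guide added in this PR also covers Docker Compose. A minimal compose file equivalent to the GPU command above might look like this (the service name and volume path are illustrative):

```yaml
services:
  tei:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    command: --model-id BAAI/bge-large-en-v1.5
    ports:
      - "8080:80"
    volumes:
      - ./data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```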

- Fix pylance dependency (was incorrectly 'lance' in v0.1.1)
- Fix 38 failing integration tests with API corrections
- Add 'member_of' to valid relationship types for collections
- Fix custom metadata string validation issues
- Implement Lance v0.30.0 vector search bug workaround
- Fix UUID property access and len() usage on datasets
- Improve error messages with field context and helpful hints

feat: Add new features for better developer experience
- Add full-text search index creation (create_fts_index method)
- Add UUID override support at creation time
- Add auto-indexing option for full-text search
- Enhance create_scalar_index with index type support
- Reorganize tests into unit/ and integration/ structure

docs: Update documentation and changelog
- Add comprehensive CHANGELOG entry for v0.1.2
- Add migration guide (docs/migration/api-changes-v012.md)
- Add API improvements roadmap (docs/roadmap/api-improvements-v02.md)
- Update API reference documentation

BREAKING CHANGE: Replaced LlamaIndex text splitter with semantic-text-splitter
- Add TEIProvider class for high-performance embedding inference
- Support both local and remote TEI server instances
- Add httpx as optional dependency for lightweight HTTP client
- Update factory function to support provider_type='tei'
- Add comprehensive documentation and examples
- Include unit tests with mocks for TEI functionality
- Support for authentication, retries, and health checks

TEI provides optimized inference for 100+ open-source models with:
- Flash Attention and dynamic batching
- GPU/CPU hardware acceleration
- Production-ready monitoring and metrics
- Self-hosted deployment for data privacy

Implements CFOS-46
@linear

linear bot commented Jul 2, 2025

- Add comprehensive TEI setup guide covering hardware requirements, installation methods, and troubleshooting
- Include Docker, Docker Compose, and Kubernetes deployment examples
- Add security considerations and performance tuning tips
- Document NumPy 2.x compatibility issues with PyArrow
- Link from main embedding providers doc to setup guide
- Upgrade PyArrow from 14.0.2 to >=17.0.0 for better NumPy compatibility
- Pin NumPy to 1.x series (numpy>=1.24,<2) to avoid NumPy 2.x issues
- Resolves 'numpy.core.multiarray failed to import' errors
- Fixes test environment and development workflows
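
Assuming the project declares its dependencies in pyproject.toml, the pins described above would look roughly like the following (section layout is illustrative, not copied from the repo):

```toml
[project]
dependencies = [
    "pyarrow>=17.0.0",  # upgraded from 14.0.2 for NumPy compatibility
    "numpy>=1.24,<2",   # stay on NumPy 1.x to avoid NumPy 2.x import errors
]

[project.optional-dependencies]
tei = ["httpx"]         # lightweight HTTP client for the TEI provider
```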
…embeddings-inference-tei-support-to-embeddings
@jayscambler jayscambler merged commit e3fc1bf into main Jul 2, 2025