Pluggable Providers Research
**Feature:** 103-pluggable-providers
**Date:** 2026-02-01
**Status:** Complete
Question: How are embeddings and summarization currently implemented?
Findings:
The current implementation in `agent-brain-server/agent_brain_server/indexing/embedding.py` uses:
- **Embeddings:** Hard-coded to OpenAI's `AsyncOpenAI` client
  - Initialized in `EmbeddingGenerator.__init__()` (lines 43-45)
  - Uses `settings.OPENAI_API_KEY` and `settings.EMBEDDING_MODEL`
  - Model dimensions hard-coded in a dictionary (lines 170-175)
- **Summarization:** Hard-coded to Anthropic's `AsyncAnthropic` client
  - Initialized in the same class (lines 48-50)
  - Uses `settings.ANTHROPIC_API_KEY` and `settings.CLAUDE_MODEL`
  - Fixed prompt template for code summarization
- **Singleton pattern:** Global `_embedding_generator` with `get_embedding_generator()`
  - Makes provider swapping difficult at runtime
  - No abstract interface for mocking in tests
Decision: Refactor to a Protocol-based design with a Factory pattern (see the sketch after this list). Rationale: Enables runtime provider selection, improves testability, and follows the existing DI patterns in services. Alternatives Rejected:
- ABC base class: More verbose, requires inheritance
- Direct if/else switching: Would require changes throughout the codebase
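A minimal sketch of the intended shape. The names (`EmbeddingProvider`, `create_embedding_provider`, the concrete classes) are hypothetical and not yet in the codebase:

```python
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Structural interface: any class with these methods qualifies, no inheritance needed."""

    async def embed(self, texts: list[str]) -> list[list[float]]: ...

    def get_dimensions(self) -> int: ...


def create_embedding_provider(config: dict) -> EmbeddingProvider:
    """Factory: select an implementation from config at runtime."""
    provider = config["provider"]
    if provider == "openai":
        # Hypothetical concrete class wrapping AsyncOpenAI
        return OpenAIEmbeddingProvider(model=config["model"])
    if provider == "ollama":
        # Hypothetical concrete class reusing the OpenAI client with a local base_url
        return OllamaEmbeddingProvider(model=config["model"])
    raise ValueError(f"Unknown embedding provider: {provider}")
```

Because `Protocol` is structurally typed, tests can substitute any object with matching methods without touching an inheritance hierarchy.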
Question: What LlamaIndex abstractions exist for our target providers?
Findings:
LlamaIndex provides embedding integrations via separate packages:
| Provider | Package | Async Support | Notes |
|---|---|---|---|
| OpenAI | llama-index-embeddings-openai | ✅ Yes | Already installed |
| Ollama | llama-index-embeddings-ollama | ✅ Yes | OpenAI-compatible API |
| Cohere | llama-index-embeddings-cohere | ✅ Yes | Requires cohere SDK |
| Bedrock | llama-index-embeddings-bedrock | ✅ Yes | Future: Phase 7 |
LlamaIndex LLM integrations for summarization:
| Provider | Package | Async Support | Notes |
|---|---|---|---|
| Anthropic | llama-index-llms-anthropic | ✅ Yes | Available but not currently used |
| OpenAI | llama-index-llms-openai | ✅ Yes | Already installed |
| Gemini | llama-index-llms-gemini | Requires wrapper | |
| Ollama | llama-index-llms-ollama | ✅ Yes | OpenAI-compatible |
Decision: Use native SDK clients instead of LlamaIndex LLM wrappers. Rationale:
- Direct control over prompts and parameters
- Simpler error handling
- Fewer dependencies
- Current implementation already uses native clients

Alternatives Rejected: LlamaIndex LLM wrappers (less control, extra abstraction layer)
Question: How does each provider handle authentication?
Findings:
| Provider | Auth Method | Environment Variable | SDK Support |
|---|---|---|---|
| OpenAI | API Key | OPENAI_API_KEY | Native |
| Anthropic | API Key | ANTHROPIC_API_KEY | Native |
| Cohere | API Key | COHERE_API_KEY | Native |
| Gemini | API Key | GOOGLE_API_KEY | Native |
| Grok | API Key | GROK_API_KEY | OpenAI-compatible |
| Ollama | None | N/A | Local HTTP |
Decision: Use an `api_key_env` field in config to reference environment variable names (see the sketch after this list).
Rationale:
- Keeps secrets out of config files
- Follows the existing pattern in settings.py
- Supports different keys for different providers

Alternatives Rejected:
- Direct API keys in config (security risk)
- Single environment variable (inflexible)
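A minimal sketch of the lookup; the helper name `resolve_api_key` is hypothetical:

```python
import os


def resolve_api_key(provider_cfg: dict) -> str | None:
    """Look up the secret named by api_key_env; the key itself never lives in config."""
    env_name = provider_cfg.get("api_key_env")
    if env_name is None:
        return None  # e.g. Ollama runs locally and needs no key
    key = os.getenv(env_name)
    if not key:
        raise RuntimeError(f"Required environment variable {env_name} is not set")
    return key
```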
Question: How to handle different embedding dimensions across providers?
Findings:
| Provider | Model | Dimensions |
|---|---|---|
| OpenAI | text-embedding-3-large | 3072 |
| OpenAI | text-embedding-3-small | 1536 |
| OpenAI | text-embedding-ada-002 | 1536 |
| Ollama | nomic-embed-text | 768 |
| Ollama | mxbai-embed-large | 1024 |
| Cohere | embed-english-v3 | 1024 |
| Cohere | embed-multilingual-v3 | 1024 |
Critical Issue: ChromaDB collections are created with fixed dimensions. Switching providers with different dimensions requires re-indexing.
Decision:
- Store embedding provider/model in index metadata
- Validate on startup: if provider/model changed, require explicit re-index (see the sketch below)
- Provide a `get_dimensions()` method on the EmbeddingProvider protocol

Rationale: Prevents silent index corruption. Alternatives Rejected: Auto-reindex (too slow, could be destructive)
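A minimal sketch of the startup check; the metadata keys and function name are hypothetical:

```python
class EmbeddingConfigMismatch(RuntimeError):
    """Raised when the configured provider no longer matches the stored index."""


def validate_index_compatibility(index_meta: dict, provider_name: str,
                                 model: str, dimensions: int) -> None:
    """Fail fast instead of silently writing vectors of the wrong size."""
    stored = (index_meta.get("provider"), index_meta.get("model"),
              index_meta.get("dimensions"))
    current = (provider_name, model, dimensions)
    if stored != current:
        raise EmbeddingConfigMismatch(
            f"Index was built with {stored} but config now specifies {current}; "
            "run an explicit re-index before starting."
        )
```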
Question: Best practices for Ollama integration?
Findings:
Ollama provides an OpenAI-compatible API:
- Base URL: http://localhost:11434/v1
- Embedding endpoint: /embeddings
- Chat endpoint: /chat/completions
Two integration approaches:

1. Native Ollama client (`ollama` package), which requires `AsyncClient` for async calls:

   ```python
   import ollama

   client = ollama.AsyncClient()
   response = await client.embeddings(model="nomic-embed-text", prompt="text")
   ```

2. OpenAI client with a custom base_url:

   ```python
   from openai import AsyncOpenAI

   client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
   response = await client.embeddings.create(model="nomic-embed-text", input="text")
   ```
Decision: Use OpenAI client with custom base_url. Rationale:
- Reuse existing OpenAI provider code
- Consistent async patterns
- Fewer dependencies

Alternatives Rejected: Native ollama package (different API, new dependency)
Question: How to integrate Grok (xAI)?
Findings:
Grok uses an OpenAI-compatible API:
- Base URL: https://api.x.ai/v1
- API Key: Required via x-api-key header
- Models: `grok-4`, `grok-4-fast`
Decision: Reuse OpenAI provider with custom base_url and header. Rationale: Minimal code, proven pattern. Alternatives Rejected: Separate implementation (unnecessary duplication)
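A minimal sketch of that reuse, with the model name and environment variable taken from the tables above; the function name is hypothetical:

```python
import os

from openai import AsyncOpenAI


async def summarize_with_grok(prompt: str) -> str:
    # Same AsyncOpenAI client as the OpenAI provider, pointed at xAI's endpoint.
    client = AsyncOpenAI(
        base_url="https://api.x.ai/v1",
        api_key=os.getenv("GROK_API_KEY"),
    )
    response = await client.chat.completions.create(
        model="grok-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```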
Question: How to integrate Google Gemini?
Findings:
Gemini SDK (google-generativeai):

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-3-flash")
response = await model.generate_content_async("prompt")
```

Key considerations:
- Async support via `generate_content_async()`
- No embedding API (Gemini doesn't provide embeddings)
- Only usable for summarization, not embeddings
Decision: Gemini for summarization only; use Vertex AI (Phase 8) for embeddings. Rationale: Gemini API doesn't expose embeddings. Alternatives Rejected: None (technical limitation)
Question: What format should config.yaml use?
Findings:
Proposed structure:
```yaml
# config.yaml
embedding:
  provider: openai  # openai | ollama | cohere
  model: text-embedding-3-large
  api_key_env: OPENAI_API_KEY
  params:
    batch_size: 100

summarization:
  provider: anthropic  # anthropic | openai | gemini | grok | ollama
  model: claude-haiku-4-5-20251001
  api_key_env: ANTHROPIC_API_KEY
  params:
    max_tokens: 300
    temperature: 0.1
```

Config loading precedence:
1. Environment variables (highest)
2. config.yaml in project root
3. Default values (lowest)
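A minimal sketch of that precedence. The loader name, the defaults, and the `SECTION_FIELD` override convention are all hypothetical:

```python
import os
from pathlib import Path

import yaml

DEFAULTS = {
    "embedding": {"provider": "openai", "model": "text-embedding-3-large"},
    "summarization": {"provider": "anthropic", "model": "claude-haiku-4-5-20251001"},
}


def load_provider_config(path: str = "config.yaml") -> dict:
    """Merge in precedence order: defaults < config.yaml < environment variables."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    file = Path(path)
    if file.exists():
        for section, values in (yaml.safe_load(file.read_text()) or {}).items():
            config.setdefault(section, {}).update(values)
    # Hypothetical override convention: EMBEDDING_PROVIDER, SUMMARIZATION_MODEL, ...
    for section in config:
        for field in ("provider", "model"):
            override = os.getenv(f"{section.upper()}_{field.upper()}")
            if override:
                config[section][field] = override
    return config
```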
Decision: YAML with nested provider config. Rationale: Human-readable, supports comments, familiar to developers. Alternatives Rejected:
- JSON (no comments, harder to read)
- TOML (less familiar)
- Flat environment variables (complex nesting)
Question: How to handle provider-specific errors?
Findings:
Common error scenarios:
- API key missing/invalid → Startup failure with clear message
- Model not found → Startup failure with available models
- Rate limit → Retry with exponential backoff
- Network timeout → Retry, then fail with context
- Ollama not running → Clear error with troubleshooting steps
Decision: Wrap provider-specific exceptions in a common `ProviderError` hierarchy (see the sketch below).
Rationale: Consistent error handling across providers.
Alternatives Rejected: Pass-through exceptions (inconsistent, hard to handle)
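A minimal sketch of such a hierarchy; the subclass names are hypothetical, chosen to match the scenarios above:

```python
class ProviderError(Exception):
    """Base class for all provider failures, wrapping the SDK's original exception."""

    def __init__(self, message: str, *, provider: str, cause: Exception | None = None):
        super().__init__(f"[{provider}] {message}")
        self.provider = provider
        self.__cause__ = cause


class ProviderAuthError(ProviderError):
    """API key missing or invalid; raised at startup with a clear message."""


class ProviderRateLimitError(ProviderError):
    """Rate limited; callers retry with exponential backoff."""


class ProviderUnavailableError(ProviderError):
    """Network timeout or local service (e.g. Ollama) not running."""
```

Callers can then catch `ProviderError` uniformly regardless of which SDK raised the underlying exception.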
Question: How to test providers without hitting real APIs?
Findings:
Testing approaches:
- Unit tests: Mock SDK clients at method level
- Integration tests: Use pytest fixtures with mock servers
- Contract tests: Record/replay with VCR.py or similar
- Ollama tests: Use local Ollama instance in CI (optional)
Decision: Mock SDK clients for unit tests; optional integration tests with real providers (see the sketch below). Rationale: Fast CI, optional manual validation. Alternatives Rejected:
- VCR.py (complex setup, cassette management)
- Always-real API calls (slow, expensive, flaky)
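A minimal sketch of the unit-test approach (requires pytest-asyncio); `OpenAIEmbeddingProvider` is a hypothetical stand-in for the real provider class:

```python
from unittest.mock import AsyncMock, MagicMock

import pytest


class OpenAIEmbeddingProvider:
    """Hypothetical provider under test; the real one would live in embedding.py."""

    def __init__(self, client, model: str):
        self.client, self.model = client, model

    async def embed(self, texts: list[str]) -> list[list[float]]:
        response = await self.client.embeddings.create(model=self.model, input=texts)
        return [item.embedding for item in response.data]


@pytest.mark.asyncio
async def test_embed_uses_configured_model():
    # Fake the SDK client at the method level: no network, no API key needed.
    fake_client = MagicMock()
    fake_client.embeddings.create = AsyncMock(
        return_value=MagicMock(data=[MagicMock(embedding=[0.1, 0.2, 0.3])])
    )

    provider = OpenAIEmbeddingProvider(client=fake_client, model="text-embedding-3-small")
    vectors = await provider.embed(["hello"])

    fake_client.embeddings.create.assert_awaited_once_with(
        model="text-embedding-3-small", input=["hello"]
    )
    assert vectors == [[0.1, 0.2, 0.3]]
```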
Summary of decisions:

| Area | Decision | Impact |
|---|---|---|
| Architecture | Protocol + Factory pattern | Extensible, testable |
| Config format | YAML with nested providers | Human-readable |
| LlamaIndex | Native SDKs, not LLM wrappers | Direct control |
| Ollama | OpenAI-compatible client | Code reuse |
| Grok | OpenAI-compatible client | Code reuse |
| Gemini | Summarization only | API limitation |
| Dimensions | Validate on startup | Prevent corruption |
| Auth | api_key_env references | Secure |
| Errors | Common ProviderError | Consistent |
| Testing | Mock SDKs | Fast CI |
Resolved questions:
- ✅ How to handle dimension mismatches? → Validate on startup, require re-index
- ✅ Should we use LlamaIndex wrappers? → No, use native SDKs
- ✅ How to support Ollama offline? → OpenAI-compatible client with local URL
- ✅ Can Gemini do embeddings? → No, summarization only
- ✅ How to test without API keys? → Mock SDK clients