
Pluggable Providers Research


Research: Pluggable Model Providers

Feature: 103-pluggable-providers · Date: 2026-02-01 · Status: Complete

Research Tasks

1. Current Architecture Analysis

Question: How are embeddings and summarization currently implemented?

Findings:

The current implementation in agent-brain-server/agent_brain_server/indexing/embedding.py uses:

  1. Embeddings: Hard-coded to OpenAI's AsyncOpenAI client

    • Initialized in EmbeddingGenerator.__init__() (lines 43-45)
    • Uses settings.OPENAI_API_KEY and settings.EMBEDDING_MODEL
    • Model dimensions hard-coded in a dictionary (lines 170-175)
  2. Summarization: Hard-coded to Anthropic's AsyncAnthropic client

    • Initialized in the same class (lines 48-50)
    • Uses settings.ANTHROPIC_API_KEY and settings.CLAUDE_MODEL
    • Fixed prompt template for code summarization
  3. Singleton Pattern: Global _embedding_generator with get_embedding_generator()

    • Makes provider swapping difficult at runtime
    • No abstract interface for mocking in tests

Decision: Refactor to a Protocol-based design with a Factory pattern (a minimal sketch follows the list below). Rationale: Enables runtime provider selection, improves testability, and follows the existing DI patterns in services. Alternatives Rejected:

  • ABC base class: More verbose, requires inheritance
  • Direct if/else switching: Would require changes throughout codebase
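
To make the Protocol + Factory shape concrete, here is a minimal sketch; the names (EmbeddingProvider, OpenAIEmbeddingProvider, create_embedding_provider) are illustrative assumptions, not the project's actual API:

```python
from typing import Protocol

# Hypothetical names for illustration; not the project's actual API.


class EmbeddingProvider(Protocol):
    """Structural interface that every embedding backend satisfies."""

    def get_dimensions(self) -> int: ...

    async def embed(self, texts: list[str]) -> list[list[float]]: ...


class OpenAIEmbeddingProvider:
    """Concrete provider; matches the protocol without inheriting from it."""

    def __init__(self, api_key: str, model: str) -> None:
        self._model = model
        self._api_key = api_key  # the real class would build an AsyncOpenAI client here

    def get_dimensions(self) -> int:
        # Per-model lookup instead of a dict buried in EmbeddingGenerator.
        return {"text-embedding-3-large": 3072}.get(self._model, 1536)

    async def embed(self, texts: list[str]) -> list[list[float]]:
        raise NotImplementedError("call the AsyncOpenAI client here")


_REGISTRY: dict[str, type] = {"openai": OpenAIEmbeddingProvider}


def create_embedding_provider(name: str, **kwargs) -> EmbeddingProvider:
    """Factory: resolve a config string to a concrete provider instance."""
    try:
        return _REGISTRY[name](**kwargs)
    except KeyError:
        raise ValueError(f"Unknown embedding provider: {name!r}") from None
```

Because the protocol is structural, tests can substitute any object with matching methods, which removes the mocking difficulty noted for the current singleton.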

2. LlamaIndex Provider Integration

Question: What LlamaIndex abstractions exist for our target providers?

Findings:

LlamaIndex provides embedding integrations via separate packages:

| Provider | Package                        | Async Support | Notes                 |
|----------|--------------------------------|---------------|-----------------------|
| OpenAI   | llama-index-embeddings-openai  | ✅ Yes        | Already installed     |
| Ollama   | llama-index-embeddings-ollama  | ✅ Yes        | OpenAI-compatible API |
| Cohere   | llama-index-embeddings-cohere  | ✅ Yes        | Requires cohere SDK   |
| Bedrock  | llama-index-embeddings-bedrock | ✅ Yes        | Future: Phase 7       |

LlamaIndex LLM integrations for summarization:

| Provider  | Package                    | Async Support | Notes                            |
|-----------|----------------------------|---------------|----------------------------------|
| Anthropic | llama-index-llms-anthropic | ✅ Yes        | Available but not used currently |
| OpenAI    | llama-index-llms-openai    | ✅ Yes        | Already installed                |
| Gemini    | llama-index-llms-gemini    | ⚠️ Partial    | Requires wrapper                 |
| Ollama    | llama-index-llms-ollama    | ✅ Yes        | OpenAI-compatible                |

Decision: Use native SDK clients instead of LlamaIndex LLM wrappers. Rationale:

  • Direct control over prompts and parameters
  • Simpler error handling
  • Fewer dependencies
  • Current implementation already uses native clients

Alternatives Rejected: LlamaIndex LLM wrappers (less control, extra abstraction layer)

3. Provider Authentication Patterns

Question: How does each provider handle authentication?

Findings:

| Provider  | Auth Method | Environment Variable | SDK Support       |
|-----------|-------------|----------------------|-------------------|
| OpenAI    | API Key     | OPENAI_API_KEY       | Native            |
| Anthropic | API Key     | ANTHROPIC_API_KEY    | Native            |
| Cohere    | API Key     | COHERE_API_KEY       | Native            |
| Gemini    | API Key     | GOOGLE_API_KEY       | Native            |
| Grok      | API Key     | GROK_API_KEY         | OpenAI-compatible |
| Ollama    | None        | N/A                  | Local HTTP        |

Decision: Use an api_key_env field in config to reference environment variable names (a resolution sketch follows the list below). Rationale:

  • Keeps secrets out of config files
  • Follows existing pattern in settings.py
  • Supports different keys for different providers

Alternatives Rejected:

  • Direct API keys in config (security risk)
  • Single environment variable (inflexible)
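
A minimal sketch of resolving api_key_env at load time; ProviderConfig and resolve_api_key are hypothetical names for illustration, not existing project code:

```python
import os
from dataclasses import dataclass


@dataclass
class ProviderConfig:
    provider: str
    model: str
    api_key_env: str | None = None  # name of the env var, never the secret itself


def resolve_api_key(config: ProviderConfig) -> str | None:
    """Look up the secret by the env-var name stored in config."""
    if config.api_key_env is None:  # e.g. Ollama needs no key
        return None
    key = os.getenv(config.api_key_env)
    if not key:
        raise RuntimeError(
            f"Environment variable {config.api_key_env} is not set "
            f"for provider {config.provider!r}"
        )
    return key
```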

4. Embedding Dimension Handling

Question: How to handle different embedding dimensions across providers?

Findings:

| Provider | Model                  | Dimensions |
|----------|------------------------|------------|
| OpenAI   | text-embedding-3-large | 3072       |
| OpenAI   | text-embedding-3-small | 1536       |
| OpenAI   | text-embedding-ada-002 | 1536       |
| Ollama   | nomic-embed-text       | 768        |
| Ollama   | mxbai-embed-large      | 1024       |
| Cohere   | embed-english-v3       | 1024       |
| Cohere   | embed-multilingual-v3  | 1024       |

Critical Issue: ChromaDB collections are created with fixed dimensions. Switching providers with different dimensions requires re-indexing.

Decision:

  1. Store embedding provider/model in index metadata
  2. Validate on startup: if provider/model changed, require explicit re-index
  3. Provide get_dimensions() method on EmbeddingProvider protocol

Rationale: Prevents silent index corruption. Alternatives Rejected: Auto-reindex (too slow, could be destructive)
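
A sketch of the startup check described above, assuming the index metadata is available as a plain dict; EmbeddingConfigMismatch and validate_index_metadata are illustrative names:

```python
class EmbeddingConfigMismatch(RuntimeError):
    """Raised when the index was built with a different embedding setup."""


def validate_index_metadata(
    stored: dict, provider: str, model: str, dimensions: int
) -> None:
    """Fail fast at startup instead of silently corrupting the index."""
    expected = {"provider": provider, "model": model, "dimensions": dimensions}
    for key, value in expected.items():
        if stored.get(key) != value:
            raise EmbeddingConfigMismatch(
                f"Index built with {key}={stored.get(key)!r}, but the current "
                f"configuration expects {value!r}; explicit re-index required."
            )
```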


5. Ollama Integration Patterns

Question: Best practices for Ollama integration?

Findings:

Ollama provides an OpenAI-compatible API:

  • Base URL: http://localhost:11434/v1
  • Embedding endpoint: /embeddings
  • Chat endpoint: /chat/completions

Two integration approaches:

  1. Native Ollama client (ollama package):

    import ollama

    # Module-level calls are synchronous; async usage goes through AsyncClient.
    client = ollama.AsyncClient()
    response = await client.embeddings(model="nomic-embed-text", prompt="text")
  2. OpenAI client with custom base_url:

    from openai import AsyncOpenAI

    # The SDK requires an api_key argument, but Ollama ignores its value.
    client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    response = await client.embeddings.create(model="nomic-embed-text", input="text")

Decision: Use OpenAI client with custom base_url. Rationale:

  • Reuse existing OpenAI provider code
  • Consistent async patterns
  • Fewer dependencies

Alternatives Rejected: Native ollama package (different API, new dependency)

6. Grok Integration

Question: How to integrate Grok (xAI)?

Findings:

Grok uses an OpenAI-compatible API:

  • Base URL: https://api.x.ai/v1
  • API Key: Sent as a standard Bearer token, which the OpenAI SDK supplies automatically from its api_key parameter
  • Models: grok-4, grok-4-fast

Decision: Reuse the OpenAI provider with a custom base_url (sketched below). Rationale: Minimal code, proven pattern. Alternatives Rejected: Separate implementation (unnecessary duplication)
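
A minimal sketch of that reuse, using the GROK_API_KEY variable from the auth table; summarize_with_grok is a hypothetical helper, not existing project code:

```python
import os

from openai import AsyncOpenAI

# Same client class as the OpenAI provider, pointed at xAI's endpoint.
grok_client = AsyncOpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.environ["GROK_API_KEY"],
)


async def summarize_with_grok(prompt: str) -> str:
    """Chat-completion call identical in shape to the OpenAI path."""
    response = await grok_client.chat.completions.create(
        model="grok-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```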


7. Gemini Integration

Question: How to integrate Google Gemini?

Findings:

Gemini SDK (google-generativeai):

import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-3-flash")
response = await model.generate_content_async("prompt")  # inside an async function

Key considerations:

  • Async support via generate_content_async()
  • No embedding API in this path, so usable for summarization only

Decision: Gemini for summarization only; use Vertex AI (Phase 8) for embeddings. Rationale: Gemini API doesn't expose embeddings. Alternatives Rejected: None (technical limitation)


8. Configuration File Format

Question: What format should config.yaml use?

Findings:

Proposed structure:

# config.yaml
embedding:
  provider: openai  # openai | ollama | cohere
  model: text-embedding-3-large
  api_key_env: OPENAI_API_KEY
  params:
    batch_size: 100

summarization:
  provider: anthropic  # anthropic | openai | gemini | grok | ollama
  model: claude-haiku-4-5-20251001
  api_key_env: ANTHROPIC_API_KEY
  params:
    max_tokens: 300
    temperature: 0.1

Config loading precedence (a loader sketch follows the list):

  1. Environment variables (highest)
  2. config.yaml in project root
  3. Default values (lowest)
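
A sketch of that chain, assuming a hypothetical load_setting helper (not existing project code) and PyYAML:

```python
import os
from pathlib import Path

import yaml  # PyYAML

DEFAULTS = {"embedding.provider": "openai"}


def load_setting(name: str, yaml_path: Path = Path("config.yaml")) -> str | None:
    """Resolve a dotted setting name: env var > config.yaml > defaults."""
    # 1. Environment variables win; "embedding.provider" -> EMBEDDING_PROVIDER.
    env_value = os.getenv(name.replace(".", "_").upper())
    if env_value is not None:
        return env_value
    # 2. Then config.yaml in the project root.
    if yaml_path.exists():
        data = yaml.safe_load(yaml_path.read_text()) or {}
        section, _, key = name.partition(".")
        value = (data.get(section) or {}).get(key)
        if value is not None:
            return value
    # 3. Finally, built-in defaults.
    return DEFAULTS.get(name)
```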

Decision: YAML with nested provider config. Rationale: Human-readable, supports comments, familiar to developers. Alternatives Rejected:

  • JSON (no comments, harder to read)
  • TOML (less familiar)
  • Flat environment variables (complex nesting)

9. Error Handling Strategy

Question: How to handle provider-specific errors?

Findings:

Common error scenarios:

  1. API key missing/invalid → Startup failure with clear message
  2. Model not found → Startup failure with available models
  3. Rate limit → Retry with exponential backoff
  4. Network timeout → Retry, then fail with context
  5. Ollama not running → Clear error with troubleshooting steps

Decision: Wrap provider-specific exceptions in common ProviderError hierarchy. Rationale: Consistent error handling across providers. Alternatives Rejected: Pass-through exceptions (inconsistent, hard to handle)
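
A sketch of such a hierarchy; the subclass names map to the scenarios above and are illustrative assumptions:

```python
class ProviderError(Exception):
    """Base class for all provider failures."""

    def __init__(self, provider: str, message: str) -> None:
        self.provider = provider
        super().__init__(f"[{provider}] {message}")


class ProviderAuthError(ProviderError):
    """API key missing or invalid; raised at startup with a clear message."""


class ProviderRateLimitError(ProviderError):
    """Rate limited; the caller retries with exponential backoff."""


class ProviderUnavailableError(ProviderError):
    """Network timeout or local service (e.g. Ollama) not running."""


# At the provider boundary, SDK exceptions are translated, e.g.:
#
#     try:
#         response = await client.embeddings.create(...)
#     except openai.RateLimitError as exc:
#         raise ProviderRateLimitError("openai", str(exc)) from exc
```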


10. Testing Strategy

Question: How to test providers without hitting real APIs?

Findings:

Testing approaches:

  1. Unit tests: Mock SDK clients at method level
  2. Integration tests: Use pytest fixtures with mock servers
  3. Contract tests: Record/replay with VCR.py or similar
  4. Ollama tests: Use local Ollama instance in CI (optional)

Decision: Mock SDK clients for unit tests; optional integration tests with real providers (see the sketch after the list). Rationale: Fast CI, with optional manual validation. Alternatives Rejected:

  • VCR.py (complex setup, cassette management)
  • Always-real API calls (slow, expensive, flaky)
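
A minimal unit-test sketch of the mocking approach, using a toy embed_one function (a stand-in, not the real EmbeddingGenerator) and pytest-asyncio:

```python
from unittest.mock import AsyncMock, MagicMock

import pytest


async def embed_one(client, model: str, text: str) -> list[float]:
    """Toy stand-in for the real embedding call path."""
    response = await client.embeddings.create(model=model, input=text)
    return response.data[0].embedding


@pytest.mark.asyncio
async def test_embed_one_with_mocked_sdk():
    fake_response = MagicMock()
    fake_response.data = [MagicMock(embedding=[0.1, 0.2, 0.3])]

    # AsyncMock makes embeddings.create awaitable without a real API key.
    fake_client = AsyncMock()
    fake_client.embeddings.create.return_value = fake_response

    result = await embed_one(fake_client, "text-embedding-3-small", "hello")

    assert result == [0.1, 0.2, 0.3]
    fake_client.embeddings.create.assert_awaited_once_with(
        model="text-embedding-3-small", input="hello"
    )
```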

Summary of Key Decisions

| Area          | Decision                      | Impact               |
|---------------|-------------------------------|----------------------|
| Architecture  | Protocol + Factory pattern    | Extensible, testable |
| Config format | YAML with nested providers    | Human-readable       |
| LlamaIndex    | Native SDKs, not LLM wrappers | Direct control       |
| Ollama        | OpenAI-compatible client      | Code reuse           |
| Grok          | OpenAI-compatible client      | Code reuse           |
| Gemini        | Summarization only            | API limitation       |
| Dimensions    | Validate on startup           | Prevent corruption   |
| Auth          | api_key_env references        | Secure               |
| Errors        | Common ProviderError          | Consistent           |
| Testing       | Mock SDKs                     | Fast CI              |

Open Questions Resolved

  • ✅ How to handle dimension mismatches? → Validate on startup, require re-index
  • ✅ Should we use LlamaIndex wrappers? → No, use native SDKs
  • ✅ How to support Ollama offline? → OpenAI-compatible client with local URL
  • ✅ Can Gemini do embeddings? → No, summarization only
  • ✅ How to test without API keys? → Mock SDK clients
