Add local embedding provider with FastEmbed for offline semantic search #3
Conversation
📝 Walkthrough

This PR introduces a multi-backend embedding provider architecture supporting OpenAI (remote API), FastEmbed (local ONNX), and Placeholder (hash-based) backends. Configuration via the EMBEDDING_PROVIDER environment variable defaults to auto-selection with an OpenAI → FastEmbed → Placeholder fallback order. All embedding operations now route through the provider abstraction.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as app.py<br/>(startup)
    participant Init as init_embedding_provider()
    participant Selector as Provider Selector
    participant OAI as OpenAIEmbeddingProvider
    participant FastE as FastEmbedProvider
    participant PH as PlaceholderEmbeddingProvider
    App->>Init: Call during startup
    Init->>Selector: Check EMBEDDING_PROVIDER env
    alt auto mode (default)
        Selector->>OAI: Try OpenAI (requires API key)
        alt OPENAI_API_KEY set
            OAI-->>Selector: ✓ Ready
        else key missing
            Selector->>FastE: Try FastEmbed (local)
            alt fastembed available
                FastE-->>Selector: ✓ Ready
            else not available
                Selector->>PH: Use Placeholder
                PH-->>Selector: ✓ Ready
            end
        end
    else explicit mode
        alt "openai"
            Selector->>OAI: Initialize
        else "local"
            Selector->>FastE: Initialize
        else "placeholder"
            Selector->>PH: Initialize
        end
    end
    Selector-->>Init: Return provider instance
    Init-->>App: Store in ServiceState
```
```mermaid
sequenceDiagram
    participant Client as API Client
    participant Handler as Endpoint Handler
    participant Provider as Active Provider<br/>(OpenAI/FastEmbed/...)
    participant Fallback as PlaceholderProvider
    Client->>Handler: Request with embedding
    Handler->>Provider: generate_embedding(text)
    alt Success
        Provider-->>Handler: [0.2, 0.5, ...]
    else Failure
        Provider--xHandler: Exception
        Handler->>Fallback: generate_embedding(text)
        Fallback-->>Handler: [hash-based vector]
    end
    Handler-->>Client: Response (with embedding)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

The changes involve multiple interdependent components: a new abstract interface, three provider implementations with varying complexity (OpenAI API integration, local model caching/initialization, deterministic hashing), significant refactoring of embedding paths in app.py (provider selection, error handling, fallback logic), configuration management, and persistent storage setup. While individual provider implementations follow a consistent pattern, the heterogeneity of the implementations and the app.py integration logic require careful review of each component's correctness and interaction.

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 5
🧹 Nitpick comments (11)
CLAUDE.md (1)
133-138: Document runtime prerequisites and cache dir override
- Add a note to pin onnxruntime to <1.20 when using fastembed==0.4.2 to avoid runtime issues. Based on learnings.
- Consider documenting AUTOMEM_MODELS_DIR to override the default cache path for local models (useful when container user/home differs from /root). This keeps docker-compose volume paths and runtime cache in sync.
Also applies to: 188-191
automem/embedding/openai.py (1)
20-25: Add timeout/retry configuration to the OpenAI client

Default 10m timeout is too long for request threads; also make retries explicit. Expose via constructor args with sane defaults.
```diff
 def __init__(
     self,
     api_key: str,
     model: str = "text-embedding-3-small",
     dimension: int = 768,
+    timeout: float = 30.0,
+    max_retries: int = 2,
 ):
@@
-    self.client = OpenAI(api_key=api_key)
+    self.client = OpenAI(api_key=api_key, timeout=timeout, max_retries=max_retries)
```

Also applies to: 36-43
automem/embedding/fastembed.py (2)
101-117: Optional: add length checks on outputs

Mirror the OpenAI checks to catch unexpected shape changes early; helps debug integration issues with Qdrant.
```diff
-    embeddings = list(self.model.embed([text]))
-    embedding = embeddings[0].tolist()
+    embeddings = list(self.model.embed([text]))
+    embedding = embeddings[0].tolist()
+    if len(embedding) != self._dimension:
+        raise ValueError(
+            f"fastembed embedding length {len(embedding)} != configured dimension {self._dimension} "
+            f"(model={self.model_name})"
+        )
@@
-    embeddings = [emb.tolist() for emb in self.model.embed(texts)]
+    embeddings = [emb.tolist() for emb in self.model.embed(texts)]
+    bad = next((i for i, e in enumerate(embeddings) if len(e) != self._dimension), None)
+    if bad is not None:
+        raise ValueError(
+            f"fastembed batch embedding length {len(embeddings[bad])} != configured dimension {self._dimension} "
+            f"at index {bad} (model={self.model_name})"
+        )
```

Also applies to: 118-140
101-109: Quality tweak (optional): BGE models work best with 'passage:'/'query:' prefixes

Consider optionally prefixing stored documents with "passage: " (and queries with "query: ") to align with BGE training. Keep disabled by default for parity across providers.
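A minimal sketch of what such opt-in prefixing could look like; the helper name and the `enabled` flag are hypothetical, not part of this PR:

```python
# Hypothetical opt-in BGE prefixing; disabled by default for provider parity.
def _with_bge_prefix(text: str, kind: str = "passage", enabled: bool = False) -> str:
    """Prepend the BGE-style 'passage: ' or 'query: ' prefix when enabled."""
    if not enabled:
        return text
    return f"{kind}: {text}"

# Documents stored via generate_embedding would use kind="passage",
# while retrieval queries would use kind="query".
```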
requirements.txt (1)
10-10: Pin onnxruntime to a compatible range for fastembed 0.4.2

fastembed 0.4.2 recommends onnxruntime <1.20 to avoid runtime issues. Add an explicit pin to prevent breakages in fresh environments. Based on learnings.
```diff
 fastembed==0.4.2
+onnxruntime<1.20
```

docker-compose.yml (1)
36-47: Make model cache path robust and configurable; set provider in dev
- The mount uses /root/.config/automem/models, which assumes the container runs as root. If the user changes, Path.home() differs and cache won’t persist. Either force the user to root or expose the cache dir via env and read it in code.
- Also consider setting EMBEDDING_PROVIDER for local/offline dev.
```diff
 flask-api:
   build: .
@@
   environment:
     FLASK_ENV: development
     FLASK_DEBUG: "1"
     PORT: 8001
+    EMBEDDING_PROVIDER: ${EMBEDDING_PROVIDER:-auto}  # set to 'local' for offline dev
+    AUTOMEM_MODELS_DIR: /root/.config/automem/models  # keep in sync with volume path
@@
-  volumes:
+  volumes:
     - .:/app
-    - fastembed_models:/root/.config/automem/models  # Persist embedding models
+    - fastembed_models:/root/.config/automem/models  # Persist embedding models
+    # If you switch to a non-root user, update AUTOMEM_MODELS_DIR and mount path accordingly
@@
 volumes:
   falkordb_data:
   qdrant_data:
   fastembed_models:  # Persistent cache for local embedding models
```

Alternatively, explicitly set the container user to root to match the mounted path:
```diff
 flask-api:
   build: .
+  user: "0"  # ensure Path.home() == /root (dev only)
```

Also applies to: 55-56, 73
docs/ENVIRONMENT_VARIABLES.md (1)
50-70: Document operational caveats for the local FastEmbed backend.

Add notes so offline users don't get tripped up:
- Pin onnxruntime to <1.20 with fastembed 0.4.x to avoid runtime issues.
- Persist the model cache in Docker (mount ~/.config/automem/models).
- Ensure VECTOR_SIZE matches the model’s dimensions (e.g., 768 for bge-base-en-v1.5).
Apply this doc tweak:
```diff
@@
 **Local Model Details:**
 - Model: `BAAI/bge-base-en-v1.5` (768 dimensions)
 - Size: ~210MB (cached to `~/.config/automem/models/`)
 - No API key or internet required after first download
 - Good semantic quality, faster than API calls
+- Recommended: pin `onnxruntime<1.20` with `fastembed` 0.4.x to avoid runtime issues
+- Docker: persist model cache with a volume, e.g.:
+  - `~/.config/automem/models` inside the container
+  - docker-compose example: `fastembed_models:/root/.config/automem/models`
+- Ensure `VECTOR_SIZE` equals the selected model's dimension (default 768)
```

automem/embedding/placeholder.py (1)
36-39: Non-crypto RNG is fine here; silence the S311 linter.

This is intentionally non-cryptographic. Add a nosec to avoid noise.
```diff
-    rng = random.Random(seed)
+    rng = random.Random(seed)  # nosec B311 - deterministic, non-crypto usage
```
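For context, a hash-seeded placeholder embedding of this shape might look like the following sketch; names are illustrative, and the PR's actual implementation lives in automem/embedding/placeholder.py:

```python
import hashlib
import random

def placeholder_embedding(text: str, dimension: int = 768) -> list[float]:
    """Deterministic, non-semantic vector: the same text always maps to the same vector."""
    # Seed the RNG from a stable content hash, not from global state.
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)  # nosec B311 - deterministic, non-crypto usage
    return [rng.uniform(-1.0, 1.0) for _ in range(dimension)]
```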
app.py (3)

3647-3660: Guard against provider dimension mismatches; fail soft to placeholder.

If a provider returns wrong dims, Qdrant upsert will fail. Add a defensive check.

```diff
 def _generate_real_embedding(content: str) -> List[float]:
@@
-    try:
-        embedding = state.embedding_provider.generate_embedding(content)
-        return embedding
+    try:
+        embedding = state.embedding_provider.generate_embedding(content)
+        if not isinstance(embedding, list) or len(embedding) != VECTOR_SIZE:
+            logger.warning(
+                "Provider %s returned %s dims (expected %d); falling back to placeholder",
+                state.embedding_provider.provider_name(),
+                len(embedding) if isinstance(embedding, list) else "invalid",
+                VECTOR_SIZE,
+            )
+            return _generate_placeholder_embedding(content)
+        return embedding
```
3663-3679: Batch: validate embedding shapes; fallback cleanly.

Mirror the single-embedding guard for batches.

```diff
 def _generate_real_embeddings_batch(contents: List[str]) -> List[List[float]]:
@@
-    try:
-        embeddings = state.embedding_provider.generate_embeddings_batch(contents)
-        return embeddings
+    try:
+        embeddings = state.embedding_provider.generate_embeddings_batch(contents)
+        if not embeddings or any(len(e) != VECTOR_SIZE for e in embeddings):
+            logger.warning(
+                "Provider %s returned invalid dims in batch; using placeholders",
+                state.embedding_provider.provider_name() if state.embedding_provider else "unknown",
+            )
+            return [_generate_placeholder_embedding(c) for c in contents]
+        return embeddings
```
2421-2429: Update docstring to reflect provider-based reembedding.

The route no longer requires OpenAI specifically; it uses the configured provider.

```diff
-"""Regenerate embeddings for existing memories using OpenAI API.
-
-Requires admin token and OpenAI API key configured.
+"""Regenerate embeddings for existing memories using the configured provider.
+
+Requires admin token and an available embedding provider (OpenAI, FastEmbed, or Placeholder).
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- CLAUDE.md (2 hunks)
- app.py (6 hunks)
- automem/embedding/__init__.py (1 hunks)
- automem/embedding/fastembed.py (1 hunks)
- automem/embedding/openai.py (1 hunks)
- automem/embedding/placeholder.py (1 hunks)
- automem/embedding/provider.py (1 hunks)
- docker-compose.yml (2 hunks)
- docs/ENVIRONMENT_VARIABLES.md (1 hunks)
- requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)

automem/embedding/placeholder.py (3)
- automem/embedding/provider.py (5): EmbeddingProvider (7-63), dimension (45-51), generate_embedding (15-27), generate_embeddings_batch (30-42), provider_name (54-60)
- automem/embedding/fastembed.py (4): dimension (141-147), generate_embedding (101-116), generate_embeddings_batch (118-139), provider_name (149-155)
- automem/embedding/openai.py (4): dimension (90-96), generate_embedding (45-62), generate_embeddings_batch (64-88), provider_name (98-104)

automem/embedding/__init__.py (4)
- automem/embedding/provider.py (1): EmbeddingProvider (7-63)
- automem/embedding/openai.py (1): OpenAIEmbeddingProvider (13-104)
- automem/embedding/fastembed.py (1): FastEmbedProvider (22-155)
- automem/embedding/placeholder.py (1): PlaceholderEmbeddingProvider (10-66)

automem/embedding/provider.py (3)
- automem/embedding/fastembed.py (4): generate_embedding (101-116), generate_embeddings_batch (118-139), dimension (141-147), provider_name (149-155)
- automem/embedding/openai.py (4): generate_embedding (45-62), generate_embeddings_batch (64-88), dimension (90-96), provider_name (98-104)
- automem/embedding/placeholder.py (4): generate_embedding (27-39), generate_embeddings_batch (41-50), dimension (52-58), provider_name (60-66)

automem/embedding/fastembed.py (2)
- automem/embedding/provider.py (5): EmbeddingProvider (7-63), dimension (45-51), generate_embedding (15-27), generate_embeddings_batch (30-42), provider_name (54-60)
- automem/embedding/openai.py (4): dimension (90-96), generate_embedding (45-62), generate_embeddings_batch (64-88), provider_name (98-104)

automem/embedding/openai.py (2)
- tests/conftest.py (2): OpenAI (110-112), create (107-108)
- automem/embedding/provider.py (5): EmbeddingProvider (7-63), dimension (45-51), generate_embedding (15-27), generate_embeddings_batch (30-42), provider_name (54-60)

app.py (4)
- automem/embedding/provider.py (5): EmbeddingProvider (7-63), dimension (45-51), provider_name (54-60), generate_embedding (15-27), generate_embeddings_batch (30-42)
- automem/embedding/openai.py (5): OpenAIEmbeddingProvider (13-104), dimension (90-96), provider_name (98-104), generate_embedding (45-62), generate_embeddings_batch (64-88)
- automem/embedding/fastembed.py (5): FastEmbedProvider (22-155), dimension (141-147), provider_name (149-155), generate_embedding (101-116), generate_embeddings_batch (118-139)
- automem/embedding/placeholder.py (5): PlaceholderEmbeddingProvider (10-66), dimension (52-58), provider_name (60-66), generate_embedding (27-39), generate_embeddings_batch (41-50)
🪛 LanguageTool

All findings are the same grammar hint (QB_NEW_EN): "There might be a mistake here," evidently triggered by the new Markdown tables and list items.

docs/ENVIRONMENT_VARIABLES.md: ~54, ~55, ~56, ~60, ~61, ~62, ~65, ~66, ~67 (the EMBEDDING_PROVIDER options table and the Local Model Details list)

CLAUDE.md: ~127, ~128, ~129, ~130, ~133, ~134, ~135, ~136, ~139, ~140, ~146 (the Provider Priority and configuration lists)
🪛 Ruff (0.14.0)

automem/embedding/placeholder.py
- 38: Standard pseudo-random generators are not suitable for cryptographic purposes (S311)

automem/embedding/__init__.py
- 14-19: `__all__` is not sorted; apply an isort-style sorting to `__all__` (RUF022)

app.py
- 1455: Avoid specifying long messages outside the exception class (TRY003)
- 1463: Consider moving this statement to an else block (TRY300)
- 1465: Avoid specifying long messages outside the exception class (TRY003)
- 1471: Consider moving this statement to an else block (TRY300)
- 1473: Avoid specifying long messages outside the exception class (TRY003)
- 1492: Consider moving this statement to an else block (TRY300)
- 1493: Do not catch blind exception: Exception (BLE001)
- 1500: Consider moving this statement to an else block (TRY300)
- 1501: Do not catch blind exception: Exception (BLE001)
- 1514-1517: Avoid specifying long messages outside the exception class (TRY003)
- 3656: Consider moving this statement to an else block (TRY300)
- 3657: Do not catch blind exception: Exception (BLE001)
- 3675: Consider moving this statement to an else block (TRY300)
- 3676: Do not catch blind exception: Exception (BLE001)
🔇 Additional comments (1)
automem/embedding/provider.py (1)
1-63: LGTM – clear ABC and API surface.

Interface is concise and consistent with backends.
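For orientation, the interface implied by the symbol listing above is roughly the following sketch, reconstructed from the method names and line spans in the code graph analysis, not copied verbatim from the source:

```python
from abc import ABC, abstractmethod
from typing import List

class EmbeddingProvider(ABC):
    """Common interface implemented by the OpenAI, FastEmbed, and Placeholder backends."""

    @abstractmethod
    def generate_embedding(self, text: str) -> List[float]:
        """Embed a single text into a fixed-size vector."""

    @abstractmethod
    def generate_embeddings_batch(self, texts: List[str]) -> List[List[float]]:
        """Embed many texts in one call."""

    @abstractmethod
    def dimension(self) -> int:
        """Vector size this provider produces (must match VECTOR_SIZE / the Qdrant collection)."""

    @abstractmethod
    def provider_name(self) -> str:
        """Human-readable identifier used in logs."""
```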
app.py (outdated)
```python
# Import embedding providers
from automem.embedding import (
    EmbeddingProvider,
    OpenAIEmbeddingProvider,
    FastEmbedProvider,
    PlaceholderEmbeddingProvider,
)
```
Critical: import-time coupling to optional providers breaks “offline/no OpenAI” runs.
from automem.embedding import (...) eagerly imports the OpenAI/FastEmbed modules it re-exports, requiring those packages even when unused. Combined with the top-level from openai import OpenAI (Line 35), this defeats the goal of running without OpenAI installed.
Fix by:
- Importing only the interface at module import.
- Doing branch-local lazy imports inside init_embedding_provider().
- Guarding the top-level OpenAI import similar to spaCy.
Apply this diff for the provider imports:
```diff
-# Import embedding providers
-from automem.embedding import (
-    EmbeddingProvider,
-    OpenAIEmbeddingProvider,
-    FastEmbedProvider,
-    PlaceholderEmbeddingProvider,
-)
+# Import only the interface; import backends lazily in init_embedding_provider()
+from automem.embedding.provider import EmbeddingProvider
```

Additionally, make the OpenAI client import optional (outside this hunk):
```python
# near existing `from openai import OpenAI`
try:
    from openai import OpenAI  # type: ignore
except Exception:  # ImportError on machines without openai
    OpenAI = None  # type: ignore
```

And in init_openai(), early-return if OpenAI is None. Based on learnings.
🤖 Prompt for AI Agents
In app.py around lines 38 to 44, the current top-level imports re-export
concrete embedding providers (OpenAIEmbeddingProvider, FastEmbedProvider,
PlaceholderEmbeddingProvider) which forces optional packages to be installed and
couples import-time behavior; change the file to import only the
EmbeddingProvider interface at module import and move concrete provider imports
into init_embedding_provider() as lazy, branch-local imports; also make the
top-level OpenAI client import optional by wrapping it in a try/except that sets
OpenAI = None, and modify init_openai() to early-return when OpenAI is None so
runs can proceed without OpenAI installed.
```python
def init_embedding_provider() -> None:
    """Initialize embedding provider with auto-selection fallback.

    Priority order:
    1. OpenAI API (if OPENAI_API_KEY is set)
    2. Local fastembed model (no API key needed)
    3. Placeholder hash-based embeddings (fallback)

    Can be controlled via EMBEDDING_PROVIDER env var:
    - "auto" (default): Try OpenAI, then fastembed, then placeholder
    - "openai": Use OpenAI only, fail if unavailable
    - "local": Use fastembed only, fail if unavailable
    - "placeholder": Use placeholder embeddings
    """
    if state.embedding_provider is not None:
        return

    provider_config = os.getenv("EMBEDDING_PROVIDER", "auto").lower()
    vector_size = VECTOR_SIZE

    # Explicit provider selection
    if provider_config == "openai":
        api_key = os.getenv("OPENAI_API_KEY")
        if not api_key:
            raise RuntimeError("EMBEDDING_PROVIDER=openai but OPENAI_API_KEY not set")
        try:
            state.embedding_provider = OpenAIEmbeddingProvider(
                api_key=api_key,
                model="text-embedding-3-small",
                dimension=vector_size
            )
            logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            raise RuntimeError(f"Failed to initialize OpenAI provider: {e}") from e

    elif provider_config == "local":
        try:
            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
            logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            raise RuntimeError(f"Failed to initialize local fastembed provider: {e}") from e

    elif provider_config == "placeholder":
        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
        logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
        return

    # Auto-selection: Try OpenAI → fastembed → placeholder
    if provider_config == "auto":
        # Try OpenAI first
        api_key = os.getenv("OPENAI_API_KEY")
        if api_key:
            try:
                state.embedding_provider = OpenAIEmbeddingProvider(
                    api_key=api_key,
                    model="text-embedding-3-small",
                    dimension=vector_size
                )
                logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
                return
            except Exception as e:
                logger.warning("Failed to initialize OpenAI provider, trying local model: %s", str(e))

        # Try local fastembed
        try:
            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
            logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            logger.warning("Failed to initialize fastembed provider, using placeholder: %s", str(e))

        # Fallback to placeholder
        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
        logger.warning(
            "Using placeholder embeddings (no semantic search). "
            "Install fastembed or set OPENAI_API_KEY for semantic embeddings."
        )
        logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
        return

    # Invalid config
    raise ValueError(
        f"Invalid EMBEDDING_PROVIDER={provider_config}. "
        f"Valid options: auto, openai, local, placeholder"
    )
```
🛠️ Refactor suggestion | 🟠 Major
Make backends lazily imported per-branch and harden config parsing.
Avoid importing optional deps unless selected; minor robustness tweaks.
```diff
 def init_embedding_provider() -> None:
@@
-    provider_config = os.getenv("EMBEDDING_PROVIDER", "auto").lower()
+    provider_config = (os.getenv("EMBEDDING_PROVIDER", "auto") or "auto").strip().lower()
@@
     if provider_config == "openai":
         api_key = os.getenv("OPENAI_API_KEY")
         if not api_key:
             raise RuntimeError("EMBEDDING_PROVIDER=openai but OPENAI_API_KEY not set")
         try:
-            state.embedding_provider = OpenAIEmbeddingProvider(
+            from automem.embedding.openai import OpenAIEmbeddingProvider
+            state.embedding_provider = OpenAIEmbeddingProvider(
                 api_key=api_key,
                 model="text-embedding-3-small",
                 dimension=vector_size
             )
@@
     elif provider_config == "local":
         try:
-            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
+            from automem.embedding.fastembed import FastEmbedProvider
+            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
             logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
             return
         except Exception as e:
             raise RuntimeError(f"Failed to initialize local fastembed provider: {e}") from e
@@
     elif provider_config == "placeholder":
-        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
+        from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
         logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
         return
@@
     if provider_config == "auto":
         # Try OpenAI first
         api_key = os.getenv("OPENAI_API_KEY")
         if api_key:
             try:
-                state.embedding_provider = OpenAIEmbeddingProvider(
+                from automem.embedding.openai import OpenAIEmbeddingProvider
+                state.embedding_provider = OpenAIEmbeddingProvider(
                     api_key=api_key,
                     model="text-embedding-3-small",
                     dimension=vector_size
                 )
                 logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
                 return
             except Exception as e:
                 logger.warning("Failed to initialize OpenAI provider, trying local model: %s", str(e))
         # Try local fastembed
         try:
-            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
+            from automem.embedding.fastembed import FastEmbedProvider
+            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
             logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
             return
         except Exception as e:
             logger.warning("Failed to initialize fastembed provider, using placeholder: %s", str(e))
         # Fallback to placeholder
-        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
+        from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
```
+ state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)🧰 Tools
🪛 Ruff (0.14.0)
1455-1455: Avoid specifying long messages outside the exception class
(TRY003)
1463-1463: Consider moving this statement to an else block
(TRY300)
1465-1465: Avoid specifying long messages outside the exception class
(TRY003)
1471-1471: Consider moving this statement to an else block
(TRY300)
1473-1473: Avoid specifying long messages outside the exception class
(TRY003)
1492-1492: Consider moving this statement to an else block
(TRY300)
1493-1493: Do not catch blind exception: Exception
(BLE001)
1500-1500: Consider moving this statement to an else block
(TRY300)
1501-1501: Do not catch blind exception: Exception
(BLE001)
1514-1517: Avoid specifying long messages outside the exception class
(TRY003)
automem/embedding/__init__.py (outdated)
```python
from automem.embedding.provider import EmbeddingProvider
from automem.embedding.openai import OpenAIEmbeddingProvider
from automem.embedding.fastembed import FastEmbedProvider
from automem.embedding.placeholder import PlaceholderEmbeddingProvider

__all__ = [
    "EmbeddingProvider",
    "OpenAIEmbeddingProvider",
    "FastEmbedProvider",
    "PlaceholderEmbeddingProvider",
]
```
Avoid importing optional backends at module import time (breaks offline installs).
Re-exporting OpenAI/FastEmbed here forces importing their packages even when unused. This contradicts “no OpenAI required” and can crash environments without those deps. Guard or defer imports.
Apply this safer pattern:
```diff
-from automem.embedding.provider import EmbeddingProvider
-from automem.embedding.openai import OpenAIEmbeddingProvider
-from automem.embedding.fastembed import FastEmbedProvider
-from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+from .provider import EmbeddingProvider
+# Optional backends: guard imports to avoid hard dependencies at import time
+try:
+    from .openai import OpenAIEmbeddingProvider  # type: ignore
+except ImportError:
+    OpenAIEmbeddingProvider = None  # type: ignore[assignment]
+try:
+    from .fastembed import FastEmbedProvider  # type: ignore
+except ImportError:
+    FastEmbedProvider = None  # type: ignore[assignment]
+from .placeholder import PlaceholderEmbeddingProvider
@@
-__all__ = [
-    "EmbeddingProvider",
-    "OpenAIEmbeddingProvider",
-    "FastEmbedProvider",
-    "PlaceholderEmbeddingProvider",
-]
+__all__ = [
+    "EmbeddingProvider",
+    "FastEmbedProvider",
+    "OpenAIEmbeddingProvider",
+    "PlaceholderEmbeddingProvider",
+]
```

This also satisfies Ruff's sorted `__all__` warning. As per coding guidelines.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from .provider import EmbeddingProvider
# Optional backends: guard imports to avoid hard dependencies at import time
try:
    from .openai import OpenAIEmbeddingProvider  # type: ignore
except ImportError:
    OpenAIEmbeddingProvider = None  # type: ignore[assignment]
try:
    from .fastembed import FastEmbedProvider  # type: ignore
except ImportError:
    FastEmbedProvider = None  # type: ignore[assignment]
from .placeholder import PlaceholderEmbeddingProvider

__all__ = [
    "EmbeddingProvider",
    "FastEmbedProvider",
    "OpenAIEmbeddingProvider",
    "PlaceholderEmbeddingProvider",
]
```
🤖 Prompt for AI Agents
In automem/embedding/__init__.py around lines 9 to 19, the module currently
imports optional backends (OpenAIEmbeddingProvider, FastEmbedProvider) at import
time which breaks offline installs; change to lazy/guarded exports by removing
top-level imports for optional providers, implement a __getattr__ that performs
local imports on first attribute access (raising AttributeError if missing), and
ensure __all__ is a sorted list of public names; this defers importing optional
dependencies until actually used and resolves the Ruff sorted-__all__ warning.
automem/embedding/fastembed.py (outdated)
```python
# Set cache directory
if cache_dir is None:
    cache_dir = Path.home() / ".config" / "automem" / "models"
cache_dir.mkdir(parents=True, exist_ok=True)
```
🛠️ Refactor suggestion | 🟠 Major
Derive actual embedding dimension and reconcile with configured value
If model_name and dimension disagree, provider.dimension() will be wrong, causing vector-store mismatches. Detect the real size after init and align (or fail fast). Also allow overriding cache_dir via env var for container/home portability.
```diff
-    # Set cache directory
-    if cache_dir is None:
-        cache_dir = Path.home() / ".config" / "automem" / "models"
+    # Set cache directory; allow env override for portability across users/containers
+    if cache_dir is None:
+        env_dir = os.getenv("AUTOMEM_MODELS_DIR")
+        cache_dir = Path(env_dir) if env_dir else (Path.home() / ".config" / "automem" / "models")
     cache_dir.mkdir(parents=True, exist_ok=True)
@@
-    self.model_name = model_name
-    self._dimension = dimension
+    self.model_name = model_name
+    # Derive actual embedding dimension to avoid mismatches
+    try:
+        _probe = next(self.model.embed([" "]))
+        actual_dim = len(_probe)
+    except Exception:
+        # Fallback: assume requested dimension if probe fails (keeps behavior unchanged)
+        actual_dim = dimension
+
+    if actual_dim != dimension:
+        logger.warning(
+            "fastembed actual dimension %d != configured %d for model %s. "
+            "Using actual dimension; ensure your VECTOR_SIZE/Qdrant collection matches.",
+            actual_dim, dimension, model_name
+        )
+    self._dimension = actual_dim
```

Add required import at top:

```diff
-import logging
+import logging
+import os
```
+import osAlso applies to: 80-94
```python
response = self.client.embeddings.create(
    input=text, model=self.model, dimensions=self._dimension
)
embedding = response.data[0].embedding
logger.debug("Generated OpenAI embedding for text (length: %d)", len(text))
return embedding
```
🛠️ Refactor suggestion | 🟠 Major
Enforce returned vector length equals configured dimension
Protect against silent mismatches (e.g., model ignores dimensions or config drift) by validating the returned length matches self._dimension and failing fast with a clear error. Prevents downstream Qdrant insert errors.
```diff
     response = self.client.embeddings.create(
         input=text, model=self.model, dimensions=self._dimension
     )
-    embedding = response.data[0].embedding
+    embedding = response.data[0].embedding
+    if len(embedding) != self._dimension:
+        raise ValueError(
+            f"OpenAI embedding length {len(embedding)} != configured dimension {self._dimension} "
+            f"(model={self.model}). Ensure 'dimensions' is supported and vector store size matches."
+        )
     logger.debug("Generated OpenAI embedding for text (length: %d)", len(text))
     return embedding
@@
     response = self.client.embeddings.create(
         input=texts, model=self.model, dimensions=self._dimension
     )
-    embeddings = [item.embedding for item in response.data]
+    embeddings = [item.embedding for item in response.data]
+    bad = next((i for i, e in enumerate(embeddings) if len(e) != self._dimension), None)
+    if bad is not None:
+        raise ValueError(
+            f"OpenAI batch embedding length {len(embeddings[bad])} != configured dimension {self._dimension} "
+            f"at index {bad} (model={self.model})."
+        )
     logger.info(
```

Also applies to: 79-88
🤖 Prompt for AI Agents
In automem/embedding/openai.py around lines 57 to 63 (and similarly for lines
79–88), the code currently returns the embedding without checking its length;
add a validation step after extracting response.data[0].embedding that compares
len(embedding) to self._dimension and if they differ raise a ValueError (or
custom exception) with a clear message including expected and actual lengths,
and log the mismatch before raising so callers fail fast and avoid downstream
Qdrant insert errors.
Implements three-tier embedding provider pattern with automatic fallback:
1. OpenAI API (text-embedding-3-small) - if OPENAI_API_KEY set
2. FastEmbed local model (BAAI/bge-base-en-v1.5) - no API key needed
3. Placeholder hash-based embeddings - last resort fallback

Key changes:
- Create automem/embedding/ module with provider abstraction
  - provider.py: EmbeddingProvider base class
  - openai.py: OpenAI API provider (refactored from existing code)
  - fastembed.py: Local ONNX model provider (~210MB, 768-dim)
  - placeholder.py: Hash-based fallback (refactored)
- Update app.py to use provider pattern with init_embedding_provider()
- Add fastembed==0.4.2 dependency to requirements.txt
- Add Docker volume for persistent model cache in docker-compose.yml
- Document new EMBEDDING_PROVIDER configuration in CLAUDE.md and docs/
- Fix test compatibility issues (PayloadSchemaType import, cache invalidation)
- Improve entity extraction regex patterns for project names

Configuration:
- EMBEDDING_PROVIDER=auto (default): Try OpenAI → FastEmbed → Placeholder
- EMBEDDING_PROVIDER=openai: Use OpenAI only
- EMBEDDING_PROVIDER=local: Use FastEmbed only
- EMBEDDING_PROVIDER=placeholder: Use placeholder only

Benefits:
- Semantic search without API keys or internet (after initial download)
- Cost savings (no OpenAI API costs)
- 768-dim model compatible with existing Qdrant collections
- Backward compatible with existing OpenAI deployments
- Follows proven Codanna architecture pattern

Tests: All passing (53 passed, 9 skipped)
Force-pushed from bad9679 to 2998270 (Compare)
…er provider
- Added B311 nosec comment to explain deterministic RNG is intentional
- This addresses the final CodeRabbit code quality recommendation
- Placeholder embeddings need reproducible vectors, not cryptographic randomness
- Test PlaceholderEmbeddingProvider determinism and dimensions
- Test FastEmbedProvider with mocked model initialization
- Test OpenAIEmbeddingProvider with mocked API client
- Test provider selection logic (auto, explicit, fallback)
- Test integration with app.py initialization
- Test error handling and dimension mismatches
- Test batch processing and provider features
- Add end-to-end test for memory storage with providers

All 33 new tests pass, bringing total to 86 passed + 9 skipped
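A determinism check of the kind listed above might look like this minimal sketch; the test body is assumed, not copied from the PR's test suite:

```python
from automem.embedding.placeholder import PlaceholderEmbeddingProvider

def test_placeholder_embeddings_are_deterministic():
    # Same input text must always yield the same hash-seeded vector.
    provider = PlaceholderEmbeddingProvider(dimension=768)
    a = provider.generate_embedding("same text")
    b = provider.generate_embedding("same text")
    assert a == b
    assert len(a) == 768
```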
Summary
This PR introduces a flexible embedding provider system that enables semantic search without requiring an OpenAI API key. It implements a three-tier fallback strategy with automatic provider selection, allowing AutoMem to work fully offline after an initial model download.
Motivation
Previously, AutoMem required either:
- an OpenAI API key (with API costs and a network dependency), or
- falling back to placeholder hash-based embeddings with no semantic meaning.
This limitation prevented users from having semantic search capabilities without API costs or internet connectivity.
Changes
New Embedding Provider Architecture
Created a modular provider system with automatic fallback:
Configuration
New environment variable `EMBEDDING_PROVIDER` controls behavior:
- `auto` (default): Automatic fallback chain
- `openai`: OpenAI only (fails if no API key)
- `local`: FastEmbed only (fails if model unavailable)
- `placeholder`: Hash-based only

Docker Improvements
Added a persistent volume for the model cache to prevent re-downloading the ~210MB model on container restarts (see the docker-compose diff above).
Technical Details
- Local model cache: `~/.config/automem/models/`
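For reference, a sketch of how the default cache location above resolves; the `AUTOMEM_MODELS_DIR` override is the reviewer's suggestion, not yet part of this PR:

```python
import os
from pathlib import Path

def resolve_model_cache_dir() -> Path:
    """Default FastEmbed cache dir, with the suggested (hypothetical) env override."""
    env_dir = os.getenv("AUTOMEM_MODELS_DIR")
    cache_dir = Path(env_dir) if env_dir else Path.home() / ".config" / "automem" / "models"
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir
```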
Benefits

✅ Privacy-First: No data sent to external APIs
✅ Offline Operation: Works without internet after initial download
✅ Cost Savings: Eliminates OpenAI API costs
✅ Backward Compatible: Existing OpenAI deployments continue working unchanged
✅ Semantic Search: Real embeddings vs placeholder hashes
✅ Docker Optimized: Model persists across container lifecycle
Testing
Migration Guide
For existing deployments:
- No action needed if `OPENAI_API_KEY` is set
- To go local, set `EMBEDDING_PROVIDER=local` (existing 768-dim vectors remain compatible)
- To pin the current behavior, set `EMBEDDING_PROVIDER=openai`

Note: The local model (BAAI/bge-base-en-v1.5) produces 768-dimensional embeddings, maintaining compatibility with existing Qdrant collections created with OpenAI's text-embedding-3-small.
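If in doubt during migration, one can verify that the existing Qdrant collection's vector size matches the provider's 768 dimensions. A minimal check might look like the following; the URL and collection name are assumptions to adjust for your deployment:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")  # adjust to your deployment
info = client.get_collection("memories")            # collection name is an assumption
size = info.config.params.vectors.size              # assumes a single unnamed vector config
print(f"Collection vector size: {size}")            # should be 768 for bge-base-en-v1.5
```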
Checklist
- `make test`
- `make dev`

Questions for Reviewers
- Should the default provider selection remain `auto`, or should we explicitly require configuration?