
Conversation

@andrewleech

Summary

This PR introduces a flexible embedding provider system that enables semantic search without requiring an OpenAI API key. It implements a three-tier fallback strategy with automatic provider selection, allowing AutoMem to work fully offline after an initial model download.

Motivation

Previously, AutoMem either:

  • Required an OpenAI API key for semantic embeddings (ongoing costs, network dependency), or
  • Fell back to hash-based placeholder embeddings (no semantic meaning)

This limitation prevented users from having semantic search capabilities without API costs or internet connectivity.

Changes

New Embedding Provider Architecture

Created a modular provider system with automatic fallback:

Priority 1: OpenAI (if OPENAI_API_KEY set)
    ↓
Priority 2: FastEmbed local model (no API key needed)
    ↓
Priority 3: Placeholder (deterministic hash, no semantic meaning)
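
For orientation, auto mode reduces to a chain roughly like the sketch below. This is a simplified illustration, not the actual code; the real logic is init_embedding_provider() in app.py (quoted later in the review), and the default values here are only examples.

# Simplified sketch of the "auto" fallback chain (illustrative only).
import os

from automem.embedding import (
    FastEmbedProvider,
    OpenAIEmbeddingProvider,
    PlaceholderEmbeddingProvider,
)


def select_provider(vector_size: int = 768):
    api_key = os.getenv("OPENAI_API_KEY")
    if api_key:
        try:
            return OpenAIEmbeddingProvider(api_key=api_key, dimension=vector_size)
        except Exception:
            pass  # fall through to the local model
    try:
        return FastEmbedProvider(dimension=vector_size)
    except Exception:
        pass  # fastembed missing or model unavailable
    return PlaceholderEmbeddingProvider(dimension=vector_size)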

Configuration

New environment variable EMBEDDING_PROVIDER controls behavior:

  • auto (default): Automatic fallback chain
  • openai: OpenAI only (fails if no API key)
  • local: FastEmbed only (fails if model unavailable)
  • placeholder: Hash-based only

Docker Improvements

Added a persistent volume for the model cache so the ~210MB model is not re-downloaded on container restarts:

volumes:
  - fastembed_models:/root/.config/automem/models

Technical Details

  • Local Model: BAAI/bge-base-en-v1.5 (768 dimensions)
  • Model Size: ~210MB (automatically downloaded on first use)
  • Cache Location: ~/.config/automem/models/
  • Dimension Compatibility: 768-dim matches existing OpenAI/Qdrant setup
  • Performance: ONNX runtime provides fast CPU inference
  • Progress Display: Shows download progress on first initialization
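
For reference, the local backend is a thin wrapper around the fastembed library. Below is a minimal standalone sketch of that usage, assuming fastembed's TextEmbedding API; the PR's FastEmbedProvider adds caching configuration, logging, and error handling on top of this.

# Minimal sketch of local embedding with fastembed (illustrative only).
from pathlib import Path

from fastembed import TextEmbedding

cache_dir = Path.home() / ".config" / "automem" / "models"
cache_dir.mkdir(parents=True, exist_ok=True)

model = TextEmbedding(model_name="BAAI/bge-base-en-v1.5", cache_dir=str(cache_dir))
vectors = list(model.embed(["Test semantic search with local embeddings"]))
print(len(vectors[0]))  # 768 dimensions for bge-base-en-v1.5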

Benefits

  • Privacy-First: No data sent to external APIs
  • Offline Operation: Works without internet after initial download
  • Cost Savings: Eliminates OpenAI API costs
  • Backward Compatible: Existing OpenAI deployments continue working unchanged
  • Semantic Search: Real embeddings vs placeholder hashes
  • Docker Optimized: Model persists across container lifecycle

Testing

# Test without OpenAI (will use local model)
unset OPENAI_API_KEY
export EMBEDDING_PROVIDER=auto
make dev

# First run shows:
# "Downloading BAAI/bge-base-en-v1.5 embedding model (~210MB, first time only)"

# Subsequent runs show:
# "Loading BAAI/bge-base-en-v1.5 from cache..."

# Store a memory to test embedding generation
curl -X POST http://localhost:8001/memory \
  -H "Authorization: Bearer test-token" \
  -H "Content-Type: application/json" \
  -d '{"content": "Test semantic search with local embeddings"}'

# Test semantic recall
curl "http://localhost:8001/recall?query=local%20embeddings" \
  -H "Authorization: Bearer test-token"

Migration Guide

For existing deployments:

  1. No changes required - Defaults to OpenAI if OPENAI_API_KEY is set
  2. To switch to local: Set EMBEDDING_PROVIDER=local (existing 768-dim vectors remain compatible)
  3. To force OpenAI: Set EMBEDDING_PROVIDER=openai

Note: The local model (BAAI/bge-base-en-v1.5) produces 768-dimensional embeddings, maintaining compatibility with existing Qdrant collections created with OpenAI's text-embedding-3-small.

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated
  • No breaking changes for existing deployments
  • Docker configuration updated for optimal performance
  • Tests pass locally with make test
  • Integration tested with make dev

Questions for Reviewers

  1. Should we add more local model options (384-dim for smaller size, 1024-dim for higher quality)?
  2. Should the default remain auto or explicitly require configuration?
  3. Any concerns about the 210MB model download for Docker deployments?

@coderabbitai

coderabbitai bot commented Oct 17, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR introduces a multi-backend embedding provider architecture supporting OpenAI (remote API), FastEmbed (local ONNX), and Placeholder (hash-based) backends. Configuration via EMBEDDING_PROVIDER environment variable defaults to auto-selection with OpenAI → FastEmbed → Placeholder fallback order. All embedding operations now route through the provider abstraction.

Changes

  • Core embedding provider abstraction (automem/embedding/provider.py): Introduces abstract base class EmbeddingProvider with an interface for single/batch embedding generation, dimension, and provider name metadata (interface sketched below).
  • Embedding provider implementations (automem/embedding/openai.py, automem/embedding/fastembed.py, automem/embedding/placeholder.py): Three concrete implementations: OpenAIEmbeddingProvider (remote API), FastEmbedProvider (local ONNX models with caching), PlaceholderEmbeddingProvider (deterministic hash-based fallback).
  • Embedding package public API (automem/embedding/__init__.py): Aggregates and re-exports all provider types for a unified import surface.
  • Application integration (app.py): Adds an embedding_provider field to ServiceState, a new init_embedding_provider() function for auto-selecting backends via environment variables, and routes all embedding generation calls through the provider with fallback to placeholder on failure.
  • Documentation (docs/ENVIRONMENT_VARIABLES.md, CLAUDE.md): Updates embedding configuration documentation with the new EMBEDDING_PROVIDER variable, backend descriptions, model details, and fallback behavior.
  • Configuration & persistence (docker-compose.yml, requirements.txt): Adds a named volume for FastEmbed model caching and the fastembed==0.4.2 dependency.
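
A sketch of the provider contract referenced in the first item above; method names match the code graph later in this review, while the signatures and docstrings shown here are abbreviated assumptions rather than the exact contents of automem/embedding/provider.py.

# Sketch of the EmbeddingProvider contract (abbreviated).
from abc import ABC, abstractmethod
from typing import List


class EmbeddingProvider(ABC):
    @abstractmethod
    def generate_embedding(self, text: str) -> List[float]:
        """Embed a single text."""

    @abstractmethod
    def generate_embeddings_batch(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of texts."""

    @abstractmethod
    def dimension(self) -> int:
        """Vector size produced by this backend (e.g. 768)."""

    @abstractmethod
    def provider_name(self) -> str:
        """Human-readable backend name, used for logging."""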

Sequence Diagram(s)

sequenceDiagram
    participant App as app.py<br/>(startup)
    participant Init as init_embedding_provider()
    participant Selector as Provider Selector
    participant OAI as OpenAIEmbeddingProvider
    participant FastE as FastEmbedProvider
    participant PH as PlaceholderEmbeddingProvider
    
    App->>Init: Call during startup
    Init->>Selector: Check EMBEDDING_PROVIDER env
    
    alt auto mode (default)
        Selector->>OAI: Try OpenAI (requires API key)
        alt OPENAI_API_KEY set
            OAI-->>Selector: ✓ Ready
        else key missing
            Selector->>FastE: Try FastEmbed (local)
            alt fastembed available
                FastE-->>Selector: ✓ Ready
            else not available
                Selector->>PH: Use Placeholder
                PH-->>Selector: ✓ Ready
            end
        end
    else explicit mode
        alt "openai"
            Selector->>OAI: Initialize
        else "local"
            Selector->>FastE: Initialize
        else "placeholder"
            Selector->>PH: Initialize
        end
    end
    
    Selector-->>Init: Return provider instance
    Init-->>App: Store in ServiceState
sequenceDiagram
    participant Client as API Client
    participant Handler as Endpoint Handler
    participant Provider as Active Provider<br/>(OpenAI/FastEmbed/...)
    participant Fallback as PlaceholderProvider
    
    Client->>Handler: Request with embedding
    Handler->>Provider: generate_embedding(text)
    
    alt Success
        Provider-->>Handler: [0.2, 0.5, ...]
    else Failure
        Provider--xHandler: Exception
        Handler->>Fallback: generate_embedding(text)
        Fallback-->>Handler: [hash-based vector]
    end
    
    Handler-->>Client: Response (with embedding)
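
In code, this request-time behavior is a thin try/except around the provider call. A simplified sketch of the pattern follows; the helper name is illustrative, it assumes app.py's module-level state and logger, and the real implementation is _generate_real_embedding(), discussed in the comments below.

# Simplified sketch of the per-request fallback shown in the diagram above.
def generate_embedding_with_fallback(content: str) -> list[float]:
    try:
        return state.embedding_provider.generate_embedding(content)
    except Exception:
        logger.warning("Embedding provider failed; falling back to placeholder")
        return _generate_placeholder_embedding(content)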

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes involve multiple interdependent components: a new abstract interface, three provider implementations with varying complexity (OpenAI API integration, local model caching/initialization, deterministic hashing), significant refactoring of embedding paths in app.py (provider selection, error handling, fallback logic), configuration management, and persistent storage setup. While individual provider implementations follow a consistent pattern, the heterogeneity of the implementations and the app.py integration logic require careful review of each component's correctness and interaction.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title "Add local embedding provider with FastEmbed for offline semantic search" is clear and specific, directly addressing the primary feature delivered by this PR: enabling offline semantic search through a local embedding provider. The title accurately captures the most significant user-facing change—adding FastEmbed as a concrete implementation enabling offline operation without API dependencies. While the title doesn't mention the broader architectural abstraction (the provider pattern and three-tier fallback system), these are supporting infrastructure rather than the primary objective. The phrasing is concise, readable, and free of vague terminology.
  • Description Check: ✅ Passed. The PR description is comprehensive and directly related to the changeset. It provides clear articulation of motivation (semantic search without API costs or internet dependency), detailed technical specifications (model choice BAAI/bge-base-en-v1.5, 768 dimensions, ~210MB size, cache location), configuration details (EMBEDDING_PROVIDER environment variable options), concrete testing examples, and migration guidance for existing deployments. The description avoids vague language and grounds all claims in specific implementation details. All content directly supports understanding the changes introduced in the pull request.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 96.30%, which is sufficient. The required threshold is 80.00%.


@andrewleech andrewleech marked this pull request as draft October 17, 2025 04:37

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (11)
CLAUDE.md (1)

133-138: Document runtime prerequisites and cache dir override

  • Add a note to pin onnxruntime to <1.20 when using fastembed==0.4.2 to avoid runtime issues. Based on learnings.
  • Consider documenting AUTOMEM_MODELS_DIR to override the default cache path for local models (useful when container user/home differs from /root). This keeps docker-compose volume paths and runtime cache in sync.

Also applies to: 188-191

automem/embedding/openai.py (1)

20-25: Add timeout/retry configuration to the OpenAI client

Default 10m timeout is too long for request threads; also make retries explicit. Expose via constructor args with sane defaults.

     def __init__(
         self,
         api_key: str,
         model: str = "text-embedding-3-small",
         dimension: int = 768,
+        timeout: float = 30.0,
+        max_retries: int = 2,
     ):
@@
-        self.client = OpenAI(api_key=api_key)
+        self.client = OpenAI(api_key=api_key, timeout=timeout, max_retries=max_retries)

Also applies to: 36-43

automem/embedding/fastembed.py (2)

101-117: Optional: add length checks on outputs

Mirror OpenAI checks to catch unexpected shape changes early; helps debug integration issues with Qdrant.

-        embeddings = list(self.model.embed([text]))
-        embedding = embeddings[0].tolist()
+        embeddings = list(self.model.embed([text]))
+        embedding = embeddings[0].tolist()
+        if len(embedding) != self._dimension:
+            raise ValueError(
+                f"fastembed embedding length {len(embedding)} != configured dimension {self._dimension} "
+                f"(model={self.model_name})"
+            )
@@
-        embeddings = [emb.tolist() for emb in self.model.embed(texts)]
+        embeddings = [emb.tolist() for emb in self.model.embed(texts)]
+        bad = next((i for i, e in enumerate(embeddings) if len(e) != self._dimension), None)
+        if bad is not None:
+            raise ValueError(
+                f"fastembed batch embedding length {len(embeddings[bad])} != configured dimension {self._dimension} "
+                f"at index {bad} (model={self.model_name})"
+            )

Also applies to: 118-140


101-109: Quality tweak (optional): BGE models work best with ‘passage:’/‘query:’ prefixes

Consider optionally prefixing stored documents with "passage: " (and queries with "query: ") to align with BGE training. Keep disabled by default for parity across providers.

requirements.txt (1)

10-10: Pin onnxruntime to a compatible range for fastembed 0.4.2

fastembed 0.4.2 recommends onnxruntime <1.20 to avoid runtime issues. Add an explicit pin to prevent breakages in fresh environments. Based on learnings.

 fastembed==0.4.2
+onnxruntime<1.20
docker-compose.yml (1)

36-47: Make model cache path robust and configurable; set provider in dev

  • The mount uses /root/.config/automem/models, which assumes the container runs as root. If the user changes, Path.home() differs and cache won’t persist. Either force the user to root or expose the cache dir via env and read it in code.
  • Also consider setting EMBEDDING_PROVIDER for local/offline dev.
   flask-api:
     build: .
@@
     environment:
       FLASK_ENV: development
       FLASK_DEBUG: "1"
       PORT: 8001
+      EMBEDDING_PROVIDER: ${EMBEDDING_PROVIDER:-auto}   # set to 'local' for offline dev
+      AUTOMEM_MODELS_DIR: /root/.config/automem/models  # keep in sync with volume path
@@
-    volumes:
+    volumes:
       - .:/app
-      - fastembed_models:/root/.config/automem/models  # Persist embedding models
+      - fastembed_models:/root/.config/automem/models  # Persist embedding models
+    # If you switch to a non-root user, update AUTOMEM_MODELS_DIR and mount path accordingly
@@
 volumes:
   falkordb_data:
   qdrant_data:
   fastembed_models:  # Persistent cache for local embedding models

Alternatively, explicitly set the container user to root to match the mounted path:

   flask-api:
     build: .
+    user: "0"  # ensure Path.home() == /root (dev only)

Also applies to: 55-56, 73-73

docs/ENVIRONMENT_VARIABLES.md (1)

50-70: Document operational caveats for the local FastEmbed backend.

Add notes so offline users don’t get tripped up:

  • Pin onnxruntime to <1.20 with fastembed 0.4.x to avoid runtime issues.
  • Persist the model cache in Docker (mount ~/.config/automem/models).
  • Ensure VECTOR_SIZE matches the model’s dimensions (e.g., 768 for bge-base-en-v1.5).

Apply this doc tweak:

@@
 **Local Model Details:**
 - Model: `BAAI/bge-base-en-v1.5` (768 dimensions)
 - Size: ~210MB (cached to `~/.config/automem/models/`)
 - No API key or internet required after first download
 - Good semantic quality, faster than API calls
+ - Recommended: pin `onnxruntime<1.20` with `fastembed` 0.4.x to avoid runtime issues
+ - Docker: persist model cache with a volume, e.g.:
+   - `~/.config/automem/models` inside the container
+   - docker-compose example: `fastembed_models:/root/.config/automem/models`
+ - Ensure `VECTOR_SIZE` equals the selected model’s dimension (default 768)
automem/embedding/placeholder.py (1)

36-39: Non-crypto RNG is fine here; silence the S311 linter.

This is intentionally non-cryptographic. Add a nosec to avoid noise.

-        rng = random.Random(seed)
+        rng = random.Random(seed)  # nosec B311 - deterministic, non-crypto usage
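
For context, the hash-seeded placeholder pattern under discussion is roughly the following sketch. It is illustrative only; the exact seeding scheme and value range in placeholder.py may differ.

# Illustrative sketch of a deterministic, hash-seeded placeholder embedding.
import hashlib
import random
from typing import List

def placeholder_embedding(content: str, dimension: int = 768) -> List[float]:
    seed = int.from_bytes(hashlib.sha256(content.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)  # nosec B311 - deterministic, non-crypto usage
    return [rng.uniform(-1.0, 1.0) for _ in range(dimension)]
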
app.py (3)

3647-3660: Guard against provider dimension mismatches; fail soft to placeholder.

If a provider returns wrong dims, Qdrant upsert will fail. Add a defensive check.

 def _generate_real_embedding(content: str) -> List[float]:
@@
-    try:
-        embedding = state.embedding_provider.generate_embedding(content)
-        return embedding
+    try:
+        embedding = state.embedding_provider.generate_embedding(content)
+        if not isinstance(embedding, list) or len(embedding) != VECTOR_SIZE:
+            logger.warning(
+                "Provider %s returned %s dims (expected %d); falling back to placeholder",
+                state.embedding_provider.provider_name(),
+                len(embedding) if isinstance(embedding, list) else "invalid",
+                VECTOR_SIZE,
+            )
+            return _generate_placeholder_embedding(content)
+        return embedding

3663-3679: Batch: validate embedding shapes; fallback cleanly.

Mirror the single-embedding guard for batches.

 def _generate_real_embeddings_batch(contents: List[str]) -> List[List[float]]:
@@
-    try:
-        embeddings = state.embedding_provider.generate_embeddings_batch(contents)
-        return embeddings
+    try:
+        embeddings = state.embedding_provider.generate_embeddings_batch(contents)
+        if not embeddings or any(len(e) != VECTOR_SIZE for e in embeddings):
+            logger.warning(
+                "Provider %s returned invalid dims in batch; using placeholders",
+                state.embedding_provider.provider_name() if state.embedding_provider else "unknown",
+            )
+            return [_generate_placeholder_embedding(c) for c in contents]
+        return embeddings

2421-2429: Update docstring to reflect provider-based reembedding.

Route no longer requires OpenAI specifically; it uses the configured provider.

-"""Regenerate embeddings for existing memories using OpenAI API.
-
-Requires admin token and OpenAI API key configured.
+"""Regenerate embeddings for existing memories using the configured provider.
+
+Requires admin token and an available embedding provider (OpenAI, FastEmbed, or Placeholder).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3de00cc and 3b4dc3f.

📒 Files selected for processing (10)
  • CLAUDE.md (2 hunks)
  • app.py (6 hunks)
  • automem/embedding/__init__.py (1 hunks)
  • automem/embedding/fastembed.py (1 hunks)
  • automem/embedding/openai.py (1 hunks)
  • automem/embedding/placeholder.py (1 hunks)
  • automem/embedding/provider.py (1 hunks)
  • docker-compose.yml (2 hunks)
  • docs/ENVIRONMENT_VARIABLES.md (1 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
automem/embedding/placeholder.py (3)
automem/embedding/provider.py (5)
  • EmbeddingProvider (7-63)
  • dimension (45-51)
  • generate_embedding (15-27)
  • generate_embeddings_batch (30-42)
  • provider_name (54-60)
automem/embedding/fastembed.py (4)
  • dimension (141-147)
  • generate_embedding (101-116)
  • generate_embeddings_batch (118-139)
  • provider_name (149-155)
automem/embedding/openai.py (4)
  • dimension (90-96)
  • generate_embedding (45-62)
  • generate_embeddings_batch (64-88)
  • provider_name (98-104)
automem/embedding/__init__.py (4)
automem/embedding/provider.py (1)
  • EmbeddingProvider (7-63)
automem/embedding/openai.py (1)
  • OpenAIEmbeddingProvider (13-104)
automem/embedding/fastembed.py (1)
  • FastEmbedProvider (22-155)
automem/embedding/placeholder.py (1)
  • PlaceholderEmbeddingProvider (10-66)
automem/embedding/provider.py (3)
automem/embedding/fastembed.py (4)
  • generate_embedding (101-116)
  • generate_embeddings_batch (118-139)
  • dimension (141-147)
  • provider_name (149-155)
automem/embedding/openai.py (4)
  • generate_embedding (45-62)
  • generate_embeddings_batch (64-88)
  • dimension (90-96)
  • provider_name (98-104)
automem/embedding/placeholder.py (4)
  • generate_embedding (27-39)
  • generate_embeddings_batch (41-50)
  • dimension (52-58)
  • provider_name (60-66)
automem/embedding/fastembed.py (2)
automem/embedding/provider.py (5)
  • EmbeddingProvider (7-63)
  • dimension (45-51)
  • generate_embedding (15-27)
  • generate_embeddings_batch (30-42)
  • provider_name (54-60)
automem/embedding/openai.py (4)
  • dimension (90-96)
  • generate_embedding (45-62)
  • generate_embeddings_batch (64-88)
  • provider_name (98-104)
automem/embedding/openai.py (2)
tests/conftest.py (2)
  • OpenAI (110-112)
  • create (107-108)
automem/embedding/provider.py (5)
  • EmbeddingProvider (7-63)
  • dimension (45-51)
  • generate_embedding (15-27)
  • generate_embeddings_batch (30-42)
  • provider_name (54-60)
app.py (4)
automem/embedding/provider.py (5)
  • EmbeddingProvider (7-63)
  • dimension (45-51)
  • provider_name (54-60)
  • generate_embedding (15-27)
  • generate_embeddings_batch (30-42)
automem/embedding/openai.py (5)
  • OpenAIEmbeddingProvider (13-104)
  • dimension (90-96)
  • provider_name (98-104)
  • generate_embedding (45-62)
  • generate_embeddings_batch (64-88)
automem/embedding/fastembed.py (5)
  • FastEmbedProvider (22-155)
  • dimension (141-147)
  • provider_name (149-155)
  • generate_embedding (101-116)
  • generate_embeddings_batch (118-139)
automem/embedding/placeholder.py (5)
  • PlaceholderEmbeddingProvider (10-66)
  • dimension (52-58)
  • provider_name (60-66)
  • generate_embedding (27-39)
  • generate_embeddings_batch (41-50)
🪛 LanguageTool
docs/ENVIRONMENT_VARIABLES.md

[grammar] ~54-~54: There might be a mistake here.
Context: ...able | Description | Default | Options | |----------|-------------|---------|----...

(QB_NEW_EN)


[grammar] ~55-~55: There might be a mistake here.
Context: ...-----|-------------|---------|---------| | EMBEDDING_PROVIDER | Embedding backe...

(QB_NEW_EN)


[grammar] ~56-~56: There might be a mistake here.
Context: ...uto, openai, local, placeholder| |OPENAI_API_KEY` | OpenAI API key (for...

(QB_NEW_EN)


[grammar] ~60-~60: There might be a mistake here.
Context: ...AI → FastEmbed local model → Placeholder - openai: Use OpenAI API only (requires `OPENAI_...

(QB_NEW_EN)


[grammar] ~61-~61: There might be a mistake here.
Context: ...nAI API only (requires OPENAI_API_KEY) - local: Use FastEmbed local model only (~210MB...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...odel only (~210MB download on first use) - placeholder: Use hash-based embeddings (no semantic...

(QB_NEW_EN)


[grammar] ~65-~65: There might be a mistake here.
Context: ...semantic search) Local Model Details: - Model: BAAI/bge-base-en-v1.5 (768 dime...

(QB_NEW_EN)


[grammar] ~66-~66: There might be a mistake here.
Context: ...BAAI/bge-base-en-v1.5 (768 dimensions) - Size: 210MB (cached to `/.config/autom...

(QB_NEW_EN)


[grammar] ~67-~67: There might be a mistake here.
Context: ... (cached to ~/.config/automem/models/) - No API key or internet required after fi...

(QB_NEW_EN)

CLAUDE.md

[grammar] ~127-~127: There might be a mistake here.
Context: ... #### Provider Priority (Auto-Selection) 1. OpenAI (`openai:text-embedding-3-small...

(QB_NEW_EN)


[grammar] ~128-~128: There might be a mistake here.
Context: ...g-3-small) - If OPENAI_API_KEY` is set - High-quality semantic embeddings via API...

(QB_NEW_EN)


[grammar] ~129-~129: There might be a mistake here.
Context: ...High-quality semantic embeddings via API - Requires network and API costs - 768 ...

(QB_NEW_EN)


[grammar] ~130-~130: There might be a mistake here.
Context: ... API - Requires network and API costs - 768 dimensions (configurable via `VECTOR...

(QB_NEW_EN)


[grammar] ~133-~133: There might be a mistake here.
Context: ...AI/bge-base-en-v1.5`) - Local ONNX model - Good quality semantic embeddings - No...

(QB_NEW_EN)


[grammar] ~134-~134: There might be a mistake here.
Context: ...el - Good quality semantic embeddings - No API key or internet required (after f...

(QB_NEW_EN)


[grammar] ~135-~135: There might be a mistake here.
Context: ...internet required (after first download) - Downloads 210MB model to `/.config/aut...

(QB_NEW_EN)


[grammar] ~136-~136: There might be a mistake here.
Context: ...~/.config/automem/models/ on first use - 768 dimensions (default), also supports ...

(QB_NEW_EN)


[grammar] ~139-~139: There might be a mistake here.
Context: ...** (placeholder) - Hash-based fallback - Deterministic vectors from content hash ...

(QB_NEW_EN)


[grammar] ~140-~140: There might be a mistake here.
Context: ... Deterministic vectors from content hash - No semantic meaning, last resort only #...

(QB_NEW_EN)


[grammar] ~146-~146: There might be a mistake here.
Context: ...t): Try OpenAI → FastEmbed → Placeholder - openai: Use OpenAI only (fail if unavailable) ...

(QB_NEW_EN)

🪛 Ruff (0.14.0)
automem/embedding/placeholder.py

38-38: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

automem/embedding/__init__.py

14-19: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

app.py

1455-1455: Avoid specifying long messages outside the exception class

(TRY003)


1463-1463: Consider moving this statement to an else block

(TRY300)


1465-1465: Avoid specifying long messages outside the exception class

(TRY003)


1471-1471: Consider moving this statement to an else block

(TRY300)


1473-1473: Avoid specifying long messages outside the exception class

(TRY003)


1492-1492: Consider moving this statement to an else block

(TRY300)


1493-1493: Do not catch blind exception: Exception

(BLE001)


1500-1500: Consider moving this statement to an else block

(TRY300)


1501-1501: Do not catch blind exception: Exception

(BLE001)


1514-1517: Avoid specifying long messages outside the exception class

(TRY003)


3656-3656: Consider moving this statement to an else block

(TRY300)


3657-3657: Do not catch blind exception: Exception

(BLE001)


3675-3675: Consider moving this statement to an else block

(TRY300)


3676-3676: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (1)
automem/embedding/provider.py (1)

1-63: LGTM – clear ABC and API surface.

Interface is concise and consistent with backends.

app.py Outdated
Comment on lines 38 to 50
# Import embedding providers
from automem.embedding import (
    EmbeddingProvider,
    OpenAIEmbeddingProvider,
    FastEmbedProvider,
    PlaceholderEmbeddingProvider,
)

⚠️ Potential issue | 🔴 Critical

Critical: import-time coupling to optional providers breaks “offline/no OpenAI” runs.

from automem.embedding import (...) eagerly imports the OpenAI/FastEmbed modules through its re-exports, requiring those packages even when unused. Combined with the top-level from openai import OpenAI (Line 35), this defeats the goal of running without OpenAI installed.

Fix by:

  • Importing only the interface at module import.
  • Doing branch-local lazy imports inside init_embedding_provider().
  • Guarding the top-level OpenAI import similar to spaCy.

Apply this diff for the provider imports:

-# Import embedding providers
-from automem.embedding import (
-    EmbeddingProvider,
-    OpenAIEmbeddingProvider,
-    FastEmbedProvider,
-    PlaceholderEmbeddingProvider,
-)
+# Import only the interface; import backends lazily in init_embedding_provider()
+from automem.embedding.provider import EmbeddingProvider

Additionally, make the OpenAI client import optional (outside this hunk):

# near existing `from openai import OpenAI`
try:
    from openai import OpenAI  # type: ignore
except Exception:  # ImportError on machines without openai
    OpenAI = None  # type: ignore

And in init_openai(), early-return if OpenAI is None. Based on learnings

🤖 Prompt for AI Agents
In app.py around lines 38 to 44, the current top-level imports re-export
concrete embedding providers (OpenAIEmbeddingProvider, FastEmbedProvider,
PlaceholderEmbeddingProvider) which forces optional packages to be installed and
couples import-time behavior; change the file to import only the
EmbeddingProvider interface at module import and move concrete provider imports
into init_embedding_provider() as lazy, branch-local imports; also make the
top-level OpenAI client import optional by wrapping it in a try/except that sets
OpenAI = None, and modify init_openai() to early-return when OpenAI is None so
runs can proceed without OpenAI installed.

Comment on lines 1431 to 1530
def init_embedding_provider() -> None:
    """Initialize embedding provider with auto-selection fallback.
    Priority order:
    1. OpenAI API (if OPENAI_API_KEY is set)
    2. Local fastembed model (no API key needed)
    3. Placeholder hash-based embeddings (fallback)
    Can be controlled via EMBEDDING_PROVIDER env var:
    - "auto" (default): Try OpenAI, then fastembed, then placeholder
    - "openai": Use OpenAI only, fail if unavailable
    - "local": Use fastembed only, fail if unavailable
    - "placeholder": Use placeholder embeddings
    """
    if state.embedding_provider is not None:
        return

    provider_config = os.getenv("EMBEDDING_PROVIDER", "auto").lower()
    vector_size = VECTOR_SIZE

    # Explicit provider selection
    if provider_config == "openai":
        api_key = os.getenv("OPENAI_API_KEY")
        if not api_key:
            raise RuntimeError("EMBEDDING_PROVIDER=openai but OPENAI_API_KEY not set")
        try:
            state.embedding_provider = OpenAIEmbeddingProvider(
                api_key=api_key,
                model="text-embedding-3-small",
                dimension=vector_size
            )
            logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            raise RuntimeError(f"Failed to initialize OpenAI provider: {e}") from e

    elif provider_config == "local":
        try:
            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
            logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            raise RuntimeError(f"Failed to initialize local fastembed provider: {e}") from e

    elif provider_config == "placeholder":
        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
        logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
        return

    # Auto-selection: Try OpenAI → fastembed → placeholder
    if provider_config == "auto":
        # Try OpenAI first
        api_key = os.getenv("OPENAI_API_KEY")
        if api_key:
            try:
                state.embedding_provider = OpenAIEmbeddingProvider(
                    api_key=api_key,
                    model="text-embedding-3-small",
                    dimension=vector_size
                )
                logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
                return
            except Exception as e:
                logger.warning("Failed to initialize OpenAI provider, trying local model: %s", str(e))

        # Try local fastembed
        try:
            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
            logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            logger.warning("Failed to initialize fastembed provider, using placeholder: %s", str(e))

        # Fallback to placeholder
        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
        logger.warning(
            "Using placeholder embeddings (no semantic search). "
            "Install fastembed or set OPENAI_API_KEY for semantic embeddings."
        )
        logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
        return

    # Invalid config
    raise ValueError(
        f"Invalid EMBEDDING_PROVIDER={provider_config}. "
        f"Valid options: auto, openai, local, placeholder"
    )


🛠️ Refactor suggestion | 🟠 Major

Make backends lazily imported per-branch and harden config parsing.

Avoid importing optional deps unless selected; minor robustness tweaks.

 def init_embedding_provider() -> None:
@@
-    provider_config = os.getenv("EMBEDDING_PROVIDER", "auto").lower()
+    provider_config = (os.getenv("EMBEDDING_PROVIDER", "auto") or "auto").strip().lower()
@@
     if provider_config == "openai":
         api_key = os.getenv("OPENAI_API_KEY")
         if not api_key:
             raise RuntimeError("EMBEDDING_PROVIDER=openai but OPENAI_API_KEY not set")
         try:
-            state.embedding_provider = OpenAIEmbeddingProvider(
+            from automem.embedding.openai import OpenAIEmbeddingProvider
+            state.embedding_provider = OpenAIEmbeddingProvider(
                 api_key=api_key,
                 model="text-embedding-3-small",
                 dimension=vector_size
             )
@@
     elif provider_config == "local":
         try:
-            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
+            from automem.embedding.fastembed import FastEmbedProvider
+            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
             logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
             return
         except Exception as e:
             raise RuntimeError(f"Failed to initialize local fastembed provider: {e}") from e
@@
     elif provider_config == "placeholder":
-        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
+        from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
         logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
         return
@@
     if provider_config == "auto":
         # Try OpenAI first
         api_key = os.getenv("OPENAI_API_KEY")
         if api_key:
             try:
-                state.embedding_provider = OpenAIEmbeddingProvider(
+                from automem.embedding.openai import OpenAIEmbeddingProvider
+                state.embedding_provider = OpenAIEmbeddingProvider(
                     api_key=api_key,
                     model="text-embedding-3-small",
                     dimension=vector_size
                 )
                 logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
                 return
             except Exception as e:
                 logger.warning("Failed to initialize OpenAI provider, trying local model: %s", str(e))
 
         # Try local fastembed
         try:
-            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
+            from automem.embedding.fastembed import FastEmbedProvider
+            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
             logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
             return
         except Exception as e:
             logger.warning("Failed to initialize fastembed provider, using placeholder: %s", str(e))
 
         # Fallback to placeholder
-        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
+        from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
🧰 Tools
🪛 Ruff (0.14.0)

1455-1455: Avoid specifying long messages outside the exception class

(TRY003)


1463-1463: Consider moving this statement to an else block

(TRY300)


1465-1465: Avoid specifying long messages outside the exception class

(TRY003)


1471-1471: Consider moving this statement to an else block

(TRY300)


1473-1473: Avoid specifying long messages outside the exception class

(TRY003)


1492-1492: Consider moving this statement to an else block

(TRY300)


1493-1493: Do not catch blind exception: Exception

(BLE001)


1500-1500: Consider moving this statement to an else block

(TRY300)


1501-1501: Do not catch blind exception: Exception

(BLE001)


1514-1517: Avoid specifying long messages outside the exception class

(TRY003)

Comment on lines 9 to 19
from automem.embedding.provider import EmbeddingProvider
from automem.embedding.openai import OpenAIEmbeddingProvider
from automem.embedding.fastembed import FastEmbedProvider
from automem.embedding.placeholder import PlaceholderEmbeddingProvider

__all__ = [
    "EmbeddingProvider",
    "OpenAIEmbeddingProvider",
    "FastEmbedProvider",
    "PlaceholderEmbeddingProvider",
]

⚠️ Potential issue | 🔴 Critical

Avoid importing optional backends at module import time (breaks offline installs).

Re-exporting OpenAI/FastEmbed here forces importing their packages even when unused. This contradicts “no OpenAI required” and can crash environments without those deps. Guard or defer imports.

Apply this safer pattern:

-from automem.embedding.provider import EmbeddingProvider
-from automem.embedding.openai import OpenAIEmbeddingProvider
-from automem.embedding.fastembed import FastEmbedProvider
-from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+from .provider import EmbeddingProvider
+# Optional backends: guard imports to avoid hard dependencies at import time
+try:
+    from .openai import OpenAIEmbeddingProvider  # type: ignore
+except ImportError:
+    OpenAIEmbeddingProvider = None  # type: ignore[assignment]
+try:
+    from .fastembed import FastEmbedProvider  # type: ignore
+except ImportError:
+    FastEmbedProvider = None  # type: ignore[assignment]
+from .placeholder import PlaceholderEmbeddingProvider
@@
-__all__ = [
-    "EmbeddingProvider",
-    "OpenAIEmbeddingProvider",
-    "FastEmbedProvider",
-    "PlaceholderEmbeddingProvider",
-]
+__all__ = [
+    "EmbeddingProvider",
+    "FastEmbedProvider",
+    "OpenAIEmbeddingProvider",
+    "PlaceholderEmbeddingProvider",
+]

This also satisfies Ruff's sorted __all__ warning. As per coding guidelines

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from automem.embedding.provider import EmbeddingProvider
-from automem.embedding.openai import OpenAIEmbeddingProvider
-from automem.embedding.fastembed import FastEmbedProvider
-from automem.embedding.placeholder import PlaceholderEmbeddingProvider
-
-__all__ = [
-    "EmbeddingProvider",
-    "OpenAIEmbeddingProvider",
-    "FastEmbedProvider",
-    "PlaceholderEmbeddingProvider",
-]
+from .provider import EmbeddingProvider
+
+# Optional backends: guard imports to avoid hard dependencies at import time
+try:
+    from .openai import OpenAIEmbeddingProvider  # type: ignore
+except ImportError:
+    OpenAIEmbeddingProvider = None  # type: ignore[assignment]
+try:
+    from .fastembed import FastEmbedProvider  # type: ignore
+except ImportError:
+    FastEmbedProvider = None  # type: ignore[assignment]
+from .placeholder import PlaceholderEmbeddingProvider
+
+__all__ = [
+    "EmbeddingProvider",
+    "FastEmbedProvider",
+    "OpenAIEmbeddingProvider",
+    "PlaceholderEmbeddingProvider",
+]
🧰 Tools
🪛 Ruff (0.14.0)

14-19: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🤖 Prompt for AI Agents
In automem/embedding/__init__.py around lines 9 to 19, the module currently
imports optional backends (OpenAIEmbeddingProvider, FastEmbedProvider) at import
time which breaks offline installs; change to lazy/guarded exports by removing
top-level imports for optional providers, implement a __getattr__ that performs
local imports on first attribute access (raising AttributeError if missing), and
ensure __all__ is a sorted list of public names; this defers importing optional
dependencies until actually used and resolves the Ruff sorted-__all__ warning.

Comment on lines 55 to 59
        # Set cache directory
        if cache_dir is None:
            cache_dir = Path.home() / ".config" / "automem" / "models"
        cache_dir.mkdir(parents=True, exist_ok=True)


🛠️ Refactor suggestion | 🟠 Major

Derive actual embedding dimension and reconcile with configured value

If model_name and dimension disagree, provider.dimension() will be wrong, causing vector-store mismatches. Detect the real size after init and align (or fail fast). Also allow overriding cache_dir via env var for container/home portability.

-        # Set cache directory
-        if cache_dir is None:
-            cache_dir = Path.home() / ".config" / "automem" / "models"
+        # Set cache directory; allow env override for portability across users/containers
+        if cache_dir is None:
+            env_dir = os.getenv("AUTOMEM_MODELS_DIR")
+            cache_dir = Path(env_dir) if env_dir else (Path.home() / ".config" / "automem" / "models")
         cache_dir.mkdir(parents=True, exist_ok=True)
@@
-        self.model_name = model_name
-        self._dimension = dimension
+        self.model_name = model_name
+        # Derive actual embedding dimension to avoid mismatches
+        try:
+            _probe = next(self.model.embed([" "] ))
+            actual_dim = len(_probe)
+        except Exception:
+            # Fallback: assume requested dimension if probe fails (keeps behavior unchanged)
+            actual_dim = dimension
+
+        if actual_dim != dimension:
+            logger.warning(
+                "fastembed actual dimension %d != configured %d for model %s. "
+                "Using actual dimension; ensure your VECTOR_SIZE/Qdrant collection matches.",
+                actual_dim, dimension, model_name
+            )
+        self._dimension = actual_dim

Add required import at top:

-import logging
+import logging
+import os

Also applies to: 80-94

Comment on lines 57 to 63
        response = self.client.embeddings.create(
            input=text, model=self.model, dimensions=self._dimension
        )
        embedding = response.data[0].embedding
        logger.debug("Generated OpenAI embedding for text (length: %d)", len(text))
        return embedding


🛠️ Refactor suggestion | 🟠 Major

Enforce returned vector length equals configured dimension

Protect against silent mismatches (e.g., model ignores dimensions or config drift) by validating the returned length matches self._dimension and failing fast with a clear error. Prevents downstream Qdrant insert errors.

         response = self.client.embeddings.create(
             input=text, model=self.model, dimensions=self._dimension
         )
-        embedding = response.data[0].embedding
+        embedding = response.data[0].embedding
+        if len(embedding) != self._dimension:
+            raise ValueError(
+                f"OpenAI embedding length {len(embedding)} != configured dimension {self._dimension} "
+                f"(model={self.model}). Ensure 'dimensions' is supported and vector store size matches."
+            )
         logger.debug("Generated OpenAI embedding for text (length: %d)", len(text))
         return embedding
@@
         response = self.client.embeddings.create(
             input=texts, model=self.model, dimensions=self._dimension
         )
-        embeddings = [item.embedding for item in response.data]
+        embeddings = [item.embedding for item in response.data]
+        bad = next((i for i, e in enumerate(embeddings) if len(e) != self._dimension), None)
+        if bad is not None:
+            raise ValueError(
+                f"OpenAI batch embedding length {len(embeddings[bad])} != configured dimension {self._dimension} "
+                f"at index {bad} (model={self.model})."
+            )
         logger.info(

Also applies to: 79-88

🤖 Prompt for AI Agents
In automem/embedding/openai.py around lines 57 to 63 (and similarly for lines
79–88), the code currently returns the embedding without checking its length;
add a validation step after extracting response.data[0].embedding that compares
len(embedding) to self._dimension and if they differ raise a ValueError (or
custom exception) with a clear message including expected and actual lengths,
and log the mismatch before raising so callers fail fast and avoid downstream
Qdrant insert errors.

Implements three-tier embedding provider pattern with automatic fallback:
1. OpenAI API (text-embedding-3-small) - if OPENAI_API_KEY set
2. FastEmbed local model (BAAI/bge-base-en-v1.5) - no API key needed
3. Placeholder hash-based embeddings - last resort fallback

Key changes:
- Create automem/embedding/ module with provider abstraction
  - provider.py: EmbeddingProvider base class
  - openai.py: OpenAI API provider (refactored from existing code)
  - fastembed.py: Local ONNX model provider (~210MB, 768-dim)
  - placeholder.py: Hash-based fallback (refactored)
- Update app.py to use provider pattern with init_embedding_provider()
- Add fastembed==0.4.2 dependency to requirements.txt
- Add Docker volume for persistent model cache in docker-compose.yml
- Document new EMBEDDING_PROVIDER configuration in CLAUDE.md and docs/
- Fix test compatibility issues (PayloadSchemaType import, cache invalidation)
- Improve entity extraction regex patterns for project names

Configuration:
- EMBEDDING_PROVIDER=auto (default): Try OpenAI → FastEmbed → Placeholder
- EMBEDDING_PROVIDER=openai: Use OpenAI only
- EMBEDDING_PROVIDER=local: Use FastEmbed only
- EMBEDDING_PROVIDER=placeholder: Use placeholder only

Benefits:
- Semantic search without API keys or internet (after initial download)
- Cost savings (no OpenAI API costs)
- 768-dim model compatible with existing Qdrant collections
- Backward compatible with existing OpenAI deployments
- Follows proven Codanna architecture pattern

Tests: All passing (53 passed, 9 skipped)
@andrewleech andrewleech force-pushed the feat/local-embedding-provider branch from bad9679 to 2998270 Compare October 17, 2025 05:05
…er provider

- Added B311 nosec comment to explain deterministic RNG is intentional
- This addresses the final CodeRabbit code quality recommendation
- Placeholder embeddings need reproducible vectors, not cryptographic randomness
- Test PlaceholderEmbeddingProvider determinism and dimensions
- Test FastEmbedProvider with mocked model initialization
- Test OpenAIEmbeddingProvider with mocked API client
- Test provider selection logic (auto, explicit, fallback)
- Test integration with app.py initialization
- Test error handling and dimension mismatches
- Test batch processing and provider features
- Add end-to-end test for memory storage with providers

All 33 new tests pass, bringing total to 86 passed + 9 skipped