
Conversation

@andrewleech

Summary

This PR introduces a flexible embedding provider system that enables semantic search without requiring an OpenAI API key. It implements a three-tier fallback strategy with automatic provider selection, allowing AutoMem to work fully offline after an initial model download.

Motivation

Previously, AutoMem either:

  • Required an OpenAI API key for semantic embeddings (ongoing costs, network dependency), or
  • Fell back to hash-based placeholder embeddings (no semantic meaning)

This limitation prevented users from having semantic search capabilities without API costs or internet connectivity.

Changes

New Embedding Provider Architecture

Created a modular provider system with automatic fallback:

Priority 1: OpenAI (if OPENAI_API_KEY set)
    ↓
Priority 2: FastEmbed local model (no API key needed)
    ↓
Priority 3: Placeholder (deterministic hash, no semantic meaning)
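
For orientation, auto mode reduces to a chain roughly like the sketch below. This is a simplified illustration, not the actual code; the real logic is init_embedding_provider() in app.py (quoted later in the review), and the default values here are only examples.

# Simplified sketch of the "auto" fallback chain (illustrative only).
import os

from automem.embedding import (
    FastEmbedProvider,
    OpenAIEmbeddingProvider,
    PlaceholderEmbeddingProvider,
)


def select_provider(vector_size: int = 768):
    api_key = os.getenv("OPENAI_API_KEY")
    if api_key:
        try:
            return OpenAIEmbeddingProvider(api_key=api_key, dimension=vector_size)
        except Exception:
            pass  # fall through to the local model
    try:
        return FastEmbedProvider(dimension=vector_size)
    except Exception:
        pass  # fastembed missing or model unavailable
    return PlaceholderEmbeddingProvider(dimension=vector_size)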

Configuration

New environment variable EMBEDDING_PROVIDER controls behavior:

  • auto (default): Automatic fallback chain
  • openai: OpenAI only (fails if no API key)
  • local: FastEmbed only (fails if model unavailable)
  • placeholder: Hash-based only

Docker Improvements

Added a persistent volume for the model cache so the ~210MB model is not re-downloaded on container restarts:

volumes:
  - fastembed_models:/root/.config/automem/models

Technical Details

  • Local Model: BAAI/bge-base-en-v1.5 (768 dimensions)
  • Model Size: ~210MB (automatically downloaded on first use)
  • Cache Location: ~/.config/automem/models/
  • Dimension Compatibility: 768-dim matches existing OpenAI/Qdrant setup
  • Performance: ONNX runtime provides fast CPU inference
  • Progress Display: Shows download progress on first initialization
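
For reference, the local backend is a thin wrapper around the fastembed library. Below is a minimal standalone sketch of that usage, assuming fastembed's TextEmbedding API; the PR's FastEmbedProvider adds caching configuration, logging, and error handling on top of this.

# Minimal sketch of local embedding with fastembed (illustrative only).
from pathlib import Path

from fastembed import TextEmbedding

cache_dir = Path.home() / ".config" / "automem" / "models"
cache_dir.mkdir(parents=True, exist_ok=True)

model = TextEmbedding(model_name="BAAI/bge-base-en-v1.5", cache_dir=str(cache_dir))
vectors = list(model.embed(["Test semantic search with local embeddings"]))
print(len(vectors[0]))  # 768 dimensions for bge-base-en-v1.5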

Benefits

  • Privacy-First: No data sent to external APIs
  • Offline Operation: Works without internet after initial download
  • Cost Savings: Eliminates OpenAI API costs
  • Backward Compatible: Existing OpenAI deployments continue working unchanged
  • Semantic Search: Real embeddings vs placeholder hashes
  • Docker Optimized: Model persists across container lifecycle

Testing

# Test without OpenAI (will use local model)
unset OPENAI_API_KEY
export EMBEDDING_PROVIDER=auto
make dev

# First run shows:
# "Downloading BAAI/bge-base-en-v1.5 embedding model (~210MB, first time only)"

# Subsequent runs show:
# "Loading BAAI/bge-base-en-v1.5 from cache..."

# Store a memory to test embedding generation
curl -X POST http://localhost:8001/memory \
  -H "Authorization: Bearer test-token" \
  -H "Content-Type: application/json" \
  -d '{"content": "Test semantic search with local embeddings"}'

# Test semantic recall
curl "http://localhost:8001/recall?query=local%20embeddings" \
  -H "Authorization: Bearer test-token"

Migration Guide

For existing deployments:

  1. No changes required - Defaults to OpenAI if OPENAI_API_KEY is set
  2. To switch to local: Set EMBEDDING_PROVIDER=local (existing 768-dim vectors remain compatible)
  3. To force OpenAI: Set EMBEDDING_PROVIDER=openai

Note: The local model (BAAI/bge-base-en-v1.5) produces 768-dimensional embeddings, maintaining compatibility with existing Qdrant collections created with OpenAI's text-embedding-3-small.

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated
  • No breaking changes for existing deployments
  • Docker configuration updated for optimal performance
  • Tests pass locally with make test
  • Integration tested with make dev

Questions for Reviewers

  1. Should we add more local model options (384-dim for smaller size, 1024-dim for higher quality)?
  2. Should the default remain auto or explicitly require configuration?
  3. Any concerns about the 210MB model download for Docker deployments?

@coderabbitai

coderabbitai bot commented Oct 17, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR introduces a multi-backend embedding provider architecture supporting OpenAI (remote API), FastEmbed (local ONNX), and Placeholder (hash-based) backends. Configuration via EMBEDDING_PROVIDER environment variable defaults to auto-selection with OpenAI → FastEmbed → Placeholder fallback order. All embedding operations now route through the provider abstraction.

Changes

  • Core embedding provider abstraction (automem/embedding/provider.py): Introduces abstract base class EmbeddingProvider with an interface for single/batch embedding generation, dimension, and provider name metadata (interface sketched below).
  • Embedding provider implementations (automem/embedding/openai.py, automem/embedding/fastembed.py, automem/embedding/placeholder.py): Three concrete implementations: OpenAIEmbeddingProvider (remote API), FastEmbedProvider (local ONNX models with caching), PlaceholderEmbeddingProvider (deterministic hash-based fallback).
  • Embedding package public API (automem/embedding/__init__.py): Aggregates and re-exports all provider types for a unified import surface.
  • Application integration (app.py): Adds an embedding_provider field to ServiceState, a new init_embedding_provider() function for auto-selecting backends via environment variables, and routes all embedding generation calls through the provider with fallback to placeholder on failure.
  • Documentation (docs/ENVIRONMENT_VARIABLES.md, CLAUDE.md): Updates embedding configuration documentation with the new EMBEDDING_PROVIDER variable, backend descriptions, model details, and fallback behavior.
  • Configuration & persistence (docker-compose.yml, requirements.txt): Adds a named volume for FastEmbed model caching and the fastembed==0.4.2 dependency.
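
A sketch of the provider contract referenced in the first item above; method names match the code graph later in this review, while the signatures and docstrings shown here are abbreviated assumptions rather than the exact contents of automem/embedding/provider.py.

# Sketch of the EmbeddingProvider contract (abbreviated).
from abc import ABC, abstractmethod
from typing import List


class EmbeddingProvider(ABC):
    @abstractmethod
    def generate_embedding(self, text: str) -> List[float]:
        """Embed a single text."""

    @abstractmethod
    def generate_embeddings_batch(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of texts."""

    @abstractmethod
    def dimension(self) -> int:
        """Vector size produced by this backend (e.g. 768)."""

    @abstractmethod
    def provider_name(self) -> str:
        """Human-readable backend name, used for logging."""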

Sequence Diagram(s)

sequenceDiagram
    participant App as app.py<br/>(startup)
    participant Init as init_embedding_provider()
    participant Selector as Provider Selector
    participant OAI as OpenAIEmbeddingProvider
    participant FastE as FastEmbedProvider
    participant PH as PlaceholderEmbeddingProvider
    
    App->>Init: Call during startup
    Init->>Selector: Check EMBEDDING_PROVIDER env
    
    alt auto mode (default)
        Selector->>OAI: Try OpenAI (requires API key)
        alt OPENAI_API_KEY set
            OAI-->>Selector: ✓ Ready
        else key missing
            Selector->>FastE: Try FastEmbed (local)
            alt fastembed available
                FastE-->>Selector: ✓ Ready
            else not available
                Selector->>PH: Use Placeholder
                PH-->>Selector: ✓ Ready
            end
        end
    else explicit mode
        alt "openai"
            Selector->>OAI: Initialize
        else "local"
            Selector->>FastE: Initialize
        else "placeholder"
            Selector->>PH: Initialize
        end
    end
    
    Selector-->>Init: Return provider instance
    Init-->>App: Store in ServiceState
sequenceDiagram
    participant Client as API Client
    participant Handler as Endpoint Handler
    participant Provider as Active Provider<br/>(OpenAI/FastEmbed/...)
    participant Fallback as PlaceholderProvider
    
    Client->>Handler: Request with embedding
    Handler->>Provider: generate_embedding(text)
    
    alt Success
        Provider-->>Handler: [0.2, 0.5, ...]
    else Failure
        Provider--xHandler: Exception
        Handler->>Fallback: generate_embedding(text)
        Fallback-->>Handler: [hash-based vector]
    end
    
    Handler-->>Client: Response (with embedding)
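
In code, this request-time behavior is a thin try/except around the provider call. A simplified sketch of the pattern follows; the helper name is illustrative, it assumes app.py's module-level state and logger, and the real implementation is _generate_real_embedding(), discussed in the comments below.

# Simplified sketch of the per-request fallback shown in the diagram above.
def generate_embedding_with_fallback(content: str) -> list[float]:
    try:
        return state.embedding_provider.generate_embedding(content)
    except Exception:
        logger.warning("Embedding provider failed; falling back to placeholder")
        return _generate_placeholder_embedding(content)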

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes involve multiple interdependent components: a new abstract interface, three provider implementations with varying complexity (OpenAI API integration, local model caching/initialization, deterministic hashing), significant refactoring of embedding paths in app.py (provider selection, error handling, fallback logic), configuration management, and persistent storage setup. While individual provider implementations follow a consistent pattern, the heterogeneity of the implementations and the app.py integration logic require careful review of each component's correctness and interaction.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title "Add local embedding provider with FastEmbed for offline semantic search" is clear and specific, directly addressing the primary feature delivered by this PR: enabling offline semantic search through a local embedding provider. The title accurately captures the most significant user-facing change—adding FastEmbed as a concrete implementation enabling offline operation without API dependencies. While the title doesn't mention the broader architectural abstraction (the provider pattern and three-tier fallback system), these are supporting infrastructure rather than the primary objective. The phrasing is concise, readable, and free of vague terminology.
  • Description Check: ✅ Passed. The PR description is comprehensive and directly related to the changeset. It provides clear articulation of motivation (semantic search without API costs or internet dependency), detailed technical specifications (model choice BAAI/bge-base-en-v1.5, 768 dimensions, ~210MB size, cache location), configuration details (EMBEDDING_PROVIDER environment variable options), concrete testing examples, and migration guidance for existing deployments. The description avoids vague language and grounds all claims in specific implementation details. All content directly supports understanding the changes introduced in the pull request.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 96.30%, which is sufficient. The required threshold is 80.00%.


@andrewleech andrewleech marked this pull request as draft October 17, 2025 04:37

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (11)
CLAUDE.md (1)

133-138: Document runtime prerequisites and cache dir override

  • Add a note to pin onnxruntime to <1.20 when using fastembed==0.4.2 to avoid runtime issues. Based on learnings.
  • Consider documenting AUTOMEM_MODELS_DIR to override the default cache path for local models (useful when container user/home differs from /root). This keeps docker-compose volume paths and runtime cache in sync.

Also applies to: 188-191

automem/embedding/openai.py (1)

20-25: Add timeout/retry configuration to the OpenAI client

Default 10m timeout is too long for request threads; also make retries explicit. Expose via constructor args with sane defaults.

     def __init__(
         self,
         api_key: str,
         model: str = "text-embedding-3-small",
         dimension: int = 768,
+        timeout: float = 30.0,
+        max_retries: int = 2,
     ):
@@
-        self.client = OpenAI(api_key=api_key)
+        self.client = OpenAI(api_key=api_key, timeout=timeout, max_retries=max_retries)

Also applies to: 36-43

automem/embedding/fastembed.py (2)

101-117: Optional: add length checks on outputs

Mirror OpenAI checks to catch unexpected shape changes early; helps debug integration issues with Qdrant.

-        embeddings = list(self.model.embed([text]))
-        embedding = embeddings[0].tolist()
+        embeddings = list(self.model.embed([text]))
+        embedding = embeddings[0].tolist()
+        if len(embedding) != self._dimension:
+            raise ValueError(
+                f"fastembed embedding length {len(embedding)} != configured dimension {self._dimension} "
+                f"(model={self.model_name})"
+            )
@@
-        embeddings = [emb.tolist() for emb in self.model.embed(texts)]
+        embeddings = [emb.tolist() for emb in self.model.embed(texts)]
+        bad = next((i for i, e in enumerate(embeddings) if len(e) != self._dimension), None)
+        if bad is not None:
+            raise ValueError(
+                f"fastembed batch embedding length {len(embeddings[bad])} != configured dimension {self._dimension} "
+                f"at index {bad} (model={self.model_name})"
+            )

Also applies to: 118-140


101-109: Quality tweak (optional): BGE models work best with ‘passage:’/‘query:’ prefixes

Consider optionally prefixing stored documents with "passage: " (and queries with "query: ") to align with BGE training. Keep disabled by default for parity across providers.

requirements.txt (1)

10-10: Pin onnxruntime to a compatible range for fastembed 0.4.2

fastembed 0.4.2 recommends onnxruntime <1.20 to avoid runtime issues. Add an explicit pin to prevent breakages in fresh environments. Based on learnings.

 fastembed==0.4.2
+onnxruntime<1.20
docker-compose.yml (1)

36-47: Make model cache path robust and configurable; set provider in dev

  • The mount uses /root/.config/automem/models, which assumes the container runs as root. If the user changes, Path.home() differs and cache won’t persist. Either force the user to root or expose the cache dir via env and read it in code.
  • Also consider setting EMBEDDING_PROVIDER for local/offline dev.
   flask-api:
     build: .
@@
     environment:
       FLASK_ENV: development
       FLASK_DEBUG: "1"
       PORT: 8001
+      EMBEDDING_PROVIDER: ${EMBEDDING_PROVIDER:-auto}   # set to 'local' for offline dev
+      AUTOMEM_MODELS_DIR: /root/.config/automem/models  # keep in sync with volume path
@@
-    volumes:
+    volumes:
       - .:/app
-      - fastembed_models:/root/.config/automem/models  # Persist embedding models
+      - fastembed_models:/root/.config/automem/models  # Persist embedding models
+    # If you switch to a non-root user, update AUTOMEM_MODELS_DIR and mount path accordingly
@@
 volumes:
   falkordb_data:
   qdrant_data:
   fastembed_models:  # Persistent cache for local embedding models

Alternatively, explicitly set the container user to root to match the mounted path:

   flask-api:
     build: .
+    user: "0"  # ensure Path.home() == /root (dev only)

Also applies to: 55-56, 73-73

docs/ENVIRONMENT_VARIABLES.md (1)

50-70: Document operational caveats for the local FastEmbed backend.

Add notes so offline users don’t get tripped up:

  • Pin onnxruntime to <1.20 with fastembed 0.4.x to avoid runtime issues.
  • Persist the model cache in Docker (mount ~/.config/automem/models).
  • Ensure VECTOR_SIZE matches the model’s dimensions (e.g., 768 for bge-base-en-v1.5).

Apply this doc tweak:

@@
 **Local Model Details:**
 - Model: `BAAI/bge-base-en-v1.5` (768 dimensions)
 - Size: ~210MB (cached to `~/.config/automem/models/`)
 - No API key or internet required after first download
 - Good semantic quality, faster than API calls
+ - Recommended: pin `onnxruntime<1.20` with `fastembed` 0.4.x to avoid runtime issues
+ - Docker: persist model cache with a volume, e.g.:
+   - `~/.config/automem/models` inside the container
+   - docker-compose example: `fastembed_models:/root/.config/automem/models`
+ - Ensure `VECTOR_SIZE` equals the selected model’s dimension (default 768)
automem/embedding/placeholder.py (1)

36-39: Non-crypto RNG is fine here; silence the S311 linter.

This is intentionally non-cryptographic. Add a nosec to avoid noise.

-        rng = random.Random(seed)
+        rng = random.Random(seed)  # nosec B311 - deterministic, non-crypto usage
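
For context, the hash-seeded placeholder pattern under discussion is roughly the following sketch. It is illustrative only; the exact seeding scheme and value range in placeholder.py may differ.

# Illustrative sketch of a deterministic, hash-seeded placeholder embedding.
import hashlib
import random
from typing import List

def placeholder_embedding(content: str, dimension: int = 768) -> List[float]:
    seed = int.from_bytes(hashlib.sha256(content.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)  # nosec B311 - deterministic, non-crypto usage
    return [rng.uniform(-1.0, 1.0) for _ in range(dimension)]
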
app.py (3)

3647-3660: Guard against provider dimension mismatches; fail soft to placeholder.

If a provider returns wrong dims, Qdrant upsert will fail. Add a defensive check.

 def _generate_real_embedding(content: str) -> List[float]:
@@
-    try:
-        embedding = state.embedding_provider.generate_embedding(content)
-        return embedding
+    try:
+        embedding = state.embedding_provider.generate_embedding(content)
+        if not isinstance(embedding, list) or len(embedding) != VECTOR_SIZE:
+            logger.warning(
+                "Provider %s returned %s dims (expected %d); falling back to placeholder",
+                state.embedding_provider.provider_name(),
+                len(embedding) if isinstance(embedding, list) else "invalid",
+                VECTOR_SIZE,
+            )
+            return _generate_placeholder_embedding(content)
+        return embedding

3663-3679: Batch: validate embedding shapes; fallback cleanly.

Mirror the single-embedding guard for batches.

 def _generate_real_embeddings_batch(contents: List[str]) -> List[List[float]]:
@@
-    try:
-        embeddings = state.embedding_provider.generate_embeddings_batch(contents)
-        return embeddings
+    try:
+        embeddings = state.embedding_provider.generate_embeddings_batch(contents)
+        if not embeddings or any(len(e) != VECTOR_SIZE for e in embeddings):
+            logger.warning(
+                "Provider %s returned invalid dims in batch; using placeholders",
+                state.embedding_provider.provider_name() if state.embedding_provider else "unknown",
+            )
+            return [_generate_placeholder_embedding(c) for c in contents]
+        return embeddings

2421-2429: Update docstring to reflect provider-based reembedding.

Route no longer requires OpenAI specifically; it uses the configured provider.

-"""Regenerate embeddings for existing memories using OpenAI API.
-
-Requires admin token and OpenAI API key configured.
+"""Regenerate embeddings for existing memories using the configured provider.
+
+Requires admin token and an available embedding provider (OpenAI, FastEmbed, or Placeholder).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3de00cc and 3b4dc3f.

📒 Files selected for processing (10)
  • CLAUDE.md (2 hunks)
  • app.py (6 hunks)
  • automem/embedding/__init__.py (1 hunks)
  • automem/embedding/fastembed.py (1 hunks)
  • automem/embedding/openai.py (1 hunks)
  • automem/embedding/placeholder.py (1 hunks)
  • automem/embedding/provider.py (1 hunks)
  • docker-compose.yml (2 hunks)
  • docs/ENVIRONMENT_VARIABLES.md (1 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
automem/embedding/placeholder.py (3)
automem/embedding/provider.py (5)
  • EmbeddingProvider (7-63)
  • dimension (45-51)
  • generate_embedding (15-27)
  • generate_embeddings_batch (30-42)
  • provider_name (54-60)
automem/embedding/fastembed.py (4)
  • dimension (141-147)
  • generate_embedding (101-116)
  • generate_embeddings_batch (118-139)
  • provider_name (149-155)
automem/embedding/openai.py (4)
  • dimension (90-96)
  • generate_embedding (45-62)
  • generate_embeddings_batch (64-88)
  • provider_name (98-104)
automem/embedding/__init__.py (4)
automem/embedding/provider.py (1)
  • EmbeddingProvider (7-63)
automem/embedding/openai.py (1)
  • OpenAIEmbeddingProvider (13-104)
automem/embedding/fastembed.py (1)
  • FastEmbedProvider (22-155)
automem/embedding/placeholder.py (1)
  • PlaceholderEmbeddingProvider (10-66)
automem/embedding/provider.py (3)
automem/embedding/fastembed.py (4)
  • generate_embedding (101-116)
  • generate_embeddings_batch (118-139)
  • dimension (141-147)
  • provider_name (149-155)
automem/embedding/openai.py (4)
  • generate_embedding (45-62)
  • generate_embeddings_batch (64-88)
  • dimension (90-96)
  • provider_name (98-104)
automem/embedding/placeholder.py (4)
  • generate_embedding (27-39)
  • generate_embeddings_batch (41-50)
  • dimension (52-58)
  • provider_name (60-66)
automem/embedding/fastembed.py (2)
automem/embedding/provider.py (5)
  • EmbeddingProvider (7-63)
  • dimension (45-51)
  • generate_embedding (15-27)
  • generate_embeddings_batch (30-42)
  • provider_name (54-60)
automem/embedding/openai.py (4)
  • dimension (90-96)
  • generate_embedding (45-62)
  • generate_embeddings_batch (64-88)
  • provider_name (98-104)
automem/embedding/openai.py (2)
tests/conftest.py (2)
  • OpenAI (110-112)
  • create (107-108)
automem/embedding/provider.py (5)
  • EmbeddingProvider (7-63)
  • dimension (45-51)
  • generate_embedding (15-27)
  • generate_embeddings_batch (30-42)
  • provider_name (54-60)
app.py (4)
automem/embedding/provider.py (5)
  • EmbeddingProvider (7-63)
  • dimension (45-51)
  • provider_name (54-60)
  • generate_embedding (15-27)
  • generate_embeddings_batch (30-42)
automem/embedding/openai.py (5)
  • OpenAIEmbeddingProvider (13-104)
  • dimension (90-96)
  • provider_name (98-104)
  • generate_embedding (45-62)
  • generate_embeddings_batch (64-88)
automem/embedding/fastembed.py (5)
  • FastEmbedProvider (22-155)
  • dimension (141-147)
  • provider_name (149-155)
  • generate_embedding (101-116)
  • generate_embeddings_batch (118-139)
automem/embedding/placeholder.py (5)
  • PlaceholderEmbeddingProvider (10-66)
  • dimension (52-58)
  • provider_name (60-66)
  • generate_embedding (27-39)
  • generate_embeddings_batch (41-50)
🪛 LanguageTool
docs/ENVIRONMENT_VARIABLES.md

[grammar] ~54-~54: There might be a mistake here.
Context: ...able | Description | Default | Options | |----------|-------------|---------|----...

(QB_NEW_EN)


[grammar] ~55-~55: There might be a mistake here.
Context: ...-----|-------------|---------|---------| | EMBEDDING_PROVIDER | Embedding backe...

(QB_NEW_EN)


[grammar] ~56-~56: There might be a mistake here.
Context: ...uto, openai, local, placeholder| |OPENAI_API_KEY` | OpenAI API key (for...

(QB_NEW_EN)


[grammar] ~60-~60: There might be a mistake here.
Context: ...AI → FastEmbed local model → Placeholder - openai: Use OpenAI API only (requires `OPENAI_...

(QB_NEW_EN)


[grammar] ~61-~61: There might be a mistake here.
Context: ...nAI API only (requires OPENAI_API_KEY) - local: Use FastEmbed local model only (~210MB...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...odel only (~210MB download on first use) - placeholder: Use hash-based embeddings (no semantic...

(QB_NEW_EN)


[grammar] ~65-~65: There might be a mistake here.
Context: ...semantic search) Local Model Details: - Model: BAAI/bge-base-en-v1.5 (768 dime...

(QB_NEW_EN)


[grammar] ~66-~66: There might be a mistake here.
Context: ...BAAI/bge-base-en-v1.5 (768 dimensions) - Size: 210MB (cached to `/.config/autom...

(QB_NEW_EN)


[grammar] ~67-~67: There might be a mistake here.
Context: ... (cached to ~/.config/automem/models/) - No API key or internet required after fi...

(QB_NEW_EN)

CLAUDE.md

[grammar] ~127-~127: There might be a mistake here.
Context: ... #### Provider Priority (Auto-Selection) 1. OpenAI (`openai:text-embedding-3-small...

(QB_NEW_EN)


[grammar] ~128-~128: There might be a mistake here.
Context: ...g-3-small) - If OPENAI_API_KEY` is set - High-quality semantic embeddings via API...

(QB_NEW_EN)


[grammar] ~129-~129: There might be a mistake here.
Context: ...High-quality semantic embeddings via API - Requires network and API costs - 768 ...

(QB_NEW_EN)


[grammar] ~130-~130: There might be a mistake here.
Context: ... API - Requires network and API costs - 768 dimensions (configurable via `VECTOR...

(QB_NEW_EN)


[grammar] ~133-~133: There might be a mistake here.
Context: ...AI/bge-base-en-v1.5`) - Local ONNX model - Good quality semantic embeddings - No...

(QB_NEW_EN)


[grammar] ~134-~134: There might be a mistake here.
Context: ...el - Good quality semantic embeddings - No API key or internet required (after f...

(QB_NEW_EN)


[grammar] ~135-~135: There might be a mistake here.
Context: ...internet required (after first download) - Downloads 210MB model to `/.config/aut...

(QB_NEW_EN)


[grammar] ~136-~136: There might be a mistake here.
Context: ...~/.config/automem/models/ on first use - 768 dimensions (default), also supports ...

(QB_NEW_EN)


[grammar] ~139-~139: There might be a mistake here.
Context: ...** (placeholder) - Hash-based fallback - Deterministic vectors from content hash ...

(QB_NEW_EN)


[grammar] ~140-~140: There might be a mistake here.
Context: ... Deterministic vectors from content hash - No semantic meaning, last resort only #...

(QB_NEW_EN)


[grammar] ~146-~146: There might be a mistake here.
Context: ...t): Try OpenAI → FastEmbed → Placeholder - openai: Use OpenAI only (fail if unavailable) ...

(QB_NEW_EN)

🪛 Ruff (0.14.0)
automem/embedding/placeholder.py

38-38: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

automem/embedding/__init__.py

14-19: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

app.py

1455-1455: Avoid specifying long messages outside the exception class

(TRY003)


1463-1463: Consider moving this statement to an else block

(TRY300)


1465-1465: Avoid specifying long messages outside the exception class

(TRY003)


1471-1471: Consider moving this statement to an else block

(TRY300)


1473-1473: Avoid specifying long messages outside the exception class

(TRY003)


1492-1492: Consider moving this statement to an else block

(TRY300)


1493-1493: Do not catch blind exception: Exception

(BLE001)


1500-1500: Consider moving this statement to an else block

(TRY300)


1501-1501: Do not catch blind exception: Exception

(BLE001)


1514-1517: Avoid specifying long messages outside the exception class

(TRY003)


3656-3656: Consider moving this statement to an else block

(TRY300)


3657-3657: Do not catch blind exception: Exception

(BLE001)


3675-3675: Consider moving this statement to an else block

(TRY300)


3676-3676: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (1)
automem/embedding/provider.py (1)

1-63: LGTM – clear ABC and API surface.

Interface is concise and consistent with backends.

app.py Outdated
Comment on lines 38 to 50
# Import embedding providers
from automem.embedding import (
    EmbeddingProvider,
    OpenAIEmbeddingProvider,
    FastEmbedProvider,
    PlaceholderEmbeddingProvider,
)

⚠️ Potential issue | 🔴 Critical

Critical: import-time coupling to optional providers breaks “offline/no OpenAI” runs.

from automem.embedding import (...) eagerly imports the OpenAI/FastEmbed modules through its re-exports, requiring those packages even when unused. Combined with the top-level from openai import OpenAI (Line 35), this defeats the goal of running without OpenAI installed.

Fix by:

  • Importing only the interface at module import.
  • Doing branch-local lazy imports inside init_embedding_provider().
  • Guarding the top-level OpenAI import similar to spaCy.

Apply this diff for the provider imports:

-# Import embedding providers
-from automem.embedding import (
-    EmbeddingProvider,
-    OpenAIEmbeddingProvider,
-    FastEmbedProvider,
-    PlaceholderEmbeddingProvider,
-)
+# Import only the interface; import backends lazily in init_embedding_provider()
+from automem.embedding.provider import EmbeddingProvider

Additionally, make the OpenAI client import optional (outside this hunk):

# near existing `from openai import OpenAI`
try:
    from openai import OpenAI  # type: ignore
except Exception:  # ImportError on machines without openai
    OpenAI = None  # type: ignore

And in init_openai(), early-return if OpenAI is None. Based on learnings

🤖 Prompt for AI Agents
In app.py around lines 38 to 44, the current top-level imports re-export
concrete embedding providers (OpenAIEmbeddingProvider, FastEmbedProvider,
PlaceholderEmbeddingProvider) which forces optional packages to be installed and
couples import-time behavior; change the file to import only the
EmbeddingProvider interface at module import and move concrete provider imports
into init_embedding_provider() as lazy, branch-local imports; also make the
top-level OpenAI client import optional by wrapping it in a try/except that sets
OpenAI = None, and modify init_openai() to early-return when OpenAI is None so
runs can proceed without OpenAI installed.

Comment on lines 1431 to 1530
def init_embedding_provider() -> None:
    """Initialize embedding provider with auto-selection fallback.
    Priority order:
    1. OpenAI API (if OPENAI_API_KEY is set)
    2. Local fastembed model (no API key needed)
    3. Placeholder hash-based embeddings (fallback)
    Can be controlled via EMBEDDING_PROVIDER env var:
    - "auto" (default): Try OpenAI, then fastembed, then placeholder
    - "openai": Use OpenAI only, fail if unavailable
    - "local": Use fastembed only, fail if unavailable
    - "placeholder": Use placeholder embeddings
    """
    if state.embedding_provider is not None:
        return

    provider_config = os.getenv("EMBEDDING_PROVIDER", "auto").lower()
    vector_size = VECTOR_SIZE

    # Explicit provider selection
    if provider_config == "openai":
        api_key = os.getenv("OPENAI_API_KEY")
        if not api_key:
            raise RuntimeError("EMBEDDING_PROVIDER=openai but OPENAI_API_KEY not set")
        try:
            state.embedding_provider = OpenAIEmbeddingProvider(
                api_key=api_key,
                model="text-embedding-3-small",
                dimension=vector_size
            )
            logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            raise RuntimeError(f"Failed to initialize OpenAI provider: {e}") from e

    elif provider_config == "local":
        try:
            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
            logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            raise RuntimeError(f"Failed to initialize local fastembed provider: {e}") from e

    elif provider_config == "placeholder":
        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
        logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
        return

    # Auto-selection: Try OpenAI → fastembed → placeholder
    if provider_config == "auto":
        # Try OpenAI first
        api_key = os.getenv("OPENAI_API_KEY")
        if api_key:
            try:
                state.embedding_provider = OpenAIEmbeddingProvider(
                    api_key=api_key,
                    model="text-embedding-3-small",
                    dimension=vector_size
                )
                logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
                return
            except Exception as e:
                logger.warning("Failed to initialize OpenAI provider, trying local model: %s", str(e))

        # Try local fastembed
        try:
            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
            logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
            return
        except Exception as e:
            logger.warning("Failed to initialize fastembed provider, using placeholder: %s", str(e))

        # Fallback to placeholder
        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
        logger.warning(
            "Using placeholder embeddings (no semantic search). "
            "Install fastembed or set OPENAI_API_KEY for semantic embeddings."
        )
        logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
        return

    # Invalid config
    raise ValueError(
        f"Invalid EMBEDDING_PROVIDER={provider_config}. "
        f"Valid options: auto, openai, local, placeholder"
    )


🛠️ Refactor suggestion | 🟠 Major

Make backends lazily imported per-branch and harden config parsing.

Avoid importing optional deps unless selected; minor robustness tweaks.

 def init_embedding_provider() -> None:
@@
-    provider_config = os.getenv("EMBEDDING_PROVIDER", "auto").lower()
+    provider_config = (os.getenv("EMBEDDING_PROVIDER", "auto") or "auto").strip().lower()
@@
     if provider_config == "openai":
         api_key = os.getenv("OPENAI_API_KEY")
         if not api_key:
             raise RuntimeError("EMBEDDING_PROVIDER=openai but OPENAI_API_KEY not set")
         try:
-            state.embedding_provider = OpenAIEmbeddingProvider(
+            from automem.embedding.openai import OpenAIEmbeddingProvider
+            state.embedding_provider = OpenAIEmbeddingProvider(
                 api_key=api_key,
                 model="text-embedding-3-small",
                 dimension=vector_size
             )
@@
     elif provider_config == "local":
         try:
-            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
+            from automem.embedding.fastembed import FastEmbedProvider
+            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
             logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
             return
         except Exception as e:
             raise RuntimeError(f"Failed to initialize local fastembed provider: {e}") from e
@@
     elif provider_config == "placeholder":
-        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
+        from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
         logger.info("Embedding provider: %s", state.embedding_provider.provider_name())
         return
@@
     if provider_config == "auto":
         # Try OpenAI first
         api_key = os.getenv("OPENAI_API_KEY")
         if api_key:
             try:
-                state.embedding_provider = OpenAIEmbeddingProvider(
+                from automem.embedding.openai import OpenAIEmbeddingProvider
+                state.embedding_provider = OpenAIEmbeddingProvider(
                     api_key=api_key,
                     model="text-embedding-3-small",
                     dimension=vector_size
                 )
                 logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
                 return
             except Exception as e:
                 logger.warning("Failed to initialize OpenAI provider, trying local model: %s", str(e))
 
         # Try local fastembed
         try:
-            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
+            from automem.embedding.fastembed import FastEmbedProvider
+            state.embedding_provider = FastEmbedProvider(dimension=vector_size)
             logger.info("Embedding provider (auto-selected): %s", state.embedding_provider.provider_name())
             return
         except Exception as e:
             logger.warning("Failed to initialize fastembed provider, using placeholder: %s", str(e))
 
         # Fallback to placeholder
-        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
+        from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+        state.embedding_provider = PlaceholderEmbeddingProvider(dimension=vector_size)
🧰 Tools
🪛 Ruff (0.14.0)

1455-1455: Avoid specifying long messages outside the exception class

(TRY003)


1463-1463: Consider moving this statement to an else block

(TRY300)


1465-1465: Avoid specifying long messages outside the exception class

(TRY003)


1471-1471: Consider moving this statement to an else block

(TRY300)


1473-1473: Avoid specifying long messages outside the exception class

(TRY003)


1492-1492: Consider moving this statement to an else block

(TRY300)


1493-1493: Do not catch blind exception: Exception

(BLE001)


1500-1500: Consider moving this statement to an else block

(TRY300)


1501-1501: Do not catch blind exception: Exception

(BLE001)


1514-1517: Avoid specifying long messages outside the exception class

(TRY003)

Comment on lines 9 to 19
from automem.embedding.provider import EmbeddingProvider
from automem.embedding.openai import OpenAIEmbeddingProvider
from automem.embedding.fastembed import FastEmbedProvider
from automem.embedding.placeholder import PlaceholderEmbeddingProvider

__all__ = [
    "EmbeddingProvider",
    "OpenAIEmbeddingProvider",
    "FastEmbedProvider",
    "PlaceholderEmbeddingProvider",
]

⚠️ Potential issue | 🔴 Critical

Avoid importing optional backends at module import time (breaks offline installs).

Re-exporting OpenAI/FastEmbed here forces importing their packages even when unused. This contradicts “no OpenAI required” and can crash environments without those deps. Guard or defer imports.

Apply this safer pattern:

-from automem.embedding.provider import EmbeddingProvider
-from automem.embedding.openai import OpenAIEmbeddingProvider
-from automem.embedding.fastembed import FastEmbedProvider
-from automem.embedding.placeholder import PlaceholderEmbeddingProvider
+from .provider import EmbeddingProvider
+# Optional backends: guard imports to avoid hard dependencies at import time
+try:
+    from .openai import OpenAIEmbeddingProvider  # type: ignore
+except ImportError:
+    OpenAIEmbeddingProvider = None  # type: ignore[assignment]
+try:
+    from .fastembed import FastEmbedProvider  # type: ignore
+except ImportError:
+    FastEmbedProvider = None  # type: ignore[assignment]
+from .placeholder import PlaceholderEmbeddingProvider
@@
-__all__ = [
-    "EmbeddingProvider",
-    "OpenAIEmbeddingProvider",
-    "FastEmbedProvider",
-    "PlaceholderEmbeddingProvider",
-]
+__all__ = [
+    "EmbeddingProvider",
+    "FastEmbedProvider",
+    "OpenAIEmbeddingProvider",
+    "PlaceholderEmbeddingProvider",
+]

This also satisfies Ruff's sorted __all__ warning. As per coding guidelines

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from automem.embedding.provider import EmbeddingProvider
-from automem.embedding.openai import OpenAIEmbeddingProvider
-from automem.embedding.fastembed import FastEmbedProvider
-from automem.embedding.placeholder import PlaceholderEmbeddingProvider
-
-__all__ = [
-    "EmbeddingProvider",
-    "OpenAIEmbeddingProvider",
-    "FastEmbedProvider",
-    "PlaceholderEmbeddingProvider",
-]
+from .provider import EmbeddingProvider
+
+# Optional backends: guard imports to avoid hard dependencies at import time
+try:
+    from .openai import OpenAIEmbeddingProvider  # type: ignore
+except ImportError:
+    OpenAIEmbeddingProvider = None  # type: ignore[assignment]
+try:
+    from .fastembed import FastEmbedProvider  # type: ignore
+except ImportError:
+    FastEmbedProvider = None  # type: ignore[assignment]
+from .placeholder import PlaceholderEmbeddingProvider
+
+__all__ = [
+    "EmbeddingProvider",
+    "FastEmbedProvider",
+    "OpenAIEmbeddingProvider",
+    "PlaceholderEmbeddingProvider",
+]
🧰 Tools
🪛 Ruff (0.14.0)

14-19: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🤖 Prompt for AI Agents
In automem/embedding/__init__.py around lines 9 to 19, the module currently
imports optional backends (OpenAIEmbeddingProvider, FastEmbedProvider) at import
time which breaks offline installs; change to lazy/guarded exports by removing
top-level imports for optional providers, implement a __getattr__ that performs
local imports on first attribute access (raising AttributeError if missing), and
ensure __all__ is a sorted list of public names; this defers importing optional
dependencies until actually used and resolves the Ruff sorted-__all__ warning.

Comment on lines 55 to 59
        # Set cache directory
        if cache_dir is None:
            cache_dir = Path.home() / ".config" / "automem" / "models"
        cache_dir.mkdir(parents=True, exist_ok=True)


🛠️ Refactor suggestion | 🟠 Major

Derive actual embedding dimension and reconcile with configured value

If model_name and dimension disagree, provider.dimension() will be wrong, causing vector-store mismatches. Detect the real size after init and align (or fail fast). Also allow overriding cache_dir via env var for container/home portability.

-        # Set cache directory
-        if cache_dir is None:
-            cache_dir = Path.home() / ".config" / "automem" / "models"
+        # Set cache directory; allow env override for portability across users/containers
+        if cache_dir is None:
+            env_dir = os.getenv("AUTOMEM_MODELS_DIR")
+            cache_dir = Path(env_dir) if env_dir else (Path.home() / ".config" / "automem" / "models")
         cache_dir.mkdir(parents=True, exist_ok=True)
@@
-        self.model_name = model_name
-        self._dimension = dimension
+        self.model_name = model_name
+        # Derive actual embedding dimension to avoid mismatches
+        try:
+            _probe = next(self.model.embed([" "] ))
+            actual_dim = len(_probe)
+        except Exception:
+            # Fallback: assume requested dimension if probe fails (keeps behavior unchanged)
+            actual_dim = dimension
+
+        if actual_dim != dimension:
+            logger.warning(
+                "fastembed actual dimension %d != configured %d for model %s. "
+                "Using actual dimension; ensure your VECTOR_SIZE/Qdrant collection matches.",
+                actual_dim, dimension, model_name
+            )
+        self._dimension = actual_dim

Add required import at top:

-import logging
+import logging
+import os

Also applies to: 80-94

Comment on lines 57 to 63
        response = self.client.embeddings.create(
            input=text, model=self.model, dimensions=self._dimension
        )
        embedding = response.data[0].embedding
        logger.debug("Generated OpenAI embedding for text (length: %d)", len(text))
        return embedding


🛠️ Refactor suggestion | 🟠 Major

Enforce returned vector length equals configured dimension

Protect against silent mismatches (e.g., model ignores dimensions or config drift) by validating the returned length matches self._dimension and failing fast with a clear error. Prevents downstream Qdrant insert errors.

         response = self.client.embeddings.create(
             input=text, model=self.model, dimensions=self._dimension
         )
-        embedding = response.data[0].embedding
+        embedding = response.data[0].embedding
+        if len(embedding) != self._dimension:
+            raise ValueError(
+                f"OpenAI embedding length {len(embedding)} != configured dimension {self._dimension} "
+                f"(model={self.model}). Ensure 'dimensions' is supported and vector store size matches."
+            )
         logger.debug("Generated OpenAI embedding for text (length: %d)", len(text))
         return embedding
@@
         response = self.client.embeddings.create(
             input=texts, model=self.model, dimensions=self._dimension
         )
-        embeddings = [item.embedding for item in response.data]
+        embeddings = [item.embedding for item in response.data]
+        bad = next((i for i, e in enumerate(embeddings) if len(e) != self._dimension), None)
+        if bad is not None:
+            raise ValueError(
+                f"OpenAI batch embedding length {len(embeddings[bad])} != configured dimension {self._dimension} "
+                f"at index {bad} (model={self.model})."
+            )
         logger.info(

Also applies to: 79-88

🤖 Prompt for AI Agents
In automem/embedding/openai.py around lines 57 to 63 (and similarly for lines
79–88), the code currently returns the embedding without checking its length;
add a validation step after extracting response.data[0].embedding that compares
len(embedding) to self._dimension and if they differ raise a ValueError (or
custom exception) with a clear message including expected and actual lengths,
and log the mismatch before raising so callers fail fast and avoid downstream
Qdrant insert errors.

Implements three-tier embedding provider pattern with automatic fallback:
1. OpenAI API (text-embedding-3-small) - if OPENAI_API_KEY set
2. FastEmbed local model (BAAI/bge-base-en-v1.5) - no API key needed
3. Placeholder hash-based embeddings - last resort fallback

Key changes:
- Create automem/embedding/ module with provider abstraction
  - provider.py: EmbeddingProvider base class
  - openai.py: OpenAI API provider (refactored from existing code)
  - fastembed.py: Local ONNX model provider (~210MB, 768-dim)
  - placeholder.py: Hash-based fallback (refactored)
- Update app.py to use provider pattern with init_embedding_provider()
- Add fastembed==0.4.2 dependency to requirements.txt
- Add Docker volume for persistent model cache in docker-compose.yml
- Document new EMBEDDING_PROVIDER configuration in CLAUDE.md and docs/
- Fix test compatibility issues (PayloadSchemaType import, cache invalidation)
- Improve entity extraction regex patterns for project names

Configuration:
- EMBEDDING_PROVIDER=auto (default): Try OpenAI → FastEmbed → Placeholder
- EMBEDDING_PROVIDER=openai: Use OpenAI only
- EMBEDDING_PROVIDER=local: Use FastEmbed only
- EMBEDDING_PROVIDER=placeholder: Use placeholder only

Benefits:
- Semantic search without API keys or internet (after initial download)
- Cost savings (no OpenAI API costs)
- 768-dim model compatible with existing Qdrant collections
- Backward compatible with existing OpenAI deployments
- Follows proven Codanna architecture pattern

Tests: All passing (53 passed, 9 skipped)
@andrewleech andrewleech force-pushed the feat/local-embedding-provider branch from bad9679 to 2998270 Compare October 17, 2025 05:05
…er provider

- Added B311 nosec comment to explain deterministic RNG is intentional
- This addresses the final CodeRabbit code quality recommendation
- Placeholder embeddings need reproducible vectors, not cryptographic randomness
- Test PlaceholderEmbeddingProvider determinism and dimensions
- Test FastEmbedProvider with mocked model initialization
- Test OpenAIEmbeddingProvider with mocked API client
- Test provider selection logic (auto, explicit, fallback)
- Test integration with app.py initialization
- Test error handling and dimension mismatches
- Test batch processing and provider features
- Add end-to-end test for memory storage with providers

All 33 new tests pass, bringing total to 86 passed + 9 skipped