Skip to content
This repository was archived by the owner on Mar 2, 2026. It is now read-only.

fix: auto-detect OpenAI embedding provider when API key available (#72)#583

Merged
zonk1024 merged 1 commit intomainfrom
fix/72-embedding-provider-autodetect
Feb 28, 2026
Merged

fix: auto-detect OpenAI embedding provider when API key available (#72)#583
zonk1024 merged 1 commit intomainfrom
fix/72-embedding-provider-autodetect

Conversation

@zonk1024
Copy link
Contributor

Problem

The embedding_provider config defaults to "local" but local embeddings aren't set up in most environments. When OPENAI_API_KEY is available, the provider should auto-select "openai". The MCP/server path worked because env vars were set at server startup, but direct Python API usage defaulted to "local" — causing failures in batch usage.

Fix

Added a Pydantic model_validator(mode='after') to CoreSettings in src/valence/core/config.py:

  • If embedding_provider is still the default "local" and OPENAI_API_KEY is set and VALENCE_EMBEDDING_PROVIDER was not explicitly configured → auto-select "openai"
  • Explicit user configuration via VALENCE_EMBEDDING_PROVIDER is always respected (no surprise overrides)

Tests

Added TestEmbeddingProviderAutoDetect in tests/core/test_config.py:

  1. OPENAI_API_KEY set + no VALENCE_EMBEDDING_PROVIDER → provider becomes "openai"
  2. VALENCE_EMBEDDING_PROVIDER=local explicit + OPENAI_API_KEY set → stays "local" (user override)
  3. No key, no explicit provider → stays "local"

All 1716 tests pass (10 skipped).

Closes ourochronos/tracking#72

When OPENAI_API_KEY is set and VALENCE_EMBEDDING_PROVIDER has not been
explicitly configured, the embedding provider now auto-selects 'openai'
instead of falling back to the 'local' default (which requires a local
model that is typically not set up).

The batch Python API path was previously broken in environments where
only an OpenAI API key is available, because the MCP/server path
benefits from env-var injection at startup while direct API usage
went straight to CoreSettings with its 'local' default.

Changes:
- Add model_validator(mode='after') _auto_select_embedding_provider to
  CoreSettings in src/valence/core/config.py
- Import model_validator from pydantic
- Add TestEmbeddingProviderAutoDetect test class in
  tests/core/test_config.py covering three scenarios:
    1. OPENAI_API_KEY set + no VALENCE_EMBEDDING_PROVIDER → 'openai'
    2. VALENCE_EMBEDDING_PROVIDER='local' explicitly set → stays 'local'
    3. No key, no explicit provider → stays 'local'

Closes ourochronos/tracking#72
Copilot AI review requested due to automatic review settings February 28, 2026 19:45
@zonk1024 zonk1024 merged commit b3d21d1 into main Feb 28, 2026
7 checks passed
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Valence’s configuration so CoreSettings.embedding_provider auto-selects "openai" when OPENAI_API_KEY is present and the provider was not explicitly configured, preventing unexpected failures when using the Python API with default settings.

Changes:

  • Added a @model_validator(mode="after") in CoreSettings to auto-select the OpenAI embedding provider when an API key is available.
  • Added unit tests covering env-var-driven auto-detection and explicit env override behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/valence/core/config.py Adds an “after” model validator to auto-select embedding_provider="openai" when OPENAI_API_KEY is set.
tests/core/test_config.py Adds tests validating auto-detection and respecting explicit env configuration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


settings = CoreSettings()
assert settings.embedding_provider == "local"

Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests cover env-var-based overrides, but don’t cover the common direct-Python usage where a caller might explicitly pass embedding_provider via CoreSettings(embedding_provider=...). Given the validator logic, adding a test that an explicit embedding_provider="local" (via kwargs) is not auto-overridden when OPENAI_API_KEY is set would help prevent regressions.

Suggested change
def test_kwarg_local_override_respected(self, monkeypatch, clean_env):
"""When embedding_provider='local' is passed via kwargs, it should stay
'local' even if OPENAI_API_KEY is set (direct Python override)."""
monkeypatch.setenv("OPENAI_API_KEY", "sk-test-kwarg-local")
monkeypatch.delenv("VALENCE_EMBEDDING_PROVIDER", raising=False)
settings = CoreSettings(embedding_provider="local")
assert settings.embedding_provider == "local"

Copilot uses AI. Check for mistakes.
Comment on lines +241 to +244
import os

# Check if VALENCE_EMBEDDING_PROVIDER was explicitly set
if not os.environ.get("VALENCE_EMBEDDING_PROVIDER"):
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explicit-configuration check relies on os.environ.get("VALENCE_EMBEDDING_PROVIDER"), but BaseSettings values can come from non-os.environ sources (e.g., passing embedding_provider="local" to CoreSettings(...), or values loaded from an env_file in non-test runs). In those cases this validator can incorrectly override an explicit local setting to openai. Prefer detecting whether the field was provided via settings inputs (e.g., using self.model_fields_set to see if embedding_provider was set) rather than inspecting os.environ directly.

Suggested change
import os
# Check if VALENCE_EMBEDDING_PROVIDER was explicitly set
if not os.environ.get("VALENCE_EMBEDDING_PROVIDER"):
# Only auto-select when the embedding_provider field was *not*
# explicitly provided via any settings source.
if "embedding_provider" not in self.model_fields_set:

Copilot uses AI. Check for mistakes.
self.embedding_provider = "openai"
return self

# ==========================================================================
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s a section divider comment indented under the validator after return self, which is unreachable and makes the file structure confusing. Consider removing it or unindenting it so the section headers align consistently with the rest of the module.

Suggested change
# ==========================================================================
# ==========================================================================

Copilot uses AI. Check for mistakes.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants