fix: auto-detect OpenAI embedding provider when API key available (#72) #583
When OPENAI_API_KEY is set and VALENCE_EMBEDDING_PROVIDER has not been
explicitly configured, the embedding provider now auto-selects 'openai'
instead of falling back to the 'local' default (which requires a local
model that is typically not set up).
The batch Python API path was previously broken in environments where
only an OpenAI API key is available, because the MCP/server path
benefits from env-var injection at startup while direct API usage
went straight to CoreSettings with its 'local' default.
Changes:
- Add model_validator(mode='after') _auto_select_embedding_provider to
CoreSettings in src/valence/core/config.py
- Import model_validator from pydantic
- Add TestEmbeddingProviderAutoDetect test class in
tests/core/test_config.py covering three scenarios:
1. OPENAI_API_KEY set + no VALENCE_EMBEDDING_PROVIDER → 'openai'
2. VALENCE_EMBEDDING_PROVIDER='local' explicitly set → stays 'local'
3. No key, no explicit provider → stays 'local'
Closes ourochronos/tracking#72
Pull request overview
This PR updates Valence’s configuration so CoreSettings.embedding_provider auto-selects "openai" when OPENAI_API_KEY is present and the provider was not explicitly configured, preventing unexpected failures when using the Python API with default settings.
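The selection rule described above can be sketched as a plain function. This is only an illustration of the decision order, not the validator added in this PR: `resolve_embedding_provider` and its parameters are hypothetical names, and the environment is passed in as a mapping so the rule can be exercised without touching `os.environ`.

```python
from typing import Mapping, Optional


def resolve_embedding_provider(
    explicit: Optional[str],
    env: Mapping[str, str],
    default: str = "local",
) -> str:
    """Hypothetical helper: choose an embedding provider name.

    `explicit` is the value the caller or VALENCE_EMBEDDING_PROVIDER supplied,
    or None when nothing was configured.
    """
    if explicit:
        # An explicit choice always wins, even when an API key is present.
        return explicit
    if env.get("OPENAI_API_KEY"):
        # Nothing was configured but a key is available: prefer OpenAI.
        return "openai"
    # No explicit choice and no key: keep the default.
    return default
```

Calling it with `explicit=None` and a mapping that contains `OPENAI_API_KEY` selects `"openai"`; passing `explicit="local"` keeps `"local"` regardless of the key.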
Changes:
- Added a `@model_validator(mode="after")` in `CoreSettings` to auto-select the OpenAI embedding provider when an API key is available.
- Added unit tests covering env-var-driven auto-detection and explicit env override behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/valence/core/config.py | Adds an “after” model validator to auto-select embedding_provider="openai" when OPENAI_API_KEY is set. |
| tests/core/test_config.py | Adds tests validating auto-detection and respecting explicit env configuration. |

    settings = CoreSettings()
    assert settings.embedding_provider == "local"

These tests cover env-var-based overrides, but don’t cover the common direct-Python usage where a caller might explicitly pass embedding_provider via CoreSettings(embedding_provider=...). Given the validator logic, adding a test that an explicit embedding_provider="local" (via kwargs) is not auto-overridden when OPENAI_API_KEY is set would help prevent regressions.

    def test_kwarg_local_override_respected(self, monkeypatch, clean_env):
        """When embedding_provider='local' is passed via kwargs, it should stay
        'local' even if OPENAI_API_KEY is set (direct Python override)."""
        monkeypatch.setenv("OPENAI_API_KEY", "sk-test-kwarg-local")
        monkeypatch.delenv("VALENCE_EMBEDDING_PROVIDER", raising=False)
        settings = CoreSettings(embedding_provider="local")
        assert settings.embedding_provider == "local"

    import os

    # Check if VALENCE_EMBEDDING_PROVIDER was explicitly set
    if not os.environ.get("VALENCE_EMBEDDING_PROVIDER"):

The explicit-configuration check relies on os.environ.get("VALENCE_EMBEDDING_PROVIDER"), but BaseSettings values can come from non-os.environ sources (e.g., passing embedding_provider="local" to CoreSettings(...), or values loaded from an env_file in non-test runs). In those cases this validator can incorrectly override an explicit local setting to openai. Prefer detecting whether the field was provided via settings inputs (e.g., using self.model_fields_set to see if embedding_provider was set) rather than inspecting os.environ directly.

    - import os
    - # Check if VALENCE_EMBEDDING_PROVIDER was explicitly set
    - if not os.environ.get("VALENCE_EMBEDDING_PROVIDER"):
    + # Only auto-select when the embedding_provider field was *not*
    + # explicitly provided via any settings source.
    + if "embedding_provider" not in self.model_fields_set:

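As a rough illustration of the `model_fields_set` approach suggested in this comment, the sketch below uses a plain pydantic v2 `BaseModel` rather than the project's settings class. `SketchSettings` is a hypothetical stand-in for `CoreSettings` reduced to one field, and the sketch does not show how values loaded from environment variables or an env_file are reported by a `BaseSettings` subclass.

```python
import os

from pydantic import BaseModel, model_validator


class SketchSettings(BaseModel):
    """Hypothetical stand-in for CoreSettings, reduced to a single field."""

    embedding_provider: str = "local"

    @model_validator(mode="after")
    def _auto_select_embedding_provider(self) -> "SketchSettings":
        # model_fields_set contains only the fields supplied at construction,
        # so a field that fell back to its default is absent from it.
        if "embedding_provider" not in self.model_fields_set:
            if os.environ.get("OPENAI_API_KEY"):
                self.embedding_provider = "openai"
        return self
```

With `OPENAI_API_KEY` set, `SketchSettings()` would pick `"openai"`, while `SketchSettings(embedding_provider="local")` keeps the caller's explicit choice.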
            self.embedding_provider = "openai"
        return self

        # ==========================================================================

There’s a section divider comment indented under the validator after return self, which is unreachable and makes the file structure confusing. Consider removing it or unindenting it so the section headers align consistently with the rest of the module.

    -         # ==========================================================================
    +     # ==========================================================================

Problem
The `embedding_provider` config defaults to `"local"`, but local embeddings aren't set up in most environments. When `OPENAI_API_KEY` is available, the provider should auto-select `"openai"`. The MCP/server path worked because env vars were set at server startup, but direct Python API usage defaulted to `"local"`, causing failures in batch usage.
Fix
Added a Pydantic `model_validator(mode='after')` to `CoreSettings` in `src/valence/core/config.py`:
- `embedding_provider` is still the default `"local"` and `OPENAI_API_KEY` is set and `VALENCE_EMBEDDING_PROVIDER` was not explicitly configured → auto-select `"openai"`
- Explicit `VALENCE_EMBEDDING_PROVIDER` is always respected (no surprise overrides)
Tests
Added `TestEmbeddingProviderAutoDetect` in `tests/core/test_config.py`:
- `OPENAI_API_KEY` set + no `VALENCE_EMBEDDING_PROVIDER` → provider becomes `"openai"`
- `VALENCE_EMBEDDING_PROVIDER=local` explicit + `OPENAI_API_KEY` set → stays `"local"` (user override)
- No key, no explicit provider → stays `"local"`
All 1716 tests pass (10 skipped).
Closes ourochronos/tracking#72