Releases: BetterDB-inc/monitor
Semantic Cache v0.2.0
Release Notes — @betterdb/semantic-cache v0.2.0
v0.1.0 shipped the core cache with text-only string prompts and two adapters.
v0.2.0 adds five new adapters, five embedding helpers, and a set of features
that make the cache production-ready: cost tracking, multi-modal prompts, batch
lookup, threshold tuning, embedding cache, stale-model eviction, and a rerank hook.
Installation

```
npm install @betterdb/semantic-cache@0.2.0 iovalkey
```

New adapters
v0.1.0 had LangChain and Vercel AI SDK. v0.2.0 adds:
OpenAI Chat Completions — @betterdb/semantic-cache/openai
Extracts the last user message from ChatCompletionCreateParams. Handles text,
image_url (URL and base64), input_audio, and file content parts.
```
import { prepareSemanticParams } from '@betterdb/semantic-cache/openai';

const { text, blocks, model } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);
if (!result.hit) {
  const response = await openai.chat.completions.create(params);
  await cache.store(blocks ?? text, response.choices[0].message.content!, { model });
}
```

OpenAI Responses API — @betterdb/semantic-cache/openai-responses
Extracts the last user input from the Responses API input field — string or
message array with input_text, input_image, and input_file parts.
```
import { prepareSemanticParams } from '@betterdb/semantic-cache/openai-responses';

const { text, blocks } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);
```

Anthropic Messages — @betterdb/semantic-cache/anthropic
Extracts the last user message from MessageCreateParamsNonStreaming. Supports
text; base64, URL, and file images; and base64, URL, plaintext, and file documents.
```
import { prepareSemanticParams } from '@betterdb/semantic-cache/anthropic';

const { text, blocks, model } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);
```

LlamaIndex — @betterdb/semantic-cache/llamaindex
Extracts the last user ChatMessage from a ChatMessage[] array. Supports
text, image_url, file, audio, and image content parts.
```
import { prepareSemanticParams } from '@betterdb/semantic-cache/llamaindex';

const { text } = await prepareSemanticParams(messages, { model: 'gpt-4o' });
const result = await cache.check(text);
```

LangGraph semantic memory store — @betterdb/semantic-cache/langgraph
BetterDBSemanticStore implements the LangGraph BaseStore interface using
vector similarity — find the most semantically relevant past observations for a
given query. This is distinct from @betterdb/agent-cache/langgraph, which does
exact-match checkpoint persistence. Both can coexist on the same Valkey instance
with different key prefixes.
```
import { BetterDBSemanticStore } from '@betterdb/semantic-cache/langgraph';

const store = new BetterDBSemanticStore({ cache, embedField: 'content' });
await store.put(['user', 'alice', 'facts'], 'pref_001', {
  content: 'Alice prefers async Python over synchronous code.',
});
const results = await store.search(['user', 'alice', 'facts'], {
  query: "What are Alice's coding preferences?",
  limit: 5,
});
// results[i].value — the stored object; results[i].key — the item key
```

Full interface: put(), get(), search() (semantic KNN or namespace scan),
delete(), batch().
Updated: LangChain — @betterdb/semantic-cache/langchain
BetterDBSemanticCache now wraps responses in a proper AIMessage so chat
models can correctly access response.content. New filterByModel option scopes
hits to a specific LLM configuration (deterministically hashed from llm_string).
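The deterministic hash over llm_string can be sketched in a few lines. This is an illustrative Python sketch, not the library's actual scheme — the model_tag name and the 16-character truncation are assumptions:

```python
# Hedged sketch: derive a stable tag from LangChain's llm_string so cache
# hits can be scoped to one LLM configuration. Same llm_string (model,
# temperature, etc.) always yields the same tag.
import hashlib

def model_tag(llm_string: str) -> str:
    # Truncation to 16 hex chars is an illustrative choice, not the library's.
    return hashlib.sha256(llm_string.encode("utf-8")).hexdigest()[:16]
```

Because the tag is a pure function of the configuration string, two processes caching against the same Valkey instance agree on the filter without coordination.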
Embedding helpers
Five pre-built EmbedFn factories so you don't have to write your own:
| Import | Provider | Default model | Dimensions |
|---|---|---|---|
| @betterdb/semantic-cache/embed/openai | OpenAI | text-embedding-3-small | 1536 |
| @betterdb/semantic-cache/embed/voyage | Voyage AI | voyage-3-lite | 512 |
| @betterdb/semantic-cache/embed/cohere | Cohere | embed-english-v3.0 | 1024 |
| @betterdb/semantic-cache/embed/ollama | Ollama (local) | nomic-embed-text | 768 |
| @betterdb/semantic-cache/embed/bedrock | AWS Bedrock | amazon.titan-embed-text-v2:0 | 1024 |
```
import { createOpenAIEmbed } from '@betterdb/semantic-cache/embed/openai';
import { createVoyageEmbed } from '@betterdb/semantic-cache/embed/voyage';
import { createOllamaEmbed } from '@betterdb/semantic-cache/embed/ollama';
import { createBedrockEmbed } from '@betterdb/semantic-cache/embed/bedrock';

const cache = new SemanticCache({
  client,
  embedFn: createVoyageEmbed({ model: 'voyage-3-lite' }),
});
```

All helpers lazily initialise their clients and cache the instance across calls —
no per-request connection overhead.
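The lazy-initialisation pattern the helpers rely on can be sketched generically. This is an illustrative Python sketch under stated assumptions — create_embed and client_factory are made-up names, not the package's API:

```python
# Generic sketch of lazy client initialisation: the provider client is
# built on the first embed() call and reused for every call after that,
# so there is no per-request connection overhead.
def create_embed(client_factory):
    client = None  # not constructed until the first call

    def embed(text: str):
        nonlocal client
        if client is None:
            client = client_factory()  # pay the construction cost once
        return client.embed(text)

    return embed
```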
New core features
Cost tracking + bundled model price table
Store token counts at cache time; get automatic cost-saved reporting on every hit.
A bundled DEFAULT_COST_TABLE covers 1,971 models sourced from LiteLLM and is
refreshed on every release via pnpm update:pricing. No configuration is
required for common models.
```
await cache.store('Summarize this document', responseText, {
  model: 'gpt-4o',
  inputTokens: 512,
  outputTokens: 128,
});

const result = await cache.check('Summarize this document');
console.log(result.costSaved); // e.g. 0.00385 — dollars saved on this hit

const stats = await cache.stats();
console.log(stats.costSavedMicros); // cumulative across all hits
```

Override or extend the table via costTable; disable it with
useDefaultCostTable: false. DEFAULT_COST_TABLE and ModelCost are exported
from the package root.
Multi-modal prompts
check(), store(), and the new storeMultipart() accept string | ContentBlock[].
A ContentBlock[] prompt embeds the text blocks and uses binary refs as an AND-filter —
a hit requires both the text to be semantically similar and all binary refs to
match exactly.
```
import { hashBase64, type ContentBlock } from '@betterdb/semantic-cache';

const prompt: ContentBlock[] = [
  { type: 'text', text: 'What is in this image?' },
  { type: 'binary', kind: 'image', mediaType: 'image/png', ref: hashBase64(b64) },
];
await cache.store(prompt, 'A red square on a white background.');
const result = await cache.check(prompt); // hit requires both text AND same image
```

storeMultipart(prompt, blocks[]) stores a structured response (text +
citations + tool calls) and returns result.contentBlocks on hit.
Binary normalizer: composeNormalizer, hashBase64, hashBytes, hashUrl,
and fetchAndHash produce stable, compact refs for any binary source. The
defaultNormalizer hashes base64 and bytes rather than storing raw data in TAG
fields. All normalizer utilities are exported from the package root.
Embedding cache
Computed embedding vectors are stored in Valkey ({name}:embed:{sha256}) and
reused on subsequent check() calls for the same text — embedFn is only
called once per unique string, then reads hit a fast GET.
```
// enabled by default; override if needed:
new SemanticCache({
  embeddingCache: { enabled: true, ttl: 86400 },
});
```

New Prometheus counter: {prefix}_embedding_cache_total labelled
result: hit | miss.
Batch check — checkBatch()
Embeds all prompts in parallel and pipelines all FT.SEARCH calls in a single
Valkey round-trip.
```
const results = await cache.checkBatch([
  'What is the capital of France?',
  'Who wrote Hamlet?',
  'What is the speed of light?',
]);
// results[i] is a CacheCheckResult — same shape as check()
```

Typically 60–80% faster than sequential check() calls for bulk lookups,
dashboards, and prefetch patterns.
Rerank hook
Retrieve the top-k most similar candidates and apply custom ranking before
deciding whether to serve from cache — useful for cross-encoder reranking,
LLM-as-judge, or domain-specific scoring.
```
const result = await cache.check(query, {
  rerank: {
    k: 5,
    rerankFn: async (query, candidates) => {
      const scores = await crossEncoder.predict(query, candidates.map(c => c.response));
      const best = scores.indexOf(Math.max(...scores));
      return scores[best] > 0.8 ? best : -1; // -1 → reject all → miss
    },
  },
});
```

Stale-model eviction
Automatically evict cached entries when you upgrade the LLM for a prompt category.
On a hit, if the stored model differs from currentModel, the entry is deleted and
the call returns a miss — forcing a fresh response under the new model.
```
const result = await cache.check(prompt, {
  staleAfterModelChange: true,
  currentModel: 'gpt-4o', // evict if entry was stored with gpt-3.5-turbo
});
```

New Prometheus counter: {prefix}_stale_model_evictions_total.
Threshold effectiveness recommendations
The cache records a rolling window of cosine distance scores (up to 10,000
entries, 7-day retention). thresholdEffectiveness() analyzes this window and
returns a concrete recommendation:
```
const analysis = await cache.thresholdEffectiveness({ minSamples: 100 });
// {
//   recommendation: 'tighten_threshold',
//   currentThreshold: 0.1,
//   recommendedThreshold: 0.072,
//   hitRate: 0.83,
//   uncertainHitRate: 0.31,
//   nearMissRate: 0.04,
//   reasoning: '31.0% of hits are in the uncertainty band — tighten the threshold...',
// }

// Per-category + aggregate in one call:
const allResults = await cache.th...
```

Semantic Cache Python v0.1.2
betterdb-semantic-cache v0.1.0
Python port of @betterdb/semantic-cache. Embeddings-based semantic cache for AI
workloads backed by Valkey vector search — similarity matching, cost tracking,
multi-modal prompts, embedding cache, and threshold tuning, with built-in
OpenTelemetry and Prometheus instrumentation.
Requires Valkey 8+ with the valkey-search module (vector index support).
Works with ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.
Installation
```
pip install betterdb-semantic-cache
```

Optional extras install the provider SDKs alongside the library:

```
pip install "betterdb-semantic-cache[openai]"
pip install "betterdb-semantic-cache[anthropic]"
pip install "betterdb-semantic-cache[langchain]"
pip install "betterdb-semantic-cache[langgraph]"
pip install "betterdb-semantic-cache[llamaindex]"
pip install "betterdb-semantic-cache[httpx]"    # voyage / cohere / ollama embed helpers
pip install "betterdb-semantic-cache[bedrock]"  # AWS Bedrock embed helper
```

What's included
SemanticCache
| Method | Description |
|---|---|
| initialize() | Create or attach to the vector index |
| check(prompt) | Similarity lookup — returns hit/miss with confidence and optional cost saved |
| store(prompt, response) | Store a response with optional cost metadata |
| store_multipart(prompt, blocks) | Store structured content blocks |
| check_batch(prompts) | Pipelined batch lookup |
| invalidate(filter) | Delete entries matching a FT.SEARCH filter |
| invalidate_by_model(model) | Delete all entries for a model |
| invalidate_by_category(category) | Delete all entries in a category |
| stats() | Hit/miss counts and cumulative cost saved |
| index_info() | Index name, doc count, vector dimension |
| threshold_effectiveness() | Rolling window analysis and threshold recommendations |
| threshold_effectiveness_all() | Per-category analysis |
| flush() | Drop index and delete all cached entries |
Provider adapters
| Import | Provider |
|---|---|
| betterdb_semantic_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_semantic_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_semantic_cache.adapters.anthropic | Anthropic Messages |
| betterdb_semantic_cache.adapters.llamaindex | LlamaIndex ChatMessage[] |
| betterdb_semantic_cache.adapters.langchain | LangChain BaseCache (async-only) |
| betterdb_semantic_cache.adapters.langgraph | LangGraph BetterDBSemanticStore |
Embedding helpers
| Import | Provider |
|---|---|
| embed.openai | OpenAI Embeddings API |
| embed.voyage | Voyage AI (httpx, no SDK required) |
| embed.cohere | Cohere Embed v3 (httpx, no SDK required) |
| embed.ollama | Ollama local models (httpx, no SDK required) |
| embed.bedrock | AWS Bedrock Titan / Cohere (boto3) |
Bundled default cost table
A default cost table sourced from LiteLLM's model_prices_and_context_window.json
is bundled and refreshed on every release. Cost savings tracking works out of the
box for 1,900+ models — no cost_table configuration required.
Observability
- OpenTelemetry spans on every cache operation
- Prometheus metrics:
  requests_total, similarity_score, operation_duration_seconds,
  embedding_duration_seconds, cost_saved_total, embedding_cache_total,
  stale_model_evictions_total
Cluster support
Pass a ValkeyCluster client and all SCAN-based operations (flush,
invalidate_by_model, invalidate_by_category) automatically iterate all master nodes.
Quick start
```
import asyncio
import valkey.asyncio as valkey

from betterdb_semantic_cache import SemanticCache, SemanticCacheOptions
from betterdb_semantic_cache.types import CacheStoreOptions
from betterdb_semantic_cache.embed.openai import create_openai_embed

client = valkey.Valkey(host="localhost", port=6379)
cache = SemanticCache(SemanticCacheOptions(
    client=client,
    embed_fn=create_openai_embed(),
    default_threshold=0.12,
))

async def main():
    await cache.initialize()
    result = await cache.check("What is the capital of France?")
    if result.hit:
        print("Cache hit:", result.response)
    else:
        answer = "Paris"  # ... call your LLM ...
        await cache.store(
            "What is the capital of France?", answer,
            CacheStoreOptions(model="gpt-4o", input_tokens=20, output_tokens=5),
        )

asyncio.run(main())
```

Full changelog
See CHANGELOG.md for detailed history.
Semantic cache python 0.1.1
Updated release flow
betterdb-semantic-cache v0.1.0
Initial release. Full Python port of @betterdb/semantic-cache v0.2.0 — async-first,
dataclass config, feature-for-feature parity with the TypeScript implementation.
Requires Python 3.11+, Valkey 8+ with the valkey-search module.
Works with ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.
Installation
```
pip install betterdb-semantic-cache
```

Install optional extras alongside the library:

```
pip install "betterdb-semantic-cache[openai]"
pip install "betterdb-semantic-cache[anthropic]"
pip install "betterdb-semantic-cache[langchain]"
pip install "betterdb-semantic-cache[langgraph]"
pip install "betterdb-semantic-cache[llamaindex]"
pip install "betterdb-semantic-cache[httpx]"    # Voyage AI, Cohere, Ollama
pip install "betterdb-semantic-cache[bedrock]"  # AWS Bedrock
pip install "betterdb-semantic-cache[all]"      # everything above
```

Adapters
Six adapters extract the semantic cache key from provider-specific request objects.
All return a SemanticParams dataclass with text, blocks, and model fields.
OpenAI Chat Completions
```
from betterdb_semantic_cache.adapters.openai import prepare_semantic_params

params = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
sp = await prepare_semantic_params(params)
result = await cache.check(sp.blocks or sp.text)
if not result.hit:
    response = await openai_client.chat.completions.create(**params)
    await cache.store(sp.blocks or sp.text, response.choices[0].message.content,
                      CacheStoreOptions(model=sp.model))
```

Handles text, image_url (URL and base64), input_audio, and file content parts.
Pass normalizer=cache.normalizer to share the same normalization strategy.
OpenAI Responses API
```
from betterdb_semantic_cache.adapters.openai_responses import prepare_semantic_params

sp = await prepare_semantic_params(params)
result = await cache.check(sp.blocks or sp.text)
```

Supports input_text, input_image, and input_file content parts.
Anthropic Messages
```
from betterdb_semantic_cache.adapters.anthropic import prepare_semantic_params

sp = await prepare_semantic_params(params)
result = await cache.check(sp.blocks or sp.text)
```

Supports text; base64, URL, and file images; and base64, URL, plaintext, and file documents.
LlamaIndex
```
from betterdb_semantic_cache.adapters.llamaindex import prepare_semantic_params

sp = await prepare_semantic_params(messages, model="gpt-4o")
result = await cache.check(sp.text)
```

Extracts the last user ChatMessage from a list. Supports text, image_url,
file, audio, and image content parts.
LangChain — async BaseCache
BetterDBSemanticCache implements LangChain's BaseCache interface. Because
SemanticCache is async-only, the synchronous lookup() / update() methods
return None / no-op; use ainvoke / astream to get real cache behaviour.
```
from betterdb_semantic_cache.adapters.langchain import BetterDBSemanticCache
from langchain_openai import ChatOpenAI

lc_cache = BetterDBSemanticCache(cache)
llm = ChatOpenAI(model="gpt-4o", cache=lc_cache)

# Cache is transparent — hits are returned without calling the LLM
response = await llm.ainvoke("What is the capital of France?")
```

Optional filter_by_model=True scopes hits to a specific LLM configuration.
LangGraph semantic memory store
BetterDBSemanticStore implements the LangGraph BaseStore interface using
vector similarity for retrieval. Use this for agent memory (finding the most
relevant past facts for a query), not for checkpoint persistence — use
betterdb_agent_cache.adapters.langgraph for that. Both can coexist on the
same Valkey instance with different key prefixes.
```
from betterdb_semantic_cache.adapters.langgraph import BetterDBSemanticStore

store = BetterDBSemanticStore(cache, embed_field="content")
await store.aput(["user", "alice", "facts"], "pref_001", {
    "content": "Alice prefers async Python over synchronous code.",
})
results = await store.asearch(["user", "alice", "facts"],
                              query="What are Alice's coding preferences?",
                              limit=5)
# results[i].value — the stored dict; results[i].key — the item key
```

Full interface: aput(), aget(), asearch() (semantic KNN or namespace scan),
adelete(), abatch().
Embedding helpers
Five pre-built EmbedFn callables so you don't have to write your own:
| Import | Provider | Default model | Dimensions |
|---|---|---|---|
| betterdb_semantic_cache.embed.openai | OpenAI | text-embedding-3-small | 1536 |
| betterdb_semantic_cache.embed.voyage | Voyage AI | voyage-3-lite | 512 |
| betterdb_semantic_cache.embed.cohere | Cohere | embed-english-v3.0 | 1024 |
| betterdb_semantic_cache.embed.ollama | Ollama (local) | nomic-embed-text | 768 |
| betterdb_semantic_cache.embed.bedrock | AWS Bedrock | amazon.titan-embed-text-v2:0 | 1024 |
```
from betterdb_semantic_cache.embed.openai import create_openai_embed
from betterdb_semantic_cache.embed.voyage import create_voyage_embed
from betterdb_semantic_cache.embed.ollama import create_ollama_embed

cache = SemanticCache(SemanticCacheOptions(
    client=client,
    embed_fn=create_voyage_embed(model="voyage-3-lite"),
))
```

The Voyage AI, Cohere, and Ollama helpers use httpx directly — no provider SDK
required. The httpx client is created once per helper instance and reused across
calls. Install: pip install "betterdb-semantic-cache[httpx]".
Core features
Cost tracking + bundled model price table
Store token counts at cache time; get automatic cost-saved reporting on every hit.
A bundled DEFAULT_COST_TABLE covers 1,900+ models from LiteLLM and is
refreshed on every release. No configuration required for common models.
```
await cache.store("Summarize this document", response_text,
                  CacheStoreOptions(model="gpt-4o", input_tokens=512, output_tokens=128))

result = await cache.check("Summarize this document")
print(result.cost_saved)  # e.g. 0.00385 — dollars saved on this hit

stats = await cache.stats()
print(stats.cost_saved_micros)  # cumulative across all hits
```

Override entries with cost_table={...}; disable with use_default_cost_table=False.
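The cost-saved arithmetic reduces to the stored token counts times the per-1,000-token prices for the stored model. A minimal sketch — the ModelCost shape mirrors the package's, but the prices below are made up for illustration, not LiteLLM's real figures:

```python
# Sketch of per-hit cost-saved arithmetic: tokens recorded at store time
# multiplied by the model's per-1k prices. Prices here are illustrative.
from dataclasses import dataclass

@dataclass
class ModelCost:
    input_per_1k: float   # dollars per 1,000 input tokens
    output_per_1k: float  # dollars per 1,000 output tokens

# Hypothetical entry — the real table is bundled from LiteLLM.
COST_TABLE = {"gpt-4o": ModelCost(input_per_1k=0.0025, output_per_1k=0.01)}

def cost_saved(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollars not spent because the response came from cache."""
    c = COST_TABLE[model]
    return (input_tokens / 1000) * c.input_per_1k + \
           (output_tokens / 1000) * c.output_per_1k
```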
Multi-modal prompts
check(), store(), and store_multipart() accept str | list[ContentBlock].
A ContentBlock list embeds the text blocks and uses binary refs as an AND-filter —
a hit requires both semantic similarity on the text and all binary refs to match.
```
from betterdb_semantic_cache.normalizer import hash_base64
from betterdb_semantic_cache.utils import TextBlock, BinaryBlock

prompt = [
    TextBlock(type="text", text="What is in this image?"),
    BinaryBlock(type="binary", kind="image", mediaType="image/png",
                ref=hash_base64(b64_data)),
]
await cache.store_multipart(prompt, [TextBlock(type="text", text="A red square.")])
result = await cache.check(prompt)  # hit requires both text match AND same image
if result.hit:
    print(result.content_blocks)  # the stored ContentBlock[] response
```

Binary normalizer: compose_normalizer, hash_base64, hash_bytes, hash_url,
and fetch_and_hash generate stable, compact refs for any binary source. The
default_normalizer hashes base64 and bytes rather than storing raw data in TAG
fields. Access the configured normalizer via cache.normalizer.
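The way refs participate in a lookup can be modelled in a few lines. This is a toy sketch, not the library's implementation — the decode-then-hash behaviour and the is_hit rule are assumptions based on the description above:

```python
# Toy model of binary refs + the AND-filter: binary content is reduced to
# a stable hash, and a candidate only counts as a hit when the text is
# within the distance threshold AND every binary ref matches exactly.
import base64
import hashlib

def ref_for_base64(b64: str) -> str:
    # Hash the decoded payload so equivalent base64 encodings share one ref.
    return hashlib.sha256(base64.b64decode(b64)).hexdigest()

def is_hit(distance: float, threshold: float,
           query_refs: list[str], stored_refs: list[str]) -> bool:
    text_ok = distance <= threshold                       # semantic gate
    refs_ok = sorted(query_refs) == sorted(stored_refs)   # exact binary gate
    return text_ok and refs_ok
```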
Embedding cache
Computed embedding vectors are stored in Valkey ({name}:embed:{sha256}) and
reused on subsequent check() calls for the same text — embed_fn is only
called once per unique string.
```
SemanticCache(SemanticCacheOptions(
    ...,
    embedding_cache=EmbeddingCacheOptions(enabled=True, ttl=86400),  # default
))
```

Prometheus counter: {prefix}_embedding_cache_total labelled result: hit | miss.
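The key scheme is simple enough to sketch. Only the {name}:embed:{sha256} shape comes from the description above; the hex encoding of the digest is an assumption:

```python
# Sketch of the embedding-cache key: the SHA-256 of the prompt text under
# the cache's name prefix. Hex digest encoding is an assumption here.
import hashlib

def embedding_cache_key(name: str, text: str) -> str:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{name}:embed:{digest}"
```

Because the key is derived only from the text, repeated check() calls for the same prompt map to the same Valkey key and skip the embed_fn call.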
Batch check — check_batch()
Embeds all prompts in parallel and pipelines all FT.SEARCH calls in a single
Valkey round-trip.
```
results = await cache.check_batch([
    "What is the capital of France?",
    "Who wrote Hamlet?",
    "What is the speed of light?",
])
# results[i] is a CacheCheckResult — same shape as check()
```

Rerank hook
Retrieve the top-k most similar candidates and apply custom ranking before
serving from cache.
```
async def pick_longest(_query: str, candidates: list[dict]) -> int:
    return max(range(len(candidates)), key=lambda i: len(candidates[i]["response"]))

result = await cache.check(query, CacheCheckOptions(
    rerank=RerankOptions(k=5, rerank_fn=pick_longest),
))
```

Return -1 from rerank_fn to reject all candidates (miss).
Stale-model eviction
Automatically evict cached entries when you upgrade the LLM for a prompt category.
On a hit, if the stored model differs from current_model, the entry is deleted and
the call returns a miss.
```
result = await cache.check(prompt, CacheCheckOptions(
    stale_after_model_change=True,
    current_model="gpt-4o",  # evict if entry was stored with gpt-3.5-turbo
))
```

Prometheus counter: {prefix}_stale_model_evictions_total.
Threshold effectiveness recommendations
threshold_effectiveness() analyzes a rolling window of cosine distance scores
(up to 10,000 entries, 7-day retention) and returns a concrete recommendation:
```
analysis = await cache.threshold_effectiveness(min_samples=100)
# ThresholdEffectivenessResult:
#   recommendation: ...
```

Agent Cache Python v0.4.1
betterdb-agent-cache v0.4.0
Python port of @betterdb/agent-cache. Multi-tier exact-match cache for AI agent
workloads backed by Valkey — LLM responses, tool results, and session state, with
built-in OpenTelemetry and Prometheus instrumentation.
Runs on vanilla Valkey 7+. No modules, no RedisJSON, no RediSearch. Works on
ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.
Installation
```
pip install betterdb-agent-cache
```

Optional extras install the provider SDKs alongside the library:

```
pip install "betterdb-agent-cache[openai]"
pip install "betterdb-agent-cache[anthropic]"
pip install "betterdb-agent-cache[langchain]"
pip install "betterdb-agent-cache[langgraph]"
pip install "betterdb-agent-cache[llamaindex]"
```

What's included
Three cache tiers
| Tier | Use for |
|---|---|
| cache.llm | LLM API responses — check / store / store_multipart / invalidate_by_model |
| cache.tool | Tool call results — check / store / set_policy / invalidate_by_tool |
| cache.session | Agent session state — get / set / get_all / destroy_thread / touch |
Provider adapters
| Import | Provider |
|---|---|
| betterdb_agent_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_agent_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_agent_cache.adapters.anthropic | Anthropic Messages |
| betterdb_agent_cache.adapters.llamaindex | LlamaIndex |
| betterdb_agent_cache.adapters.langchain | LangChain BaseCache |
| betterdb_agent_cache.adapters.langgraph | LangGraph BaseCheckpointSaver |
Bundled default cost table
A default cost table sourced from LiteLLM's model_prices_and_context_window.json
is bundled with the package and refreshed on every release. Cost tracking works out of
the box for 1,900+ models — no cost_table configuration required.
User-supplied cost_table entries are merged on top of the defaults, so you can
override a single model without losing coverage for everything else:
```
cache = AgentCache(AgentCacheOptions(
    client=client,
    cost_table={"gpt-4o": ModelCost(input_per_1k=0.002, output_per_1k=0.008)},
))
```

To disable the default table entirely:

```
cache = AgentCache(AgentCacheOptions(
    client=client,
    use_default_cost_table=False,
    cost_table={...},
))
```

The bundled table is also exported directly if you need to inspect it:

```
from betterdb_agent_cache import DEFAULT_COST_TABLE
```

Pluggable binary normalizer
Controls how binary content (images, audio, documents) is reduced to a stable string
before hashing. Zero-latency by default — no network calls.
```
from betterdb_agent_cache import compose_normalizer, hash_base64

normalizer = compose_normalizer({"base64": hash_base64})
```

Observability
- OpenTelemetry spans on every cache operation
- Prometheus counters, histograms, and gauges:
  requests_total, operation_duration_seconds,
  cost_saved_total, stored_bytes_total, active_sessions
Cluster support
Pass a ValkeyCluster client and all SCAN-based operations (flush, invalidate_by_model,
invalidate_by_tool, destroy_thread, touch) automatically iterate all master nodes.
Quick start
Cost tracking is pre-defined for 1,900+ models — no pricing configuration needed.
```
import asyncio
import valkey.asyncio as valkey_client

from betterdb_agent_cache import AgentCache, TierDefaults
from betterdb_agent_cache.adapters.openai import prepare_params
from betterdb_agent_cache.types import AgentCacheOptions

client = valkey_client.Valkey(host="localhost", port=6379)
cache = AgentCache(AgentCacheOptions(
    client=client,
    tier_defaults={"llm": TierDefaults(ttl=3600)},
    # cost_table is pre-defined for GPT-4o, Claude, Gemini, and 1,900+ others
))

async def main():
    params = await prepare_params({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    })
    result = await cache.llm.check(params)
    if result.hit:
        print("Cache hit:", result.response)
    else:
        # ... call OpenAI ...
        await cache.llm.store(params, "Four")

asyncio.run(main())
```

Full changelog
See CHANGELOG.md for detailed history.
v0.15.0
Vector / AI tab
A new Vector / AI tab joins the monitor sidebar when BetterDB detects valkey-search or RediSearch on a connection. No configuration is required: the tab appears automatically based on capability detection and stays hidden when the module is not loaded.
The tab surfaces two things that weren't previously visible:
FT.SEARCH workload over time. The monitor now tracks search query volume and latency as a continuous time-series, persisted to storage on each poll cycle. The tab shows an ops/sec chart and an average latency chart, defaulting to the last hour and live-updating every 15 seconds. A date range picker lets you query any historical window, pausing live polling while a custom range is active.
Vector index health. Every index on the instance is listed with its doc count, record count, deleted doc count, indexing failure count, and current indexing state. Rows with failures are highlighted. Three health alerts surface as banners when conditions warrant:
- Indexing failures detected on any index
- An index is actively backfilling
- Deleted documents are accumulating
Both Valkey with valkey-search and Redis with RediSearch are supported.
Prometheus metrics
Six new gauges are exposed via /prometheus/metrics on instances where a Search module is detected:
```
betterdb_vector_index_docs{connection, index}
betterdb_vector_index_memory_bytes{connection, index}
betterdb_vector_index_indexing_failures{connection, index}
betterdb_vector_index_percent_indexed{connection, index}
betterdb_commandstats_calls_total{connection, command}
betterdb_commandstats_latency_us{connection, command}
```
Stale labels are removed automatically when an index is dropped or a command disappears from INFO commandstats.
FT.* fixes in the commandlog
FT.SEARCH commands carry vector embeddings as binary PARAMS blocks. These were previously stored and displayed as raw bytes in the commandlog, making search commands unreadable and impossible to group by pattern. Binary and oversized arguments are now replaced with `<blob>`. FT.* commands are also excluded from the byKeyPrefix aggregation, which was producing meaningless patterns like `idx_cache:` by splitting index names on `:`.
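The scrubbing rule can be sketched as a small predicate. This is illustrative — the 128-byte cap and the function name are assumptions; the release only states that binary and oversized arguments become `<blob>`:

```python
# Sketch of commandlog argument scrubbing: arguments that are not valid
# UTF-8 (e.g. packed vector PARAMS blocks) or exceed a size cap are
# replaced with "<blob>". The 128-byte cap is an illustrative assumption.
def scrub_arg(arg: bytes, max_len: int = 128) -> str:
    if len(arg) > max_len:
        return "<blob>"
    try:
        return arg.decode("utf-8")
    except UnicodeDecodeError:
        return "<blob>"
```

With embeddings collapsed to a single placeholder, identical FT.SEARCH shapes group together again in the commandlog.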
Bug fixes
- Dark mode: white code blocks now render with dark text. Code blocks and `<pre>` elements in the Settings page (MCP token display, Claude Code config snippet) and the Webhook Deliveries panel were using `bg-white` without a corresponding dark text override, making them unreadable in dark mode. Fixed with `dark:text-gray-900`. (#119)
- API: 404 is now returned when no connection can be resolved, rather than an empty array. Affects the commandstats history and summary endpoints when no `x-connection-id` header is sent and no default connection is configured.
Internal
- License key included in telemetry ping events.
What's Changed
- Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #115
- Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #116
- feature: slowlog FT.* fix, vector index health metrics, commandstats time-series by @jamby77 in #111
- feature: Vector / AI monitor tab (frontend) by @jamby77 in #112
- added license key to telemetry_ping events. Refactored util method location by @KIvanow in #118
- feature: close commandstats AC gaps (snapshot endpoint, absolute totals, 60s) by @jamby77 in #113
- feature: commandstats API endpoints + Prometheus gauges per spec by @jamby77 in #114
- Agent cache py by @KIvanow in #117
Full Changelog: v0.14.2...v0.15.0
Agent Cache v0.4.0
Added
- Bundled default cost table
  - A default cost table sourced from LiteLLM's model_prices_and_context_window.json is now bundled with the package and refreshed on every release
  - Cost tracking works out of the box for 100+ models including GPT-4o, Claude, and Gemini — no costTable configuration required
  - User-supplied costTable entries are merged on top of the defaults, allowing selective overrides without losing coverage for other models
- New useDefaultCostTable option on AgentCacheOptions
  - Defaults to true. Set to false to disable the bundled table entirely and supply your own
- DEFAULT_COST_TABLE export
  - The bundled table is now exported from the main entry point for inspection or extension
- update:pricing npm script
  - Regenerates defaultCostTable.ts from the latest LiteLLM pricing data
Full Changelog: v0.15.0...agent-cache-v0.4.0
Agent Cache Python v0.4.0
betterdb-agent-cache v0.3.0
Python port of @betterdb/agent-cache. Multi-tier exact-match cache for AI agent
workloads backed by Valkey — LLM responses, tool results, and session state, with
built-in OpenTelemetry and Prometheus instrumentation.
Runs on vanilla Valkey 7+. No modules, no RedisJSON, no RediSearch. Works on
ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.
Installation
```
pip install betterdb-agent-cache
```

Optional extras install the provider SDKs alongside the library:

```
pip install "betterdb-agent-cache[openai]"
pip install "betterdb-agent-cache[anthropic]"
pip install "betterdb-agent-cache[langchain]"
pip install "betterdb-agent-cache[langgraph]"
pip install "betterdb-agent-cache[llamaindex]"
```

What's included
Three cache tiers
| Tier | Use for |
|---|---|
| cache.llm | LLM API responses — check / store / store_multipart / invalidate_by_model |
| cache.tool | Tool call results — check / store / set_policy / invalidate_by_tool |
| cache.session | Agent session state — get / set / get_all / destroy_thread / touch |
Provider adapters
| Import | Provider |
|---|---|
| betterdb_agent_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_agent_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_agent_cache.adapters.anthropic | Anthropic Messages |
| betterdb_agent_cache.adapters.llamaindex | LlamaIndex |
| betterdb_agent_cache.adapters.langchain | LangChain BaseCache |
| betterdb_agent_cache.adapters.langgraph | LangGraph BaseCheckpointSaver |
Pluggable binary normalizer

```python
from betterdb_agent_cache import compose_normalizer, hash_base64

normalizer = compose_normalizer({"base64": hash_base64})
```

Observability
- OpenTelemetry spans on every cache operation
- Prometheus counters, histograms, and gauges (requests_total, operation_duration_seconds, cost_saved_total, stored_bytes_total, active_sessions)
Quick start
```python
import asyncio

import valkey.asyncio as valkey_client

from betterdb_agent_cache import AgentCache, ModelCost, TierDefaults
from betterdb_agent_cache.adapters.openai import prepare_params
from betterdb_agent_cache.types import AgentCacheOptions

client = valkey_client.Valkey(host="localhost", port=6379)
cache = AgentCache(AgentCacheOptions(
    client=client,
    tier_defaults={"llm": TierDefaults(ttl=3600)},
    cost_table={"gpt-4o-mini": ModelCost(input_per_1k=0.00015, output_per_1k=0.0006)},
))

async def main():
    params = await prepare_params({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    })
    result = await cache.llm.check(params)
    if result.hit:
        print("Cache hit:", result.response)
    else:
        # ... call OpenAI ...
        await cache.llm.store(params, "Four")

asyncio.run(main())
```

Full changelog
See CHANGELOG.md for detailed history.
Agent Cache v0.3.0
Multi-modal support and full provider adapter coverage for OpenAI, Anthropic, and LlamaIndex.
What's new
New provider adapters
Four new sub-path imports covering every major provider:
| Import | Provider |
|---|---|
| @betterdb/agent-cache/openai | OpenAI Chat Completions |
| @betterdb/agent-cache/openai-responses | OpenAI Responses API |
| @betterdb/agent-cache/anthropic | Anthropic Messages |
| @betterdb/agent-cache/llamaindex | LlamaIndex |
Each adapter converts provider-native request params into the canonical LlmCacheParams format — no manual normalisation required.
OpenAI Chat handles text, images (URL + base64), audio, files, tool calls, and legacy function role messages.
OpenAI Responses additionally covers reasoning items, function_call / function_call_output item types, and instructions promoted to a system message.
Anthropic covers tool use blocks, tool result blocks, and thinking / extended thinking blocks alongside standard image sources.
LlamaIndex wraps ChatMessage history including text and image nodes.
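All four adapters start the same way: scan the request's message history from the end for the most recent user turn, which is what the cache keys on. A simplified Python sketch of that one step (illustrative only; the adapters' actual code is TypeScript and handles many more content shapes):

```python
def last_user_message(messages: list[dict]):
    # Walk backwards: the newest user turn determines the cache key.
    for message in reversed(messages):
        if message.get("role") == "user":
            return message
    return None

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "follow-up"},
]
assert last_user_message(history)["content"] == "follow-up"
```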
Pluggable binary normalizer
Binary content (images, audio, documents) is now a first-class part of the cache key pipeline. The new BinaryNormalizer interface lets you control how blobs are reduced to a stable string before hashing.
```typescript
import { composeNormalizer, fetchAndHash, hashBase64 } from '@betterdb/agent-cache';

const normalizer = composeNormalizer({
  // Hash base64 payloads by their decoded bytes
  base64: (data) => hashBase64(data),
  // Fetch remote images and hash the response body
  url: (url) => fetchAndHash(url),
  // Use OpenAI file IDs directly as cache keys
  fileId: (id, provider) => `${provider}:${id}`,
});
```

Built-in helpers:
| Helper | Behaviour |
|---|---|
| hashBase64(data) | SHA-256 of decoded bytes |
| hashBytes(data) | SHA-256 of raw bytes |
| hashUrl(url) | Normalised URL (sorted query params, lowercased host) |
| fetchAndHash(url) | Fetches URL and SHA-256s the body |
| passthrough(ref) | Scheme-prefixed ref, no transformation |
| composeNormalizer(cfg) | Build a normalizer from per-source / per-kind handlers |
The defaultNormalizer uses passthrough — zero-latency, no network calls, suitable for most use cases.
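For intuition, the two most common helpers reduce to standard operations. A Python sketch of equivalent behaviour (the real helpers are TypeScript; these reimplementations are assumptions based on the table's descriptions, not the package's code):

```python
import base64
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def hash_base64_sketch(data: str) -> str:
    # SHA-256 of the decoded bytes, so re-encodings of the same payload collide.
    return hashlib.sha256(base64.b64decode(data)).hexdigest()

def hash_url_sketch(url: str) -> str:
    # Per the table, hashUrl yields a normalised URL:
    # lowercased host, query parameters sorted by key.
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, query, parts.fragment))

assert hash_base64_sketch("aGVsbG8=") == hashlib.sha256(b"hello").hexdigest()
assert hash_url_sketch("https://EXAMPLE.com/img?b=2&a=1") == "https://example.com/img?a=1&b=2"
```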
Extended cache key coverage
LlmCacheParams now includes all parameters that affect model output:
- toolChoice, seed, stop, responseFormat
- reasoningEffort — for extended thinking models
- promptCacheKey — pass-through for provider-level prompt caching
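These fields belong in the key because two requests that differ only in, say, seed or responseFormat can produce different outputs and must not collide. A hedged sketch of the idea (canonical-JSON-plus-SHA-256 is an assumption for illustration, not the library's actual keying scheme):

```python
import hashlib
import json

def cache_key(params: dict) -> str:
    # Canonicalise so field order never matters, then hash.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

base = {"model": "gpt-4o", "prompt": "hi", "seed": 1}
# Same logical request, different field order: same key.
assert cache_key(base) == cache_key({"seed": 1, "prompt": "hi", "model": "gpt-4o"})
# Any output-affecting field changes the key.
assert cache_key(base) != cache_key({**base, "seed": 2})
assert cache_key(base) != cache_key({**base, "responseFormat": {"type": "json_object"}})
```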
New examples
Runnable examples added for all three new providers:
- examples/openai/
- examples/anthropic/
- examples/llamaindex/
Bug fixes
- Null tool output (openai-responses): function_call_output items with null or undefined output now produce an empty string instead of the two-character literal "", which was corrupting cache key hashes.
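Why this corrupted hashes: the two-character literal '""' (a pair of quote characters) is a different byte sequence from a true empty string, so the derived key no longer matched. A minimal illustration (the SHA-256 keying here is an assumption for demonstration):

```python
import hashlib

def key_for(output: str) -> str:
    # Stand-in for the cache-key hash; SHA-256 is an assumed scheme.
    return hashlib.sha256(output.encode()).hexdigest()

buggy = '""'  # two quote characters, what null/undefined output used to become
fixed = ""    # v0.3.0: missing output normalises to a true empty string

assert len(buggy) == 2
assert key_for(buggy) != key_for(fixed)
```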
Installation
```
npm install @betterdb/agent-cache@0.3.0
```

Full changelog
See CHANGELOG.md for a complete list of changes.
What's Changed
- Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #115
- Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #116
Full Changelog: v0.14.2...agent-cache-v0.3.0
v0.14.2
Security updates
fastify 5.8.4 → 5.8.5
Fixes CVE-2026-33806 — a security vulnerability in the Fastify HTTP framework. See GHSA-247c-9743-5963.
@fastify/static 9.0.0 → 9.1.1
Fixes two CVEs in static file serving:
vite 8.0.2 → 8.0.5
Multiple path traversal and filesystem boundary bypass fixes in the dev server:
- server.fs checks now apply to env transport requests and query-stripped paths
- Sourcemap handlers no longer allow referencing files outside the package root
Bug fixes
Vector Search
- Fixed crashes caused by out-of-bounds data
- Fixed Find Similar button being difficult to click
- Fixed graph labels being unreadable and not respecting the active color scheme
What's Changed
- added licenseKey to posthog events by @KIvanow in #104
- Add @betterdb/agent-cache package — multi-tier LLM/tool/session cache with framework adapters by @KIvanow in #105
- cluster support for agent-cache by @KIvanow in #108
- build(deps-dev): bump vite from 8.0.2 to 8.0.5 by @dependabot[bot] in #99
- build(deps): bump @nestjs/core from 11.1.17 to 11.1.18 by @dependabot[bot] in #100
- build(deps): bump fastify from 5.8.4 to 5.8.5 by @dependabot[bot] in #106
- build(deps): bump @fastify/static from 9.0.0 to 9.1.1 by @dependabot[bot] in #110
Full Changelog: v0.14.1...v0.14.2