Releases: BetterDB-inc/monitor

Semantic Cache v0.2.0

24 Apr 19:11
8c269b1


Release Notes — @betterdb/semantic-cache v0.2.0

v0.1.0 shipped the core cache with text-only string prompts and two adapters.
v0.2.0 adds five new adapters, five embedding helpers, and a set of features
that make the cache production-ready: cost tracking, multi-modal prompts, batch
lookup, threshold tuning, embedding cache, stale-model eviction, and a rerank hook.

Installation

npm install @betterdb/semantic-cache@0.2.0 iovalkey

New adapters

v0.1.0 shipped with LangChain and Vercel AI SDK adapters. v0.2.0 adds:

OpenAI Chat Completions — @betterdb/semantic-cache/openai

Extracts the last user message from ChatCompletionCreateParams. Handles text,
image_url (URL and base64), input_audio, and file content parts.

import { prepareSemanticParams } from '@betterdb/semantic-cache/openai';

const { text, blocks, model } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);
if (!result.hit) {
  const response = await openai.chat.completions.create(params);
  await cache.store(blocks ?? text, response.choices[0].message.content!, { model });
}

OpenAI Responses API — @betterdb/semantic-cache/openai-responses

Extracts the last user input from the Responses API input field — string or
message array with input_text, input_image, and input_file parts.

import { prepareSemanticParams } from '@betterdb/semantic-cache/openai-responses';

const { text, blocks } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);

Anthropic Messages — @betterdb/semantic-cache/anthropic

Extracts the last user message from MessageCreateParamsNonStreaming. Supports
text; base64, URL, and file images; and base64, URL, plaintext, and file documents.

import { prepareSemanticParams } from '@betterdb/semantic-cache/anthropic';

const { text, blocks, model } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);

LlamaIndex — @betterdb/semantic-cache/llamaindex

Extracts the last user ChatMessage from a ChatMessage[] array. Supports
text, image_url, file, audio, and image content parts.

import { prepareSemanticParams } from '@betterdb/semantic-cache/llamaindex';

const { text } = await prepareSemanticParams(messages, { model: 'gpt-4o' });
const result = await cache.check(text);

LangGraph semantic memory store — @betterdb/semantic-cache/langgraph

BetterDBSemanticStore implements the LangGraph BaseStore interface using
vector similarity — find the most semantically relevant past observations for a
given query. This is distinct from @betterdb/agent-cache/langgraph, which does
exact-match checkpoint persistence. Both can coexist on the same Valkey instance
with different key prefixes.

import { BetterDBSemanticStore } from '@betterdb/semantic-cache/langgraph';

const store = new BetterDBSemanticStore({ cache, embedField: 'content' });

await store.put(['user', 'alice', 'facts'], 'pref_001', {
  content: 'Alice prefers async Python over synchronous code.',
});

const results = await store.search(['user', 'alice', 'facts'], {
  query: "What are Alice's coding preferences?",
  limit: 5,
});
// results[i].value — the stored object; results[i].key — the item key

Full interface: put(), get(), search() (semantic KNN or namespace scan),
delete(), batch().

Updated: LangChain — @betterdb/semantic-cache/langchain

BetterDBSemanticCache now wraps responses in a proper AIMessage so chat
models can correctly access response.content. New filterByModel option scopes
hits to a specific LLM configuration (deterministically hashed from llm_string).


Embedding helpers

Five pre-built EmbedFn factories so you don't have to write your own:

| Import | Provider | Default model | Dimensions |
| --- | --- | --- | --- |
| @betterdb/semantic-cache/embed/openai | OpenAI | text-embedding-3-small | 1536 |
| @betterdb/semantic-cache/embed/voyage | Voyage AI | voyage-3-lite | 512 |
| @betterdb/semantic-cache/embed/cohere | Cohere | embed-english-v3.0 | 1024 |
| @betterdb/semantic-cache/embed/ollama | Ollama (local) | nomic-embed-text | 768 |
| @betterdb/semantic-cache/embed/bedrock | AWS Bedrock | amazon.titan-embed-text-v2:0 | 1024 |

import { createOpenAIEmbed } from '@betterdb/semantic-cache/embed/openai';
import { createVoyageEmbed } from '@betterdb/semantic-cache/embed/voyage';
import { createOllamaEmbed } from '@betterdb/semantic-cache/embed/ollama';
import { createBedrockEmbed } from '@betterdb/semantic-cache/embed/bedrock';

const cache = new SemanticCache({
  client,
  embedFn: createVoyageEmbed({ model: 'voyage-3-lite' }),
});

All helpers lazily initialize their clients and cache the instance across calls —
no per-request connection overhead.


New core features

Cost tracking + bundled model price table

Store token counts at cache time; get automatic cost-saved reporting on every hit.
A bundled DEFAULT_COST_TABLE covers 1,971 models sourced from LiteLLM and is
refreshed on every release via pnpm update:pricing. No configuration is
required for common models.

await cache.store('Summarize this document', responseText, {
  model: 'gpt-4o',
  inputTokens: 512,
  outputTokens: 128,
});

const result = await cache.check('Summarize this document');
console.log(result.costSaved);       // e.g. 0.00385 — dollars saved on this hit

const stats = await cache.stats();
console.log(stats.costSavedMicros);  // cumulative across all hits

Override or extend the table via costTable; disable it with
useDefaultCostTable: false. DEFAULT_COST_TABLE and ModelCost are exported
from the package root.

Multi-modal prompts

check(), store(), and the new storeMultipart() accept string | ContentBlock[].
A ContentBlock[] prompt embeds the text blocks and uses binary refs as an AND-filter —
a hit requires both the text to be semantically similar and all binary refs to
match exactly.

import { hashBase64, type ContentBlock } from '@betterdb/semantic-cache';

const prompt: ContentBlock[] = [
  { type: 'text', text: 'What is in this image?' },
  { type: 'binary', kind: 'image', mediaType: 'image/png', ref: hashBase64(b64) },
];

await cache.store(prompt, 'A red square on a white background.');
const result = await cache.check(prompt); // hit requires both text AND same image

storeMultipart(prompt, blocks[]) stores a structured response (text +
citations + tool calls) and returns result.contentBlocks on hit.

Binary normalizer: composeNormalizer, hashBase64, hashBytes, hashUrl,
and fetchAndHash produce stable, compact refs for any binary source. The
defaultNormalizer hashes base64 and bytes rather than storing raw data in TAG
fields. All normalizer utilities are exported from the package root.

Embedding cache

Computed embedding vectors are stored in Valkey ({name}:embed:{sha256}) and
reused on subsequent check() calls for the same text — embedFn is only
called once per unique string, then reads hit a fast GET.

// enabled by default; override if needed:
new SemanticCache({
  embeddingCache: { enabled: true, ttl: 86400 },
});

New Prometheus counter: {prefix}_embedding_cache_total labelled
result: hit | miss.

Batch check — checkBatch()

Embeds all prompts in parallel and pipelines all FT.SEARCH calls in a single
Valkey round-trip.

const results = await cache.checkBatch([
  'What is the capital of France?',
  'Who wrote Hamlet?',
  'What is the speed of light?',
]);
// results[i] is a CacheCheckResult — same shape as check()

Typically 60–80% faster than sequential check() calls for bulk lookups,
dashboards, and prefetch patterns.

Rerank hook

Retrieve the top-k most similar candidates and apply custom ranking before
deciding whether to serve from cache — useful for cross-encoder reranking,
LLM-as-judge, or domain-specific scoring.

const result = await cache.check(query, {
  rerank: {
    k: 5,
    rerankFn: async (query, candidates) => {
      const scores = await crossEncoder.predict(query, candidates.map(c => c.response));
      const best = scores.indexOf(Math.max(...scores));
      return scores[best] > 0.8 ? best : -1; // -1 → reject all → miss
    },
  },
});

Stale-model eviction

Automatically evict cached entries when you upgrade the LLM for a prompt category.
On a hit, if the stored model differs from currentModel, the entry is deleted and
the call returns a miss — forcing a fresh response under the new model.

const result = await cache.check(prompt, {
  staleAfterModelChange: true,
  currentModel: 'gpt-4o',  // evict if entry was stored with gpt-3.5-turbo
});

New Prometheus counter: {prefix}_stale_model_evictions_total.

Threshold effectiveness recommendations

The cache records a rolling window of cosine distance scores (up to 10,000
entries, 7-day retention). thresholdEffectiveness() analyzes this window and
returns a concrete recommendation:

const analysis = await cache.thresholdEffectiveness({ minSamples: 100 });
// {
//   recommendation: 'tighten_threshold',
//   currentThreshold: 0.1,
//   recommendedThreshold: 0.072,
//   hitRate: 0.83,
//   uncertainHitRate: 0.31,
//   nearMissRate: 0.04,
//   reasoning: '31.0% of hits are in the uncertainty band — tighten the threshold...',
// }

// Per-category + aggregate in one call:
const allResults = await cache.th...

Semantic Cache Python v0.1.2

24 Apr 19:29
4411483


betterdb-semantic-cache v0.1.0

Python port of @betterdb/semantic-cache. Embeddings-based semantic cache for AI
workloads backed by Valkey vector search — similarity matching, cost tracking,
multi-modal prompts, embedding cache, and threshold tuning, with built-in
OpenTelemetry and Prometheus instrumentation.

Requires Valkey 8+ with the valkey-search module (vector index support).
Works with ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.


Installation

pip install betterdb-semantic-cache

Optional extras install the provider SDKs alongside the library:

pip install "betterdb-semantic-cache[openai]"
pip install "betterdb-semantic-cache[anthropic]"
pip install "betterdb-semantic-cache[langchain]"
pip install "betterdb-semantic-cache[langgraph]"
pip install "betterdb-semantic-cache[llamaindex]"
pip install "betterdb-semantic-cache[httpx]"   # voyage / cohere / ollama embed helpers
pip install "betterdb-semantic-cache[bedrock]"  # AWS Bedrock embed helper

What's included

SemanticCache

| Method | Description |
| --- | --- |
| initialize() | Create or attach to the vector index |
| check(prompt) | Similarity lookup — returns hit/miss with confidence and optional cost saved |
| store(prompt, response) | Store a response with optional cost metadata |
| store_multipart(prompt, blocks) | Store structured content blocks |
| check_batch(prompts) | Pipelined batch lookup |
| invalidate(filter) | Delete entries matching a FT.SEARCH filter |
| invalidate_by_model(model) | Delete all entries for a model |
| invalidate_by_category(category) | Delete all entries in a category |
| stats() | Hit/miss counts and cumulative cost saved |
| index_info() | Index name, doc count, vector dimension |
| threshold_effectiveness() | Rolling window analysis and threshold recommendations |
| threshold_effectiveness_all() | Per-category analysis |
| flush() | Drop index and delete all cached entries |

Provider adapters

| Import | Provider |
| --- | --- |
| betterdb_semantic_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_semantic_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_semantic_cache.adapters.anthropic | Anthropic Messages |
| betterdb_semantic_cache.adapters.llamaindex | LlamaIndex ChatMessage[] |
| betterdb_semantic_cache.adapters.langchain | LangChain BaseCache (async-only) |
| betterdb_semantic_cache.adapters.langgraph | LangGraph BetterDBSemanticStore |

Embedding helpers

| Import | Provider |
| --- | --- |
| embed.openai | OpenAI Embeddings API |
| embed.voyage | Voyage AI (httpx, no SDK required) |
| embed.cohere | Cohere Embed v3 (httpx, no SDK required) |
| embed.ollama | Ollama local models (httpx, no SDK required) |
| embed.bedrock | AWS Bedrock Titan / Cohere (boto3) |

Bundled default cost table

A default cost table sourced from LiteLLM's model_prices_and_context_window.json
is bundled and refreshed on every release. Cost savings tracking works out of the
box for 1,900+ models — no cost_table configuration required.

Observability

  • OpenTelemetry spans on every cache operation
  • Prometheus metrics: requests_total, similarity_score, operation_duration_seconds,
    embedding_duration_seconds, cost_saved_total, embedding_cache_total,
    stale_model_evictions_total

Cluster support

Pass a ValkeyCluster client and all SCAN-based operations (flush,
invalidate_by_model, invalidate_by_category) automatically iterate all master nodes.


Quick start

import asyncio
import valkey.asyncio as valkey
from betterdb_semantic_cache import SemanticCache, SemanticCacheOptions
from betterdb_semantic_cache.types import CacheStoreOptions
from betterdb_semantic_cache.embed.openai import create_openai_embed

client = valkey.Valkey(host="localhost", port=6379)
cache = SemanticCache(SemanticCacheOptions(
    client=client,
    embed_fn=create_openai_embed(),
    default_threshold=0.12,
))

async def main():
    await cache.initialize()

    result = await cache.check("What is the capital of France?")
    if result.hit:
        print("Cache hit:", result.response)
    else:
        answer = "Paris"  # ... call your LLM ...
        await cache.store(
            "What is the capital of France?", answer,
            CacheStoreOptions(model="gpt-4o", input_tokens=20, output_tokens=5),
        )

asyncio.run(main())

Full changelog

See CHANGELOG.md for detailed history.

Semantic Cache Python v0.1.1

24 Apr 19:22
8c269b1


Updated release flow

betterdb-semantic-cache v0.1.0

24 Apr 19:16
8c269b1


betterdb-semantic-cache v0.1.0

Initial release. Full Python port of @betterdb/semantic-cache v0.2.0 — async-first,
dataclass config, feature-for-feature parity with the TypeScript implementation.

Requires Python 3.11+, Valkey 8+ with the valkey-search module.
Works with ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.

Installation

pip install betterdb-semantic-cache

Install optional extras alongside the library:

pip install "betterdb-semantic-cache[openai]"
pip install "betterdb-semantic-cache[anthropic]"
pip install "betterdb-semantic-cache[langchain]"
pip install "betterdb-semantic-cache[langgraph]"
pip install "betterdb-semantic-cache[llamaindex]"
pip install "betterdb-semantic-cache[httpx]"     # Voyage AI, Cohere, Ollama
pip install "betterdb-semantic-cache[bedrock]"   # AWS Bedrock
pip install "betterdb-semantic-cache[all]"       # everything above

Adapters

Six adapters extract the semantic cache key from provider-specific request objects.
All return a SemanticParams dataclass with text, blocks, and model fields.

OpenAI Chat Completions

from betterdb_semantic_cache.adapters.openai import prepare_semantic_params

params = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
sp = await prepare_semantic_params(params)

result = await cache.check(sp.blocks or sp.text)
if not result.hit:
    response = await openai_client.chat.completions.create(**params)
    await cache.store(sp.blocks or sp.text, response.choices[0].message.content,
                      CacheStoreOptions(model=sp.model))

Handles text, image_url (URL and base64), input_audio, and file content parts.
Pass normalizer=cache.normalizer to share the same normalization strategy.

OpenAI Responses API

from betterdb_semantic_cache.adapters.openai_responses import prepare_semantic_params

sp = await prepare_semantic_params(params)
result = await cache.check(sp.blocks or sp.text)

Supports input_text, input_image, and input_file content parts.

Anthropic Messages

from betterdb_semantic_cache.adapters.anthropic import prepare_semantic_params

sp = await prepare_semantic_params(params)
result = await cache.check(sp.blocks or sp.text)

Supports text; base64, URL, and file images; base64, URL, plaintext, and file documents.

LlamaIndex

from betterdb_semantic_cache.adapters.llamaindex import prepare_semantic_params

sp = await prepare_semantic_params(messages, model="gpt-4o")
result = await cache.check(sp.text)

Extracts the last user ChatMessage from a list. Supports text, image_url,
file, audio, and image content parts.

LangChain — async BaseCache

BetterDBSemanticCache implements LangChain's BaseCache interface. Because
SemanticCache is async-only, the synchronous lookup() / update() methods
return None / no-op; use ainvoke / astream to get real cache behavior.

from betterdb_semantic_cache.adapters.langchain import BetterDBSemanticCache
from langchain_openai import ChatOpenAI

lc_cache = BetterDBSemanticCache(cache)
llm = ChatOpenAI(model="gpt-4o", cache=lc_cache)

# Cache is transparent — hits are returned without calling the LLM
response = await llm.ainvoke("What is the capital of France?")

Optional filter_by_model=True scopes hits to a specific LLM configuration.

LangGraph semantic memory store

BetterDBSemanticStore implements the LangGraph BaseStore interface using
vector similarity for retrieval. Use this for agent memory (finding the most
relevant past facts for a query), not for checkpoint persistence — use
betterdb_agent_cache.adapters.langgraph for that. Both can coexist on the
same Valkey instance with different key prefixes.

from betterdb_semantic_cache.adapters.langgraph import BetterDBSemanticStore

store = BetterDBSemanticStore(cache, embed_field="content")

await store.aput(["user", "alice", "facts"], "pref_001", {
    "content": "Alice prefers async Python over synchronous code.",
})

results = await store.asearch(["user", "alice", "facts"],
                               query="What are Alice's coding preferences?",
                               limit=5)
# results[i].value — the stored dict; results[i].key — the item key

Full interface: aput(), aget(), asearch() (semantic KNN or namespace scan),
adelete(), abatch().


Embedding helpers

Five pre-built EmbedFn callables so you don't have to write your own:

| Import | Provider | Default model | Dimensions |
| --- | --- | --- | --- |
| betterdb_semantic_cache.embed.openai | OpenAI | text-embedding-3-small | 1536 |
| betterdb_semantic_cache.embed.voyage | Voyage AI | voyage-3-lite | 512 |
| betterdb_semantic_cache.embed.cohere | Cohere | embed-english-v3.0 | 1024 |
| betterdb_semantic_cache.embed.ollama | Ollama (local) | nomic-embed-text | 768 |
| betterdb_semantic_cache.embed.bedrock | AWS Bedrock | amazon.titan-embed-text-v2:0 | 1024 |

from betterdb_semantic_cache.embed.openai import create_openai_embed
from betterdb_semantic_cache.embed.voyage import create_voyage_embed
from betterdb_semantic_cache.embed.ollama import create_ollama_embed

cache = SemanticCache(SemanticCacheOptions(
    client=client,
    embed_fn=create_voyage_embed(model="voyage-3-lite"),
))

The Voyage AI, Cohere, and Ollama helpers use httpx directly — no provider SDK
required. The httpx client is created once per helper instance and reused across
calls. Install: pip install "betterdb-semantic-cache[httpx]".
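If none of the five helpers fits, you can supply your own embed function. The exact contract is an assumption inferred from the built-in helpers (an async callable mapping text to a vector); a deterministic toy sketch for local testing:

```python
import hashlib

# Toy stand-in for a real embedding model: deterministic, 32-dimensional.
# The async `(text) -> list[float]` shape is assumed from the built-in helpers.
async def toy_embed(text: str) -> list[float]:
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest]
```

Such a callable should drop in as embed_fn for tests; swap in a real helper for anything where semantic similarity actually matters.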


Core features

Cost tracking + bundled model price table

Store token counts at cache time; get automatic cost-saved reporting on every hit.
A bundled DEFAULT_COST_TABLE covers 1,900+ models from LiteLLM and is
refreshed on every release. No configuration required for common models.

await cache.store("Summarize this document", response_text,
                  CacheStoreOptions(model="gpt-4o", input_tokens=512, output_tokens=128))

result = await cache.check("Summarize this document")
print(result.cost_saved)        # e.g. 0.00385 — dollars saved on this hit

stats = await cache.stats()
print(stats.cost_saved_micros)  # cumulative across all hits

Override entries with cost_table={...}; disable with use_default_cost_table=False.
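Given the per-1k fields on ModelCost, the saving reported on a hit is presumably simple arithmetic over the stored token counts; a sketch with made-up rates (real values come from the bundled table):

```python
# Made-up per-1k rates for illustration; real values come from the bundled table.
RATES = {"gpt-4o": (0.0025, 0.01)}  # (input_per_1k, output_per_1k)

def cost_saved(model: str, input_tokens: int, output_tokens: int) -> float:
    input_per_1k, output_per_1k = RATES.get(model, (0.0, 0.0))
    return input_tokens / 1000 * input_per_1k + output_tokens / 1000 * output_per_1k

cost_saved("gpt-4o", 512, 128)  # ~0.00256 with these illustrative rates
```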

Multi-modal prompts

check(), store(), and store_multipart() accept str | list[ContentBlock].
A ContentBlock list embeds the text blocks and uses binary refs as an AND-filter —
a hit requires both semantic similarity on the text and all binary refs to match.

from betterdb_semantic_cache.normalizer import hash_base64
from betterdb_semantic_cache.utils import TextBlock, BinaryBlock

prompt = [
    TextBlock(type="text", text="What is in this image?"),
    BinaryBlock(type="binary", kind="image", mediaType="image/png",
                ref=hash_base64(b64_data)),
]

await cache.store_multipart(prompt, [TextBlock(type="text", text="A red square.")])
result = await cache.check(prompt)  # hit requires both text match AND same image
if result.hit:
    print(result.content_blocks)    # the stored ContentBlock[] response

Binary normalizer: compose_normalizer, hash_base64, hash_bytes, hash_url,
and fetch_and_hash generate stable, compact refs for any binary source. The
default_normalizer hashes base64 and bytes rather than storing raw data in TAG
fields. Access the configured normalizer via cache.normalizer.
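Conceptually, hashing a base64 payload means decoding first and digesting the raw bytes, so the same image yields the same compact ref regardless of encoding quirks. A toy version (the library's actual ref format is not shown here):

```python
import base64
import hashlib

def toy_hash_base64(b64: str) -> str:
    # Decode, then hash the raw bytes: identical images map to identical refs.
    # The "sha256:" prefix is illustrative, not the library's format.
    return "sha256:" + hashlib.sha256(base64.b64decode(b64)).hexdigest()
```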

Embedding cache

Computed embedding vectors are stored in Valkey ({name}:embed:{sha256}) and
reused on subsequent check() calls for the same text — embed_fn is only
called once per unique string.

SemanticCache(SemanticCacheOptions(
    ...,
    embedding_cache=EmbeddingCacheOptions(enabled=True, ttl=86400),  # default
))

Prometheus counter: {prefix}_embedding_cache_total labelled result: hit | miss.
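The key scheme above is easy to reproduce when debugging cache contents; a sketch (hex encoding of the digest is an assumption):

```python
import hashlib

def embedding_cache_key(name: str, text: str) -> str:
    # {name}:embed:{sha256-of-text}; hex digest assumed for illustration.
    return f"{name}:embed:{hashlib.sha256(text.encode('utf-8')).hexdigest()}"
```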

Batch check — check_batch()

Embeds all prompts in parallel and pipelines all FT.SEARCH calls in a single
Valkey round-trip.

results = await cache.check_batch([
    "What is the capital of France?",
    "Who wrote Hamlet?",
    "What is the speed of light?",
])
# results[i] is a CacheCheckResult — same shape as check()

Rerank hook

Retrieve the top-k most similar candidates and apply custom ranking before
serving from cache.

async def pick_longest(_query: str, candidates: list[dict]) -> int:
    return max(range(len(candidates)), key=lambda i: len(candidates[i]["response"]))

result = await cache.check(query, CacheCheckOptions(
    rerank=RerankOptions(k=5, rerank_fn=pick_longest),
))

Return -1 from rerank_fn to reject all candidates (miss).

Stale-model eviction

Automatically evict cached entries when you upgrade the LLM for a prompt category.
On a hit, if the stored model differs from current_model, the entry is deleted and
the call returns a miss.

result = await cache.check(prompt, CacheCheckOptions(
    stale_after_model_change=True,
    current_model="gpt-4o",   # evict if entry was stored with gpt-3.5-turbo
))

Prometheus counter: {prefix}_stale_model_evictions_total.

Threshold effectiveness recommendations

threshold_effectiveness() analyzes a rolling window of cosine distance scores
(up to 10,000 entries, 7-day retention) and returns a concrete recommendation:

analysis = await cache.threshold_effectiveness(min_samples=100)
# ThresholdEffectivenessResult:
#   recommendation:        ...

Agent Cache Python v0.4.1

23 Apr 11:15
a8266ad


betterdb-agent-cache v0.4.0

Python port of @betterdb/agent-cache. Multi-tier exact-match cache for AI agent
workloads backed by Valkey — LLM responses, tool results, and session state, with
built-in OpenTelemetry and Prometheus instrumentation.

Runs on vanilla Valkey 7+. No modules, no RedisJSON, no RediSearch. Works on
ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.


Installation

pip install betterdb-agent-cache

Optional extras install the provider SDKs alongside the library:

pip install "betterdb-agent-cache[openai]"
pip install "betterdb-agent-cache[anthropic]"
pip install "betterdb-agent-cache[langchain]"
pip install "betterdb-agent-cache[langgraph]"
pip install "betterdb-agent-cache[llamaindex]"

What's included

Three cache tiers

| Tier | Use for |
| --- | --- |
| cache.llm | LLM API responses — check / store / store_multipart / invalidate_by_model |
| cache.tool | Tool call results — check / store / set_policy / invalidate_by_tool |
| cache.session | Agent session state — get / set / get_all / destroy_thread / touch |

Provider adapters

| Import | Provider |
| --- | --- |
| betterdb_agent_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_agent_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_agent_cache.adapters.anthropic | Anthropic Messages |
| betterdb_agent_cache.adapters.llamaindex | LlamaIndex |
| betterdb_agent_cache.adapters.langchain | LangChain BaseCache |
| betterdb_agent_cache.adapters.langgraph | LangGraph BaseCheckpointSaver |

Bundled default cost table

A default cost table sourced from LiteLLM's model_prices_and_context_window.json
is bundled with the package and refreshed on every release. Cost tracking works out of
the box for 1,900+ models — no cost_table configuration required.

User-supplied cost_table entries are merged on top of the defaults, so you can
override a single model without losing coverage for everything else:

cache = AgentCache(AgentCacheOptions(
    client=client,
    cost_table={"gpt-4o": ModelCost(input_per_1k=0.002, output_per_1k=0.008)},
))

To disable the default table entirely:

cache = AgentCache(AgentCacheOptions(
    client=client,
    use_default_cost_table=False,
    cost_table={...},
))

The bundled table is also exported directly if you need to inspect it:

from betterdb_agent_cache import DEFAULT_COST_TABLE
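The override behavior described above amounts to a per-model shallow merge with user entries winning; a sketch (both tables invented for illustration):

```python
# Invented rates; real defaults come from DEFAULT_COST_TABLE.
defaults = {
    "gpt-4o": {"input_per_1k": 0.0025, "output_per_1k": 0.01},
    "gpt-4o-mini": {"input_per_1k": 0.00015, "output_per_1k": 0.0006},
}
user = {"gpt-4o": {"input_per_1k": 0.002, "output_per_1k": 0.008}}

# User entries are merged on top of the defaults, model by model,
# so overriding one model keeps coverage for everything else.
merged = {**defaults, **user}
```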

Pluggable binary normalizer

Controls how binary content (images, audio, documents) is reduced to a stable string
before hashing. Zero-latency by default — no network calls.

from betterdb_agent_cache import compose_normalizer, hash_base64

normalizer = compose_normalizer({"base64": hash_base64})

Observability

  • OpenTelemetry spans on every cache operation
  • Prometheus counters, histograms, and gauges: requests_total, operation_duration_seconds,
    cost_saved_total, stored_bytes_total, active_sessions

Cluster support

Pass a ValkeyCluster client and all SCAN-based operations (flush, invalidate_by_model,
invalidate_by_tool, destroy_thread, touch) automatically iterate all master nodes.


Quick start

Cost tracking works out of the box for 1,900+ models — no pricing configuration needed.

import asyncio
import valkey.asyncio as valkey_client
from betterdb_agent_cache import AgentCache, TierDefaults
from betterdb_agent_cache.adapters.openai import prepare_params
from betterdb_agent_cache.types import AgentCacheOptions

client = valkey_client.Valkey(host="localhost", port=6379)
cache = AgentCache(AgentCacheOptions(
    client=client,
    tier_defaults={"llm": TierDefaults(ttl=3600)},
    # cost_table is pre-defined for GPT-4o, Claude, Gemini, and 1,900+ others
))

async def main():
    params = await prepare_params({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    })

    result = await cache.llm.check(params)
    if result.hit:
        print("Cache hit:", result.response)
    else:
        # ... call OpenAI ...
        await cache.llm.store(params, "Four")

asyncio.run(main())

Full changelog

See CHANGELOG.md for detailed history.

v0.15.0

22 Apr 12:38
26679f2


Vector / AI tab

A new Vector / AI tab joins the monitor sidebar when BetterDB detects valkey-search or RediSearch on a connection. No configuration is required: the tab appears automatically based on capability detection and stays hidden when the module is not loaded.

The tab surfaces two things that weren't previously visible:

FT.SEARCH workload over time. The monitor now tracks search query volume and latency as a continuous time-series, persisted to storage on each poll cycle. The tab shows an ops/sec chart and an average latency chart, defaulting to the last hour and live-updating every 15 seconds. A date range picker lets you query any historical window, pausing live polling while a custom range is active.
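Deriving ops/sec from cumulative command counters reduces to a delta over the poll interval; a minimal sketch of the idea (the monitor's actual implementation is not shown in these notes):

```python
def ops_per_sec(calls_prev: int, calls_now: int, interval_s: float) -> float:
    # Cumulative counters only grow; clamp negative deltas (e.g. after a
    # server restart resets the counter) to zero rather than going negative.
    return max(calls_now - calls_prev, 0) / interval_s

ops_per_sec(1_000, 1_300, 15.0)  # 20.0 — 300 calls over a 15 s poll cycle
```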

Vector index health. Every index on the instance is listed with its doc count, record count, deleted doc count, indexing failure count, and current indexing state. Rows with failures are highlighted. Three health alerts surface as banners when conditions warrant:

  • Indexing failures detected on any index
  • An index is actively backfilling
  • Deleted documents are accumulating

Both Valkey with valkey-search and Redis with RediSearch are supported.

Prometheus metrics

Six new gauges are exposed via /prometheus/metrics on instances where a Search module is detected:

```
betterdb_vector_index_docs{connection, index}
betterdb_vector_index_memory_bytes{connection, index}
betterdb_vector_index_indexing_failures{connection, index}
betterdb_vector_index_percent_indexed{connection, index}

betterdb_commandstats_calls_total{connection, command}
betterdb_commandstats_latency_us{connection, command}
```

Stale labels are removed automatically when an index is dropped or a command disappears from INFO commandstats.

FT.* fixes in the commandlog

FT.SEARCH commands carry vector embeddings as binary PARAMS blocks. These were previously stored and displayed as raw bytes in the commandlog, making search commands unreadable and impossible to group by pattern. Binary and oversized arguments are now replaced with <blob>. FT.* commands are also excluded from the byKeyPrefix aggregation, which was producing meaningless patterns like idx_cache: by splitting index
names on :.
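The redaction rule can be sketched in a few lines; the size threshold here is invented — the release does not state the actual limit:

```python
MAX_ARG_BYTES = 256  # invented threshold; the monitor's real limit isn't stated

def redact_arg(arg: bytes) -> str:
    # Vector embeddings arrive as binary PARAMS blocks; anything undecodable
    # or oversized collapses to <blob> so commands stay readable and groupable.
    try:
        text = arg.decode("utf-8")
    except UnicodeDecodeError:
        return "<blob>"
    return text if len(arg) <= MAX_ARG_BYTES else "<blob>"
```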

Bug fixes

  • Dark mode: white code blocks now render with dark text. Code blocks and <pre> elements in the Settings page (MCP token display, Claude Code config snippet) and the Webhook Deliveries panel were using bg-white without a corresponding dark text override, making them unreadable in dark mode. Fixed with dark:text-gray-900. (#119)
  • API: 404 returned when no connection can be resolved, rather than an empty array. Affects the commandstats history and summary endpoints when no x-connection-id header is sent and no default connection is configured.

Internal

  • License key included in telemetry ping events.

What's Changed

  • Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #115
  • Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #116
  • feature: slowlog FT.* fix, vector index health metrics, commandstats time-series by @jamby77 in #111
  • feature: Vector / AI monitor tab (frontend) by @jamby77 in #112
  • added license key to telemetry_ping events. Refactored util method location by @KIvanow in #118
  • feature: close commandstats AC gaps (snapshot endpoint, absolute totals, 60s) by @jamby77 in #113
  • feature: commandstats API endpoints + Prometheus gauges per spec by @jamby77 in #114
  • Agent cache py by @KIvanow in #117

Full Changelog: v0.14.2...v0.15.0

Agent Cache v0.4.0

23 Apr 09:30
1c32e53


Added

  • Bundled default cost table
    • Cost tracking works out of the box for 100+ models including GPT-4o, Claude, and Gemini — no costTable configuration required
    • User-supplied costTable entries are merged on top of the defaults, allowing selective overrides without losing coverage for other models
  • New useDefaultCostTable option on AgentCacheOptions
    • Defaults to true. Set to false to disable the bundled table entirely and supply your own
  • DEFAULT_COST_TABLE export
    • The bundled table is now exported from the main entry point for inspection or extension
  • update:pricing npm script
    • Regenerates defaultCostTable.ts from the latest LiteLLM pricing data

Full Changelog: v0.15.0...agent-cache-v0.4.0

Agent Cache Python v0.4.0

23 Apr 10:33
1c32e53


betterdb-agent-cache v0.3.0

Python port of @betterdb/agent-cache. Multi-tier exact-match cache for AI agent
workloads backed by Valkey — LLM responses, tool results, and session state, with
built-in OpenTelemetry and Prometheus instrumentation.

Runs on vanilla Valkey 7+. No modules, no RedisJSON, no RediSearch. Works on
ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.


Installation

pip install betterdb-agent-cache

Optional extras install the provider SDKs alongside the library:

pip install "betterdb-agent-cache[openai]"
pip install "betterdb-agent-cache[anthropic]"
pip install "betterdb-agent-cache[langchain]"
pip install "betterdb-agent-cache[langgraph]"
pip install "betterdb-agent-cache[llamaindex]"

What's included

Three cache tiers

| Tier | Use for |
| --- | --- |
| cache.llm | LLM API responses — check / store / store_multipart / invalidate_by_model |
| cache.tool | Tool call results — check / store / set_policy / invalidate_by_tool |
| cache.session | Agent session state — get / set / get_all / destroy_thread / touch |

Provider adapters

| Import | Provider |
| --- | --- |
| betterdb_agent_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_agent_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_agent_cache.adapters.anthropic | Anthropic Messages |
| betterdb_agent_cache.adapters.llamaindex | LlamaIndex |
| betterdb_agent_cache.adapters.langchain | LangChain BaseCache |
| betterdb_agent_cache.adapters.langgraph | LangGraph BaseCheckpointSaver |

Pluggable binary normalizer

from betterdb_agent_cache import compose_normalizer, hash_base64

normalizer = compose_normalizer({"base64": hash_base64})

Observability

  • OpenTelemetry spans on every cache operation
  • Prometheus counters, histograms, and gauges (requests_total, operation_duration_seconds,
    cost_saved_total, stored_bytes_total, active_sessions)

Quick start

import asyncio
import valkey.asyncio as valkey_client
from betterdb_agent_cache import AgentCache, ModelCost, TierDefaults
from betterdb_agent_cache.adapters.openai import prepare_params
from betterdb_agent_cache.types import AgentCacheOptions

client = valkey_client.Valkey(host="localhost", port=6379)
cache = AgentCache(AgentCacheOptions(
    client=client,
    tier_defaults={"llm": TierDefaults(ttl=3600)},
    cost_table={"gpt-4o-mini": ModelCost(input_per_1k=0.00015, output_per_1k=0.0006)},
))

async def main():
    params = await prepare_params({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    })

    result = await cache.llm.check(params)
    if result.hit:
        print("Cache hit:", result.response)
    else:
        # ... call OpenAI ...
        await cache.llm.store(params, "Four")

asyncio.run(main())

Full changelog

See CHANGELOG.md for detailed history.

Agent Cache v0.3.0

20 Apr 19:08
e74fbf0


Multi-modal support and full provider adapter coverage for OpenAI, Anthropic, and LlamaIndex.


What's new

New provider adapters

Four new sub-path imports covering every major provider:

| Import | Provider |
| --- | --- |
| @betterdb/agent-cache/openai | OpenAI Chat Completions |
| @betterdb/agent-cache/openai-responses | OpenAI Responses API |
| @betterdb/agent-cache/anthropic | Anthropic Messages |
| @betterdb/agent-cache/llamaindex | LlamaIndex |

Each adapter converts provider-native request params into the canonical LlmCacheParams format — no manual normalisation required.

OpenAI Chat handles text, images (URL + base64), audio, files, tool calls, and legacy function role messages.

OpenAI Responses additionally covers reasoning items, function_call / function_call_output item types, and instructions promoted to a system message.

Anthropic covers tool use blocks, tool result blocks, and thinking / extended thinking blocks alongside standard image sources.

LlamaIndex wraps ChatMessage history including text and image nodes.


Pluggable binary normalizer

Binary content (images, audio, documents) is now a first-class part of the cache key pipeline. The new BinaryNormalizer interface lets you control how blobs are reduced to a stable string before hashing.

import { composeNormalizer, hashBase64, fetchAndHash } from '@betterdb/agent-cache';

const normalizer = composeNormalizer({
  // Hash base64 payloads by their decoded bytes
  base64: (data) => hashBase64(data),
  // Fetch remote images and hash the response body
  url: (url) => fetchAndHash(url),
  // Use OpenAI file IDs directly as cache keys
  fileId: (id, provider) => `${provider}:${id}`,
});

Built-in helpers:

| Helper | Behaviour |
| --- | --- |
| hashBase64(data) | SHA-256 of decoded bytes |
| hashBytes(data) | SHA-256 of raw bytes |
| hashUrl(url) | Normalised URL (sorted query params, lowercased host) |
| fetchAndHash(url) | Fetches URL and SHA-256s the body |
| passthrough(ref) | Scheme-prefixed ref, no transformation |
| composeNormalizer(cfg) | Build a normalizer from per-source / per-kind handlers |

The defaultNormalizer uses passthrough — zero-latency, no network calls, suitable for most use cases.
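As a rough sketch of what hashUrl-style normalisation means in practice (sorted query params, lowercased host), using only the standard WHATWG URL API. This is not the library's implementation, just an illustration of the behaviour the table describes:

```typescript
// Illustrative URL normalisation: lowercase the host and sort query
// parameters so equivalent URLs map to the same cache-key input.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.host = u.host.toLowerCase(); // URL() already lowercases on parse; kept for clarity
  const sorted = [...u.searchParams.entries()].sort(([a], [b]) => a.localeCompare(b));
  u.search = new URLSearchParams(sorted).toString();
  return u.toString();
}
```

A passthrough normalizer, by contrast, returns the reference unchanged apart from a scheme prefix, which is why the default adds no latency and makes no network calls.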


Extended cache key coverage

LlmCacheParams now includes all parameters that affect model output:

  • toolChoice, seed, stop, responseFormat
  • reasoningEffort — for extended thinking models
  • promptCacheKey — pass-through for provider-level prompt caching
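One way such output-affecting parameters can feed a stable cache key is canonical JSON (sorted keys, undefined fields dropped) followed by a hash. This is a hypothetical sketch, not the actual LlmCacheParams key derivation:

```typescript
import { createHash } from 'node:crypto';

// Canonicalise a params object so key order and undefined fields do not
// change the serialisation, then hash it into a fixed-length cache key.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(',')}]`;
  if (value && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined) // absent and undefined hash alike
      .sort(([a], [b]) => a.localeCompare(b));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonical(v)}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

function cacheKey(params: Record<string, unknown>): string {
  return createHash('sha256').update(canonical(params)).digest('hex');
}
```

With this scheme, `{ model, seed }` and `{ seed, model }` hash identically, while changing any single parameter (say, seed) produces a different key.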

New examples

Runnable examples added for all three new providers:

examples/openai/
examples/anthropic/
examples/llamaindex/

Bug fixes

  • Null tool output (openai-responses): function_call_output items with null or undefined output now produce an empty string instead of the two-character literal "", which was corrupting cache key hashes.
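The fixed behaviour can be illustrated with a small helper (hypothetical name; the adapter's internals may differ):

```typescript
// Coalesce null/undefined tool output to '' before it enters the cache-key
// pipeline. Naively JSON-stringifying the coalesced value would instead
// yield the two-character literal "" and corrupt the key hash.
function normalizeToolOutput(output: unknown): string {
  if (output === null || output === undefined) return '';
  return typeof output === 'string' ? output : JSON.stringify(output);
}
```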

Installation

npm install @betterdb/agent-cache@0.3.0

Full changelog

See CHANGELOG.md for a complete list of changes.

What's Changed

  • Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #115
  • Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #116

Full Changelog: v0.14.2...agent-cache-v0.3.0

v0.14.2

17 Apr 06:43
200435c


Security updates

fastify 5.8.4 → 5.8.5

Fixes CVE-2026-33806 — a security vulnerability in the Fastify HTTP framework. See GHSA-247c-9743-5963.

@fastify/static 9.0.0 → 9.1.1

Fixes two CVEs in static file serving.

vite 8.0.2 → 8.0.5

Multiple path traversal and filesystem boundary bypass fixes in the dev server:

  • server.fs checks now apply to env transport requests and query-stripped paths
  • Sourcemap handlers no longer allow referencing files outside the package root

Bug fixes

Vector Search

  • Fixed crashes caused by out-of-bounds data
  • Fixed Find Similar button being difficult to click
  • Fixed graph labels being unreadable and not respecting the active color scheme

What's Changed

  • added licenseKey to posthog events by @KIvanow in #104
  • Add @betterdb/agent-cache package — multi-tier LLM/tool/session cache with framework adapters by @KIvanow in #105
  • cluster support for agent-cache by @KIvanow in #108
  • build(deps-dev): bump vite from 8.0.2 to 8.0.5 by @dependabot[bot] in #99
  • build(deps): bump @nestjs/core from 11.1.17 to 11.1.18 by @dependabot[bot] in #100
  • build(deps): bump fastify from 5.8.4 to 5.8.5 by @dependabot[bot] in #106
  • build(deps): bump @fastify/static from 9.0.0 to 9.1.1 by @dependabot[bot] in #110

Full Changelog: v0.14.1...v0.14.2