Releases: BetterDB-inc/monitor
Semantic Cache v0.2.0
Release Notes — @betterdb/semantic-cache v0.2.0
v0.1.0 shipped the core cache with text-only string prompts and two adapters.
v0.2.0 adds five new adapters, five embedding helpers, and a set of features
that make the cache production-ready: cost tracking, multi-modal prompts, batch
lookup, threshold tuning, embedding cache, stale-model eviction, and a rerank hook.
Installation

```
npm install @betterdb/semantic-cache@0.2.0 iovalkey
```

New adapters
v0.1.0 had LangChain and Vercel AI SDK. v0.2.0 adds:
OpenAI Chat Completions — @betterdb/semantic-cache/openai
Extracts the last user message from ChatCompletionCreateParams. Handles text,
image_url (URL and base64), input_audio, and file content parts.
```
import { prepareSemanticParams } from '@betterdb/semantic-cache/openai';

const { text, blocks, model } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);
if (!result.hit) {
  const response = await openai.chat.completions.create(params);
  await cache.store(blocks ?? text, response.choices[0].message.content!, { model });
}
```

OpenAI Responses API — @betterdb/semantic-cache/openai-responses
Extracts the last user input from the Responses API input field — string or
message array with input_text, input_image, and input_file parts.
```
import { prepareSemanticParams } from '@betterdb/semantic-cache/openai-responses';

const { text, blocks } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);
```

Anthropic Messages — @betterdb/semantic-cache/anthropic
Extracts the last user message from MessageCreateParamsNonStreaming. Supports
text; base64, URL, and file images; and base64, URL, plaintext, and file documents.
```
import { prepareSemanticParams } from '@betterdb/semantic-cache/anthropic';

const { text, blocks, model } = await prepareSemanticParams(params);
const result = await cache.check(blocks ?? text);
```

LlamaIndex — @betterdb/semantic-cache/llamaindex
Extracts the last user ChatMessage from a ChatMessage[] array. Supports
text, image_url, file, audio, and image content parts.
```
import { prepareSemanticParams } from '@betterdb/semantic-cache/llamaindex';

const { text } = await prepareSemanticParams(messages, { model: 'gpt-4o' });
const result = await cache.check(text);
```

LangGraph semantic memory store — @betterdb/semantic-cache/langgraph
BetterDBSemanticStore implements the LangGraph BaseStore interface using
vector similarity — find the most semantically relevant past observations for a
given query. This is distinct from @betterdb/agent-cache/langgraph, which does
exact-match checkpoint persistence. Both can coexist on the same Valkey instance
with different key prefixes.
```
import { BetterDBSemanticStore } from '@betterdb/semantic-cache/langgraph';

const store = new BetterDBSemanticStore({ cache, embedField: 'content' });
await store.put(['user', 'alice', 'facts'], 'pref_001', {
  content: 'Alice prefers async Python over synchronous code.',
});
const results = await store.search(['user', 'alice', 'facts'], {
  query: "What are Alice's coding preferences?",
  limit: 5,
});
// results[i].value — the stored object; results[i].key — the item key
```

Full interface: put(), get(), search() (semantic KNN or namespace scan),
delete(), batch().
Updated: LangChain — @betterdb/semantic-cache/langchain
BetterDBSemanticCache now wraps responses in a proper AIMessage so chat
models can correctly access response.content. New filterByModel option scopes
hits to a specific LLM configuration (deterministically hashed from llm_string).
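The deterministic hash over llm_string can be sketched in a few lines. This is an illustrative Python sketch, not the library's actual scheme — the model_tag name and the 16-character truncation are assumptions:

```python
# Hedged sketch: derive a stable tag from LangChain's llm_string so cache
# hits can be scoped to one LLM configuration. Same llm_string (model,
# temperature, etc.) always yields the same tag.
import hashlib

def model_tag(llm_string: str) -> str:
    # Truncation to 16 hex chars is an illustrative choice, not the library's.
    return hashlib.sha256(llm_string.encode("utf-8")).hexdigest()[:16]
```

Because the tag is a pure function of the configuration string, two processes caching against the same Valkey instance agree on the filter without coordination.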
Embedding helpers
Five pre-built EmbedFn factories so you don't have to write your own:
| Import | Provider | Default model | Dimensions |
|---|---|---|---|
| @betterdb/semantic-cache/embed/openai | OpenAI | text-embedding-3-small | 1536 |
| @betterdb/semantic-cache/embed/voyage | Voyage AI | voyage-3-lite | 512 |
| @betterdb/semantic-cache/embed/cohere | Cohere | embed-english-v3.0 | 1024 |
| @betterdb/semantic-cache/embed/ollama | Ollama (local) | nomic-embed-text | 768 |
| @betterdb/semantic-cache/embed/bedrock | AWS Bedrock | amazon.titan-embed-text-v2:0 | 1024 |
```
import { createOpenAIEmbed } from '@betterdb/semantic-cache/embed/openai';
import { createVoyageEmbed } from '@betterdb/semantic-cache/embed/voyage';
import { createOllamaEmbed } from '@betterdb/semantic-cache/embed/ollama';
import { createBedrockEmbed } from '@betterdb/semantic-cache/embed/bedrock';

const cache = new SemanticCache({
  client,
  embedFn: createVoyageEmbed({ model: 'voyage-3-lite' }),
});
```

All helpers lazily initialise their clients and cache the instance across calls —
no per-request connection overhead.
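The lazy-initialisation pattern the helpers rely on can be sketched generically. This is an illustrative Python sketch under stated assumptions — create_embed and client_factory are made-up names, not the package's API:

```python
# Generic sketch of lazy client initialisation: the provider client is
# built on the first embed() call and reused for every call after that,
# so there is no per-request connection overhead.
def create_embed(client_factory):
    client = None  # not constructed until the first call

    def embed(text: str):
        nonlocal client
        if client is None:
            client = client_factory()  # pay the construction cost once
        return client.embed(text)

    return embed
```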
New core features
Cost tracking + bundled model price table
Store token counts at cache time; get automatic cost-saved reporting on every hit.
A bundled DEFAULT_COST_TABLE covers 1,971 models sourced from LiteLLM and is
refreshed on every release via pnpm update:pricing. No configuration is
required for common models.
```
await cache.store('Summarize this document', responseText, {
  model: 'gpt-4o',
  inputTokens: 512,
  outputTokens: 128,
});

const result = await cache.check('Summarize this document');
console.log(result.costSaved); // e.g. 0.00385 — dollars saved on this hit

const stats = await cache.stats();
console.log(stats.costSavedMicros); // cumulative across all hits
```

Override or extend the table via costTable; disable it with
useDefaultCostTable: false. DEFAULT_COST_TABLE and ModelCost are exported
from the package root.
Multi-modal prompts
check(), store(), and the new storeMultipart() accept string | ContentBlock[].
A ContentBlock[] prompt embeds the text blocks and uses binary refs as an AND-filter —
a hit requires both the text to be semantically similar and all binary refs to
match exactly.
```
import { hashBase64, type ContentBlock } from '@betterdb/semantic-cache';

const prompt: ContentBlock[] = [
  { type: 'text', text: 'What is in this image?' },
  { type: 'binary', kind: 'image', mediaType: 'image/png', ref: hashBase64(b64) },
];
await cache.store(prompt, 'A red square on a white background.');
const result = await cache.check(prompt); // hit requires both text AND same image
```

storeMultipart(prompt, blocks[]) stores a structured response (text +
citations + tool calls) and returns result.contentBlocks on hit.
Binary normalizer: composeNormalizer, hashBase64, hashBytes, hashUrl,
and fetchAndHash produce stable, compact refs for any binary source. The
defaultNormalizer hashes base64 and bytes rather than storing raw data in TAG
fields. All normalizer utilities are exported from the package root.
Embedding cache
Computed embedding vectors are stored in Valkey ({name}:embed:{sha256}) and
reused on subsequent check() calls for the same text — embedFn is only
called once per unique string, then reads hit a fast GET.
```
// enabled by default; override if needed:
new SemanticCache({
  embeddingCache: { enabled: true, ttl: 86400 },
});
```

New Prometheus counter: {prefix}_embedding_cache_total labelled
result: hit | miss.
Batch check — checkBatch()
Embeds all prompts in parallel and pipelines all FT.SEARCH calls in a single
Valkey round-trip.
```
const results = await cache.checkBatch([
  'What is the capital of France?',
  'Who wrote Hamlet?',
  'What is the speed of light?',
]);
// results[i] is a CacheCheckResult — same shape as check()
```

Typically 60–80% faster than sequential check() calls for bulk lookups,
dashboards, and prefetch patterns.
Rerank hook
Retrieve the top-k most similar candidates and apply custom ranking before
deciding whether to serve from cache — useful for cross-encoder reranking,
LLM-as-judge, or domain-specific scoring.
```
const result = await cache.check(query, {
  rerank: {
    k: 5,
    rerankFn: async (query, candidates) => {
      const scores = await crossEncoder.predict(query, candidates.map(c => c.response));
      const best = scores.indexOf(Math.max(...scores));
      return scores[best] > 0.8 ? best : -1; // -1 → reject all → miss
    },
  },
});
```

Stale-model eviction
Automatically evict cached entries when you upgrade the LLM for a prompt category.
On a hit, if the stored model differs from currentModel, the entry is deleted and
the call returns a miss — forcing a fresh response under the new model.
```
const result = await cache.check(prompt, {
  staleAfterModelChange: true,
  currentModel: 'gpt-4o', // evict if entry was stored with gpt-3.5-turbo
});
```

New Prometheus counter: {prefix}_stale_model_evictions_total.
Threshold effectiveness recommendations
The cache records a rolling window of cosine distance scores (up to 10,000
entries, 7-day retention). thresholdEffectiveness() analyzes this window and
returns a concrete recommendation:
```
const analysis = await cache.thresholdEffectiveness({ minSamples: 100 });
// {
//   recommendation: 'tighten_threshold',
//   currentThreshold: 0.1,
//   recommendedThreshold: 0.072,
//   hitRate: 0.83,
//   uncertainHitRate: 0.31,
//   nearMissRate: 0.04,
//   reasoning: '31.0% of hits are in the uncertainty band — tighten the threshold...',
// }

// Per-category + aggregate in one call:
const allResults = await cache.th...
```

Semantic Cache Python v0.1.2
betterdb-semantic-cache v0.1.0
Python port of @betterdb/semantic-cache. Embeddings-based semantic cache for AI
workloads backed by Valkey vector search — similarity matching, cost tracking,
multi-modal prompts, embedding cache, and threshold tuning, with built-in
OpenTelemetry and Prometheus instrumentation.
Requires Valkey 8+ with the valkey-search module (vector index support).
Works with ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.
Installation
```
pip install betterdb-semantic-cache
```

Optional extras install the provider SDKs alongside the library:

```
pip install "betterdb-semantic-cache[openai]"
pip install "betterdb-semantic-cache[anthropic]"
pip install "betterdb-semantic-cache[langchain]"
pip install "betterdb-semantic-cache[langgraph]"
pip install "betterdb-semantic-cache[llamaindex]"
pip install "betterdb-semantic-cache[httpx]"    # voyage / cohere / ollama embed helpers
pip install "betterdb-semantic-cache[bedrock]"  # AWS Bedrock embed helper
```

What's included
SemanticCache
| Method | Description |
|---|---|
| initialize() | Create or attach to the vector index |
| check(prompt) | Similarity lookup — returns hit/miss with confidence and optional cost saved |
| store(prompt, response) | Store a response with optional cost metadata |
| store_multipart(prompt, blocks) | Store structured content blocks |
| check_batch(prompts) | Pipelined batch lookup |
| invalidate(filter) | Delete entries matching a FT.SEARCH filter |
| invalidate_by_model(model) | Delete all entries for a model |
| invalidate_by_category(category) | Delete all entries in a category |
| stats() | Hit/miss counts and cumulative cost saved |
| index_info() | Index name, doc count, vector dimension |
| threshold_effectiveness() | Rolling window analysis and threshold recommendations |
| threshold_effectiveness_all() | Per-category analysis |
| flush() | Drop index and delete all cached entries |
Provider adapters
| Import | Provider |
|---|---|
| betterdb_semantic_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_semantic_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_semantic_cache.adapters.anthropic | Anthropic Messages |
| betterdb_semantic_cache.adapters.llamaindex | LlamaIndex ChatMessage[] |
| betterdb_semantic_cache.adapters.langchain | LangChain BaseCache (async-only) |
| betterdb_semantic_cache.adapters.langgraph | LangGraph BetterDBSemanticStore |
Embedding helpers
| Import | Provider |
|---|---|
| embed.openai | OpenAI Embeddings API |
| embed.voyage | Voyage AI (httpx, no SDK required) |
| embed.cohere | Cohere Embed v3 (httpx, no SDK required) |
| embed.ollama | Ollama local models (httpx, no SDK required) |
| embed.bedrock | AWS Bedrock Titan / Cohere (boto3) |
Bundled default cost table
A default cost table sourced from LiteLLM's model_prices_and_context_window.json
is bundled and refreshed on every release. Cost savings tracking works out of the
box for 1,900+ models — no cost_table configuration required.
Observability
- OpenTelemetry spans on every cache operation
- Prometheus metrics:
  requests_total, similarity_score, operation_duration_seconds,
  embedding_duration_seconds, cost_saved_total, embedding_cache_total,
  stale_model_evictions_total
Cluster support
Pass a ValkeyCluster client and all SCAN-based operations (flush,
invalidate_by_model, invalidate_by_category) automatically iterate all master nodes.
Quick start
```
import asyncio
import valkey.asyncio as valkey

from betterdb_semantic_cache import SemanticCache, SemanticCacheOptions
from betterdb_semantic_cache.types import CacheStoreOptions
from betterdb_semantic_cache.embed.openai import create_openai_embed

client = valkey.Valkey(host="localhost", port=6379)
cache = SemanticCache(SemanticCacheOptions(
    client=client,
    embed_fn=create_openai_embed(),
    default_threshold=0.12,
))

async def main():
    await cache.initialize()
    result = await cache.check("What is the capital of France?")
    if result.hit:
        print("Cache hit:", result.response)
    else:
        answer = "Paris"  # ... call your LLM ...
        await cache.store(
            "What is the capital of France?", answer,
            CacheStoreOptions(model="gpt-4o", input_tokens=20, output_tokens=5),
        )

asyncio.run(main())
```

Full changelog
See CHANGELOG.md for detailed history.
Semantic cache python 0.1.1
Updated release flow
betterdb-semantic-cache v0.1.0
Initial release. Full Python port of @betterdb/semantic-cache v0.2.0 — async-first,
dataclass config, feature-for-feature parity with the TypeScript implementation.
Requires Python 3.11+, Valkey 8+ with the valkey-search module.
Works with ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.
Installation
```
pip install betterdb-semantic-cache
```

Install optional extras alongside the library:

```
pip install "betterdb-semantic-cache[openai]"
pip install "betterdb-semantic-cache[anthropic]"
pip install "betterdb-semantic-cache[langchain]"
pip install "betterdb-semantic-cache[langgraph]"
pip install "betterdb-semantic-cache[llamaindex]"
pip install "betterdb-semantic-cache[httpx]"    # Voyage AI, Cohere, Ollama
pip install "betterdb-semantic-cache[bedrock]"  # AWS Bedrock
pip install "betterdb-semantic-cache[all]"      # everything above
```

Adapters
Six adapters extract the semantic cache key from provider-specific request objects.
All return a SemanticParams dataclass with text, blocks, and model fields.
OpenAI Chat Completions
```
from betterdb_semantic_cache.adapters.openai import prepare_semantic_params

params = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
sp = await prepare_semantic_params(params)
result = await cache.check(sp.blocks or sp.text)
if not result.hit:
    response = await openai_client.chat.completions.create(**params)
    await cache.store(sp.blocks or sp.text, response.choices[0].message.content,
                      CacheStoreOptions(model=sp.model))
```

Handles text, image_url (URL and base64), input_audio, and file content parts.
Pass normalizer=cache.normalizer to share the same normalization strategy.
OpenAI Responses API
```
from betterdb_semantic_cache.adapters.openai_responses import prepare_semantic_params

sp = await prepare_semantic_params(params)
result = await cache.check(sp.blocks or sp.text)
```

Supports input_text, input_image, and input_file content parts.
Anthropic Messages
```
from betterdb_semantic_cache.adapters.anthropic import prepare_semantic_params

sp = await prepare_semantic_params(params)
result = await cache.check(sp.blocks or sp.text)
```

Supports text; base64, URL, and file images; and base64, URL, plaintext, and file documents.
LlamaIndex
```
from betterdb_semantic_cache.adapters.llamaindex import prepare_semantic_params

sp = await prepare_semantic_params(messages, model="gpt-4o")
result = await cache.check(sp.text)
```

Extracts the last user ChatMessage from a list. Supports text, image_url,
file, audio, and image content parts.
LangChain — async BaseCache
BetterDBSemanticCache implements LangChain's BaseCache interface. Because
SemanticCache is async-only, the synchronous lookup() / update() methods
return None / no-op; use ainvoke / astream to get real cache behaviour.
```
from betterdb_semantic_cache.adapters.langchain import BetterDBSemanticCache
from langchain_openai import ChatOpenAI

lc_cache = BetterDBSemanticCache(cache)
llm = ChatOpenAI(model="gpt-4o", cache=lc_cache)

# Cache is transparent — hits are returned without calling the LLM
response = await llm.ainvoke("What is the capital of France?")
```

Optional filter_by_model=True scopes hits to a specific LLM configuration.
LangGraph semantic memory store
BetterDBSemanticStore implements the LangGraph BaseStore interface using
vector similarity for retrieval. Use this for agent memory (finding the most
relevant past facts for a query), not for checkpoint persistence — use
betterdb_agent_cache.adapters.langgraph for that. Both can coexist on the
same Valkey instance with different key prefixes.
```
from betterdb_semantic_cache.adapters.langgraph import BetterDBSemanticStore

store = BetterDBSemanticStore(cache, embed_field="content")
await store.aput(["user", "alice", "facts"], "pref_001", {
    "content": "Alice prefers async Python over synchronous code.",
})
results = await store.asearch(["user", "alice", "facts"],
                              query="What are Alice's coding preferences?",
                              limit=5)
# results[i].value — the stored dict; results[i].key — the item key
```

Full interface: aput(), aget(), asearch() (semantic KNN or namespace scan),
adelete(), abatch().
Embedding helpers
Five pre-built EmbedFn callables so you don't have to write your own:
| Import | Provider | Default model | Dimensions |
|---|---|---|---|
| betterdb_semantic_cache.embed.openai | OpenAI | text-embedding-3-small | 1536 |
| betterdb_semantic_cache.embed.voyage | Voyage AI | voyage-3-lite | 512 |
| betterdb_semantic_cache.embed.cohere | Cohere | embed-english-v3.0 | 1024 |
| betterdb_semantic_cache.embed.ollama | Ollama (local) | nomic-embed-text | 768 |
| betterdb_semantic_cache.embed.bedrock | AWS Bedrock | amazon.titan-embed-text-v2:0 | 1024 |
```
from betterdb_semantic_cache.embed.openai import create_openai_embed
from betterdb_semantic_cache.embed.voyage import create_voyage_embed
from betterdb_semantic_cache.embed.ollama import create_ollama_embed

cache = SemanticCache(SemanticCacheOptions(
    client=client,
    embed_fn=create_voyage_embed(model="voyage-3-lite"),
))
```

The Voyage AI, Cohere, and Ollama helpers use httpx directly — no provider SDK
required. The httpx client is created once per helper instance and reused across
calls. Install: pip install "betterdb-semantic-cache[httpx]".
Core features
Cost tracking + bundled model price table
Store token counts at cache time; get automatic cost-saved reporting on every hit.
A bundled DEFAULT_COST_TABLE covers 1,900+ models from LiteLLM and is
refreshed on every release. No configuration required for common models.
```
await cache.store("Summarize this document", response_text,
                  CacheStoreOptions(model="gpt-4o", input_tokens=512, output_tokens=128))

result = await cache.check("Summarize this document")
print(result.cost_saved)  # e.g. 0.00385 — dollars saved on this hit

stats = await cache.stats()
print(stats.cost_saved_micros)  # cumulative across all hits
```

Override entries with cost_table={...}; disable with use_default_cost_table=False.
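The cost-saved arithmetic reduces to the stored token counts times the per-1,000-token prices for the stored model. A minimal sketch — the ModelCost shape mirrors the package's, but the prices below are made up for illustration, not LiteLLM's real figures:

```python
# Sketch of per-hit cost-saved arithmetic: tokens recorded at store time
# multiplied by the model's per-1k prices. Prices here are illustrative.
from dataclasses import dataclass

@dataclass
class ModelCost:
    input_per_1k: float   # dollars per 1,000 input tokens
    output_per_1k: float  # dollars per 1,000 output tokens

# Hypothetical entry — the real table is bundled from LiteLLM.
COST_TABLE = {"gpt-4o": ModelCost(input_per_1k=0.0025, output_per_1k=0.01)}

def cost_saved(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollars not spent because the response came from cache."""
    c = COST_TABLE[model]
    return (input_tokens / 1000) * c.input_per_1k + \
           (output_tokens / 1000) * c.output_per_1k
```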
Multi-modal prompts
check(), store(), and store_multipart() accept str | list[ContentBlock].
A ContentBlock list embeds the text blocks and uses binary refs as an AND-filter —
a hit requires both semantic similarity on the text and all binary refs to match.
```
from betterdb_semantic_cache.normalizer import hash_base64
from betterdb_semantic_cache.utils import TextBlock, BinaryBlock

prompt = [
    TextBlock(type="text", text="What is in this image?"),
    BinaryBlock(type="binary", kind="image", mediaType="image/png",
                ref=hash_base64(b64_data)),
]
await cache.store_multipart(prompt, [TextBlock(type="text", text="A red square.")])
result = await cache.check(prompt)  # hit requires both text match AND same image
if result.hit:
    print(result.content_blocks)  # the stored ContentBlock[] response
```

Binary normalizer: compose_normalizer, hash_base64, hash_bytes, hash_url,
and fetch_and_hash generate stable, compact refs for any binary source. The
default_normalizer hashes base64 and bytes rather than storing raw data in TAG
fields. Access the configured normalizer via cache.normalizer.
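The way refs participate in a lookup can be modelled in a few lines. This is a toy sketch, not the library's implementation — the decode-then-hash behaviour and the is_hit rule are assumptions based on the description above:

```python
# Toy model of binary refs + the AND-filter: binary content is reduced to
# a stable hash, and a candidate only counts as a hit when the text is
# within the distance threshold AND every binary ref matches exactly.
import base64
import hashlib

def ref_for_base64(b64: str) -> str:
    # Hash the decoded payload so equivalent base64 encodings share one ref.
    return hashlib.sha256(base64.b64decode(b64)).hexdigest()

def is_hit(distance: float, threshold: float,
           query_refs: list[str], stored_refs: list[str]) -> bool:
    text_ok = distance <= threshold                       # semantic gate
    refs_ok = sorted(query_refs) == sorted(stored_refs)   # exact binary gate
    return text_ok and refs_ok
```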
Embedding cache
Computed embedding vectors are stored in Valkey ({name}:embed:{sha256}) and
reused on subsequent check() calls for the same text — embed_fn is only
called once per unique string.
```
SemanticCache(SemanticCacheOptions(
    ...,
    embedding_cache=EmbeddingCacheOptions(enabled=True, ttl=86400),  # default
))
```

Prometheus counter: {prefix}_embedding_cache_total labelled result: hit | miss.
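The key scheme is simple enough to sketch. Only the {name}:embed:{sha256} shape comes from the description above; the hex encoding of the digest is an assumption:

```python
# Sketch of the embedding-cache key: the SHA-256 of the prompt text under
# the cache's name prefix. Hex digest encoding is an assumption here.
import hashlib

def embedding_cache_key(name: str, text: str) -> str:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{name}:embed:{digest}"
```

Because the key is derived only from the text, repeated check() calls for the same prompt map to the same Valkey key and skip the embed_fn call.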
Batch check — check_batch()
Embeds all prompts in parallel and pipelines all FT.SEARCH calls in a single
Valkey round-trip.
```
results = await cache.check_batch([
    "What is the capital of France?",
    "Who wrote Hamlet?",
    "What is the speed of light?",
])
# results[i] is a CacheCheckResult — same shape as check()
```

Rerank hook
Retrieve the top-k most similar candidates and apply custom ranking before
serving from cache.
```
async def pick_longest(_query: str, candidates: list[dict]) -> int:
    return max(range(len(candidates)), key=lambda i: len(candidates[i]["response"]))

result = await cache.check(query, CacheCheckOptions(
    rerank=RerankOptions(k=5, rerank_fn=pick_longest),
))
```

Return -1 from rerank_fn to reject all candidates (miss).
Stale-model eviction
Automatically evict cached entries when you upgrade the LLM for a prompt category.
On a hit, if the stored model differs from current_model, the entry is deleted and
the call returns a miss.
```
result = await cache.check(prompt, CacheCheckOptions(
    stale_after_model_change=True,
    current_model="gpt-4o",  # evict if entry was stored with gpt-3.5-turbo
))
```

Prometheus counter: {prefix}_stale_model_evictions_total.
Threshold effectiveness recommendations
threshold_effectiveness() analyzes a rolling window of cosine distance scores
(up to 10,000 entries, 7-day retention) and returns a concrete recommendation:
```
analysis = await cache.threshold_effectiveness(min_samples=100)
# ThresholdEffectivenessResult:
#   recommendation: ...
```

Agent Cache Python v0.4.1
betterdb-agent-cache v0.4.0
Python port of @betterdb/agent-cache. Multi-tier exact-match cache for AI agent
workloads backed by Valkey — LLM responses, tool results, and session state, with
built-in OpenTelemetry and Prometheus instrumentation.
Runs on vanilla Valkey 7+. No modules, no RedisJSON, no RediSearch. Works on
ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.
Installation
```
pip install betterdb-agent-cache
```

Optional extras install the provider SDKs alongside the library:

```
pip install "betterdb-agent-cache[openai]"
pip install "betterdb-agent-cache[anthropic]"
pip install "betterdb-agent-cache[langchain]"
pip install "betterdb-agent-cache[langgraph]"
pip install "betterdb-agent-cache[llamaindex]"
```

What's included
Three cache tiers
| Tier | Use for |
|---|---|
| cache.llm | LLM API responses — check / store / store_multipart / invalidate_by_model |
| cache.tool | Tool call results — check / store / set_policy / invalidate_by_tool |
| cache.session | Agent session state — get / set / get_all / destroy_thread / touch |
Provider adapters
| Import | Provider |
|---|---|
| betterdb_agent_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_agent_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_agent_cache.adapters.anthropic | Anthropic Messages |
| betterdb_agent_cache.adapters.llamaindex | LlamaIndex |
| betterdb_agent_cache.adapters.langchain | LangChain BaseCache |
| betterdb_agent_cache.adapters.langgraph | LangGraph BaseCheckpointSaver |
Bundled default cost table
A default cost table sourced from LiteLLM's model_prices_and_context_window.json
is bundled with the package and refreshed on every release. Cost tracking works out of
the box for 1,900+ models — no cost_table configuration required.
User-supplied cost_table entries are merged on top of the defaults, so you can
override a single model without losing coverage for everything else:
```
cache = AgentCache(AgentCacheOptions(
    client=client,
    cost_table={"gpt-4o": ModelCost(input_per_1k=0.002, output_per_1k=0.008)},
))
```

To disable the default table entirely:

```
cache = AgentCache(AgentCacheOptions(
    client=client,
    use_default_cost_table=False,
    cost_table={...},
))
```

The bundled table is also exported directly if you need to inspect it:

```
from betterdb_agent_cache import DEFAULT_COST_TABLE
```

Pluggable binary normalizer
Controls how binary content (images, audio, documents) is reduced to a stable string
before hashing. Zero-latency by default — no network calls.
```
from betterdb_agent_cache import compose_normalizer, hash_base64

normalizer = compose_normalizer({"base64": hash_base64})
```

Observability
- OpenTelemetry spans on every cache operation
- Prometheus counters, histograms, and gauges:
  requests_total, operation_duration_seconds,
  cost_saved_total, stored_bytes_total, active_sessions
Cluster support
Pass a ValkeyCluster client and all SCAN-based operations (flush, invalidate_by_model,
invalidate_by_tool, destroy_thread, touch) automatically iterate all master nodes.
Quick start
Cost tracking is pre-defined for 1,900+ models — no pricing configuration needed.
```
import asyncio
import valkey.asyncio as valkey_client

from betterdb_agent_cache import AgentCache, TierDefaults
from betterdb_agent_cache.adapters.openai import prepare_params
from betterdb_agent_cache.types import AgentCacheOptions

client = valkey_client.Valkey(host="localhost", port=6379)
cache = AgentCache(AgentCacheOptions(
    client=client,
    tier_defaults={"llm": TierDefaults(ttl=3600)},
    # cost_table is pre-defined for GPT-4o, Claude, Gemini, and 1,900+ others
))

async def main():
    params = await prepare_params({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    })
    result = await cache.llm.check(params)
    if result.hit:
        print("Cache hit:", result.response)
    else:
        # ... call OpenAI ...
        await cache.llm.store(params, "Four")

asyncio.run(main())
```

Full changelog
See CHANGELOG.md for detailed history.
v0.15.0
Vector / AI tab
A new Vector / AI tab joins the monitor sidebar when BetterDB detects valkey-search or RediSearch on a connection. No configuration is required: the tab appears automatically based on capability detection and stays hidden when the module is not loaded.
The tab surfaces two things that weren't previously visible:
FT.SEARCH workload over time. The monitor now tracks search query volume and latency as a continuous time-series, persisted to storage on each poll cycle. The tab shows an ops/sec chart and an average latency chart, defaulting to the last hour and live-updating every 15 seconds. A date range picker lets you query any historical window, pausing live polling while a custom range is active.
Vector index health. Every index on the instance is listed with its doc count, record count, deleted doc count, indexing failure count, and current indexing state. Rows with failures are highlighted. Three health alerts surface as banners when conditions warrant:
- Indexing failures detected on any index
- An index is actively backfilling
- Deleted documents are accumulating
Both Valkey with valkey-search and Redis with RediSearch are supported.
Prometheus metrics
Six new gauges are exposed via /prometheus/metrics on instances where a Search module is detected:
```
betterdb_vector_index_docs{connection, index}
betterdb_vector_index_memory_bytes{connection, index}
betterdb_vector_index_indexing_failures{connection, index}
betterdb_vector_index_percent_indexed{connection, index}
betterdb_commandstats_calls_total{connection, command}
betterdb_commandstats_latency_us{connection, command}
```
Stale labels are removed automatically when an index is dropped or a command disappears from INFO commandstats.
FT.* fixes in the commandlog
FT.SEARCH commands carry vector embeddings as binary PARAMS blocks. These were previously stored and displayed as raw bytes in the commandlog, making search commands unreadable and impossible to group by pattern. Binary and oversized arguments are now replaced with `<blob>`. FT.* commands are also excluded from the byKeyPrefix aggregation, which was producing meaningless patterns like `idx_cache:` by splitting index names on `:`.
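The scrubbing rule can be sketched as a small predicate. This is illustrative — the 128-byte cap and the function name are assumptions; the release only states that binary and oversized arguments become `<blob>`:

```python
# Sketch of commandlog argument scrubbing: arguments that are not valid
# UTF-8 (e.g. packed vector PARAMS blocks) or exceed a size cap are
# replaced with "<blob>". The 128-byte cap is an illustrative assumption.
def scrub_arg(arg: bytes, max_len: int = 128) -> str:
    if len(arg) > max_len:
        return "<blob>"
    try:
        return arg.decode("utf-8")
    except UnicodeDecodeError:
        return "<blob>"
```

With embeddings collapsed to a single placeholder, identical FT.SEARCH shapes group together again in the commandlog.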
Bug fixes
- Dark mode: white code blocks now render with dark text. Code blocks and `<pre>` elements in the Settings page (MCP token display, Claude Code config snippet) and the Webhook Deliveries panel were using `bg-white` without a corresponding dark text override, making them unreadable in dark mode. Fixed with `dark:text-gray-900`. (#119)
- API: 404 is now returned when no connection can be resolved, rather than an empty array. Affects the commandstats history and summary endpoints when no `x-connection-id` header is sent and no default connection is configured.
Internal
- License key included in telemetry ping events.
What's Changed
- Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #115
- Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #116
- feature: slowlog FT.* fix, vector index health metrics, commandstats time-series by @jamby77 in #111
- feature: Vector / AI monitor tab (frontend) by @jamby77 in #112
- added license key to telemetry_ping events. Refactored util method location by @KIvanow in #118
- feature: close commandstats AC gaps (snapshot endpoint, absolute totals, 60s) by @jamby77 in #113
- feature: commandstats API endpoints + Prometheus gauges per spec by @jamby77 in #114
- Agent cache py by @KIvanow in #117
Full Changelog: v0.14.2...v0.15.0
Agent Cache v0.4.0
Added
- Bundled default cost table
  - A default cost table sourced from LiteLLM's model_prices_and_context_window.json is now bundled with the package and refreshed on every release
  - Cost tracking works out of the box for 100+ models including GPT-4o, Claude, and Gemini — no costTable configuration required
  - User-supplied costTable entries are merged on top of the defaults, allowing selective overrides without losing coverage for other models
- New useDefaultCostTable option on AgentCacheOptions
  - Defaults to true. Set to false to disable the bundled table entirely and supply your own
- DEFAULT_COST_TABLE export
  - The bundled table is now exported from the main entry point for inspection or extension
- update:pricing npm script
  - Regenerates defaultCostTable.ts from the latest LiteLLM pricing data
Full Changelog: v0.15.0...agent-cache-v0.4.0
Agent Cache Python v0.4.0
betterdb-agent-cache v0.3.0
Python port of @betterdb/agent-cache. Multi-tier exact-match cache for AI agent
workloads backed by Valkey — LLM responses, tool results, and session state, with
built-in OpenTelemetry and Prometheus instrumentation.
Runs on vanilla Valkey 7+. No modules, no RedisJSON, no RediSearch. Works on
ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.
Installation
```
pip install betterdb-agent-cache
```

Optional extras install the provider SDKs alongside the library:

```
pip install "betterdb-agent-cache[openai]"
pip install "betterdb-agent-cache[anthropic]"
pip install "betterdb-agent-cache[langchain]"
pip install "betterdb-agent-cache[langgraph]"
pip install "betterdb-agent-cache[llamaindex]"
```

What's included
Three cache tiers
| Tier | Use for |
|---|---|
| cache.llm | LLM API responses — check / store / store_multipart / invalidate_by_model |
| cache.tool | Tool call results — check / store / set_policy / invalidate_by_tool |
| cache.session | Agent session state — get / set / get_all / destroy_thread / touch |
Provider adapters
| Import | Provider |
|---|---|
| betterdb_agent_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_agent_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_agent_cache.adapters.anthropic | Anthropic Messages |
| betterdb_agent_cache.adapters.llamaindex | LlamaIndex |
| betterdb_agent_cache.adapters.langchain | LangChain BaseCache |
| betterdb_agent_cache.adapters.langgraph | LangGraph BaseCheckpointSaver |
Pluggable binary normalizer

```python
from betterdb_agent_cache import compose_normalizer, hash_base64

normalizer = compose_normalizer({"base64": hash_base64})
```

Observability
- OpenTelemetry spans on every cache operation
- Prometheus counters, histograms, and gauges (requests_total, operation_duration_seconds, cost_saved_total, stored_bytes_total, active_sessions)
Quick start
```python
import asyncio

import valkey.asyncio as valkey_client

from betterdb_agent_cache import AgentCache, ModelCost, TierDefaults
from betterdb_agent_cache.adapters.openai import prepare_params
from betterdb_agent_cache.types import AgentCacheOptions

client = valkey_client.Valkey(host="localhost", port=6379)
cache = AgentCache(AgentCacheOptions(
    client=client,
    tier_defaults={"llm": TierDefaults(ttl=3600)},
    cost_table={"gpt-4o-mini": ModelCost(input_per_1k=0.00015, output_per_1k=0.0006)},
))

async def main():
    params = await prepare_params({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    })
    result = await cache.llm.check(params)
    if result.hit:
        print("Cache hit:", result.response)
    else:
        # ... call OpenAI ...
        await cache.llm.store(params, "Four")

asyncio.run(main())
```

Full changelog
See CHANGELOG.md for detailed history.
Agent Cache v0.3.0
Multi-modal support and full provider adapter coverage for OpenAI, Anthropic, and LlamaIndex.
What's new
New provider adapters
Four new sub-path imports covering every major provider:
| Import | Provider |
|---|---|
| @betterdb/agent-cache/openai | OpenAI Chat Completions |
| @betterdb/agent-cache/openai-responses | OpenAI Responses API |
| @betterdb/agent-cache/anthropic | Anthropic Messages |
| @betterdb/agent-cache/llamaindex | LlamaIndex |
Each adapter converts provider-native request params into the canonical LlmCacheParams format — no manual normalisation required.
OpenAI Chat handles text, images (URL + base64), audio, files, tool calls, and legacy function role messages.
OpenAI Responses additionally covers reasoning items, function_call / function_call_output item types, and instructions promoted to a system message.
Anthropic covers tool use blocks, tool result blocks, and thinking / extended thinking blocks alongside standard image sources.
LlamaIndex wraps ChatMessage history including text and image nodes.
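All four adapters start the same way: scan the request's message history from the end for the most recent user turn, which is what the cache keys on. A simplified Python sketch of that one step (illustrative only; the adapters' actual code is TypeScript and handles many more content shapes):

```python
def last_user_message(messages: list[dict]):
    # Walk backwards: the newest user turn determines the cache key.
    for message in reversed(messages):
        if message.get("role") == "user":
            return message
    return None

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "follow-up"},
]
assert last_user_message(history)["content"] == "follow-up"
```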
Pluggable binary normalizer
Binary content (images, audio, documents) is now a first-class part of the cache key pipeline. The new BinaryNormalizer interface lets you control how blobs are reduced to a stable string before hashing.
```typescript
import { composeNormalizer, fetchAndHash, hashBase64 } from '@betterdb/agent-cache';

const normalizer = composeNormalizer({
  // Hash base64 payloads by their decoded bytes
  base64: (data) => hashBase64(data),
  // Fetch remote images and hash the response body
  url: (url) => fetchAndHash(url),
  // Use OpenAI file IDs directly as cache keys
  fileId: (id, provider) => `${provider}:${id}`,
});
```

Built-in helpers:
| Helper | Behaviour |
|---|---|
| hashBase64(data) | SHA-256 of decoded bytes |
| hashBytes(data) | SHA-256 of raw bytes |
| hashUrl(url) | Normalised URL (sorted query params, lowercased host) |
| fetchAndHash(url) | Fetches URL and SHA-256s the body |
| passthrough(ref) | Scheme-prefixed ref, no transformation |
| composeNormalizer(cfg) | Build a normalizer from per-source / per-kind handlers |
The defaultNormalizer uses passthrough — zero-latency, no network calls, suitable for most use cases.
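For intuition, the two most common helpers reduce to standard operations. A Python sketch of equivalent behaviour (the real helpers are TypeScript; these reimplementations are assumptions based on the table's descriptions, not the package's code):

```python
import base64
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def hash_base64_sketch(data: str) -> str:
    # SHA-256 of the decoded bytes, so re-encodings of the same payload collide.
    return hashlib.sha256(base64.b64decode(data)).hexdigest()

def hash_url_sketch(url: str) -> str:
    # Per the table, hashUrl yields a normalised URL:
    # lowercased host, query parameters sorted by key.
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, query, parts.fragment))

assert hash_base64_sketch("aGVsbG8=") == hashlib.sha256(b"hello").hexdigest()
assert hash_url_sketch("https://EXAMPLE.com/img?b=2&a=1") == "https://example.com/img?a=1&b=2"
```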
Extended cache key coverage
LlmCacheParams now includes all parameters that affect model output:
- toolChoice, seed, stop, responseFormat
- reasoningEffort — for extended thinking models
- promptCacheKey — pass-through for provider-level prompt caching
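These fields belong in the key because two requests that differ only in, say, seed or responseFormat can produce different outputs and must not collide. A hedged sketch of the idea (canonical-JSON-plus-SHA-256 is an assumption for illustration, not the library's actual keying scheme):

```python
import hashlib
import json

def cache_key(params: dict) -> str:
    # Canonicalise so field order never matters, then hash.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

base = {"model": "gpt-4o", "prompt": "hi", "seed": 1}
# Same logical request, different field order: same key.
assert cache_key(base) == cache_key({"seed": 1, "prompt": "hi", "model": "gpt-4o"})
# Any output-affecting field changes the key.
assert cache_key(base) != cache_key({**base, "seed": 2})
assert cache_key(base) != cache_key({**base, "responseFormat": {"type": "json_object"}})
```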
New examples
Runnable examples added for all three new providers:
- examples/openai/
- examples/anthropic/
- examples/llamaindex/
Bug fixes
- Null tool output (openai-responses): function_call_output items with null or undefined output now produce an empty string instead of the two-character literal "", which was corrupting cache key hashes.
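Why this corrupted hashes: the two-character literal '""' (a pair of quote characters) is a different byte sequence from a true empty string, so the derived key no longer matched. A minimal illustration (the SHA-256 keying here is an assumption for demonstration):

```python
import hashlib

def key_for(output: str) -> str:
    # Stand-in for the cache-key hash; SHA-256 is an assumed scheme.
    return hashlib.sha256(output.encode()).hexdigest()

buggy = '""'  # two quote characters, what null/undefined output used to become
fixed = ""    # v0.3.0: missing output normalises to a true empty string

assert len(buggy) == 2
assert key_for(buggy) != key_for(fixed)
```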
Installation
```
npm install @betterdb/agent-cache@0.3.0
```

Full changelog
See CHANGELOG.md for a complete list of changes.
What's Changed
- Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #115
- Feature/agent cache adding OpenAI anthropic llamaindex adapters by @KIvanow in #116
Full Changelog: v0.14.2...agent-cache-v0.3.0
v0.14.2
Security updates
fastify 5.8.4 → 5.8.5
Fixes CVE-2026-33806 — a security vulnerability in the Fastify HTTP framework. See GHSA-247c-9743-5963.
@fastify/static 9.0.0 → 9.1.1
Fixes two CVEs in static file serving:
vite 8.0.2 → 8.0.5
Multiple path traversal and filesystem boundary bypass fixes in the dev server:
- server.fs checks now apply to env transport requests and query-stripped paths
- Sourcemap handlers no longer allow referencing files outside the package root
Bug fixes
Vector Search
- Fixed crashes caused by out-of-bounds data
- Fixed Find Similar button being difficult to click
- Fixed graph labels being unreadable and not respecting the active color scheme
What's Changed
- added licenseKey to posthog events by @KIvanow in #104
- Add @betterdb/agent-cache package — multi-tier LLM/tool/session cache with framework adapters by @KIvanow in #105
- cluster support for agent-cache by @KIvanow in #108
- build(deps-dev): bump vite from 8.0.2 to 8.0.5 by @dependabot[bot] in #99
- build(deps): bump @nestjs/core from 11.1.17 to 11.1.18 by @dependabot[bot] in #100
- build(deps): bump fastify from 5.8.4 to 5.8.5 by @dependabot[bot] in #106
- build(deps): bump @fastify/static from 9.0.0 to 9.1.1 by @dependabot[bot] in #110
Full Changelog: v0.14.1...v0.14.2