Merged
13 changes: 13 additions & 0 deletions docs/changelog.md
@@ -41,6 +41,19 @@ All notable changes to this project are documented here. For detailed informatio
- Can be combined with `max_usd`: `budget(max_usd=1.00, max_llm_calls=20)`
- Works with fallback: `budget(max_usd=1.00, max_llm_calls=20, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"})`

**LiteLLM provider adapter**

- Install with `pip install shekel[litellm]`
- Patches `litellm.completion` and `litellm.acompletion` (sync + async, including streaming)
- Tracks costs across all 100+ providers LiteLLM supports (Gemini, Cohere, Ollama, Azure, Bedrock, Mistral, and more)
- Model names with provider prefix (e.g. `gemini/gemini-1.5-flash`) pass through to the pricing engine

**LangGraph integration helper**

- `from shekel.integrations.langgraph import budgeted_graph`
- `budgeted_graph(max_usd, **kwargs)` — convenience context manager wrapping `budget()` for LangGraph workflows
- Install with `pip install shekel[langgraph]`

## [0.2.5] - 2026-03-11

### 🔧 Extensible Provider Architecture
2 changes: 1 addition & 1 deletion docs/extending.md
@@ -67,7 +67,7 @@ assert cost == 0.005 # (1000/1000 * 0.002) + (500/1000 * 0.006)

## Supporting New LLM Providers

Shekel uses a pluggable `ProviderAdapter` pattern. To add support for a new provider (e.g., Cohere, Mistral), implement `ProviderAdapter` and register it — no changes to core Shekel code required.
Shekel uses a pluggable `ProviderAdapter` pattern. Built-in adapters cover **OpenAI**, **Anthropic**, and **LiteLLM** (which in turn routes to 100+ providers). To add support for a provider not covered by LiteLLM (e.g., a proprietary API or a very new SDK), implement `ProviderAdapter` and register it — no changes to core Shekel code required.
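
For orientation, here is a rough sketch of the shape such an adapter takes. The registry name mirrors the one mentioned in these docs, but the method names and signatures below are illustrative assumptions, not Shekel's exact interface (that is specified in the next section):

```python
# Illustrative sketch only: method names here are assumptions, not Shekel's API.
ADAPTER_REGISTRY = {}

class MistralAdapter:
    """Hypothetical adapter for a Mistral-style SDK."""
    provider = "mistral"

    def extract_tokens(self, response):
        # Pull token counts out of a provider-specific response shape
        usage = response["usage"]
        return usage["prompt_tokens"], usage["completion_tokens"], response["model"]

# Registration makes the adapter discoverable by the patching machinery
ADAPTER_REGISTRY["mistral"] = MistralAdapter()

# Exercise it against a fake response object:
fake_response = {
    "model": "mistral-small-latest",
    "usage": {"prompt_tokens": 120, "completion_tokens": 40},
}
print(ADAPTER_REGISTRY["mistral"].extract_tokens(fake_response))
# → (120, 40, 'mistral-small-latest')
```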

### The ProviderAdapter Interface

8 changes: 7 additions & 1 deletion docs/how-it-works.md
@@ -40,6 +40,10 @@ with budget(max_usd=1.00):
- `anthropic.resources.messages.Messages.create` (sync)
- `anthropic.resources.messages.AsyncMessages.create` (async)

**LiteLLM** (when `shekel[litellm]` is installed):
- `litellm.completion` (sync)
- `litellm.acompletion` (async)

### Patching Implementation

Shekel uses a pluggable `ProviderAdapter` pattern — each provider registers itself in `ADAPTER_REGISTRY`. `shekel/_patch.py` delegates all patching to the registry:
@@ -153,7 +157,7 @@ with budget(max_usd=5.00):

Shekel extracts tokens from API responses:

**OpenAI:**
**OpenAI / LiteLLM:**
```python
def _extract_openai_tokens(response):
    input_tokens = response.usage.prompt_tokens
    output_tokens = response.usage.completion_tokens
    model = response.model
    return input_tokens, output_tokens, model
```

LiteLLM uses the same OpenAI-compatible format regardless of the underlying provider, so the same extraction logic applies.
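
Once tokens are extracted, the cost arithmetic is a simple per-1K-token calculation, the same formula shown in the extending docs (the rates below are illustrative, not Shekel's pricing table):

```python
def cost_usd(input_tokens, output_tokens, in_per_1k, out_per_1k):
    # (tokens / 1000) * price-per-1K-tokens, summed over both directions
    return input_tokens / 1000 * in_per_1k + output_tokens / 1000 * out_per_1k

# 1000 input tokens at $0.002/1K plus 500 output tokens at $0.006/1K
print(cost_usd(1000, 500, 0.002, 0.006))  # 0.005
```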

**Anthropic:**
```python
def _extract_anthropic_tokens(response):
    input_tokens = response.usage.input_tokens
    output_tokens = response.usage.output_tokens
    model = response.model
    return input_tokens, output_tokens, model
```
34 changes: 32 additions & 2 deletions docs/index.md
@@ -111,7 +111,7 @@ I built shekel so you don't have to learn that lesson yourself.

---

Works with LangGraph, CrewAI, AutoGen, LlamaIndex, Haystack, and any framework that calls OpenAI or Anthropic.
Works with LangGraph, CrewAI, AutoGen, LlamaIndex, Haystack, and any framework that calls OpenAI, Anthropic, or LiteLLM.

- :material-web:{ .lg .middle } **Async & Streaming**

@@ -144,6 +144,12 @@ I built shekel so you don't have to learn that lesson yourself.
pip install shekel[anthropic]
```

=== "LiteLLM (100+ providers)"

```bash
pip install shekel[litellm]
```

=== "Both"

```bash
@@ -217,7 +223,7 @@ print(f"Remaining: ${b.remaining:.4f}")

## What's New in v0.2.6

**Breaking-Change Release** — Cleaner API with dict-based fallback, renamed callbacks, removed deprecated parameters, and new call-count budgets.
**Breaking-Change Release** — Cleaner API with dict-based fallback, renamed callbacks, removed deprecated parameters, new call-count budgets, LiteLLM support, and a LangGraph convenience helper.

<div class="grid cards" markdown>

@@ -239,6 +245,28 @@

New `max_llm_calls` parameter limits by number of LLM API calls, combinable with `max_usd`.

- :material-transit-connection-variant:{ .lg .middle } **[LiteLLM Support](integrations/litellm.md)**

---

Native adapter for LiteLLM — track costs across 100+ providers (Gemini, Cohere, Ollama, Azure, Bedrock…) with zero extra code.

```bash
pip install shekel[litellm]
```

- :material-graph:{ .lg .middle } **[LangGraph Helper](integrations/langgraph.md)**

---

New `budgeted_graph()` context manager for cleaner LangGraph integration.

```python
from shekel.integrations.langgraph import budgeted_graph
with budgeted_graph(max_usd=0.50) as b:
    result = app.invoke(state)
```

</div>

---
@@ -279,6 +307,8 @@ print(f"Remaining: ${b.remaining:.4f}")

Built-in pricing for GPT-4o, GPT-4o-mini, o1, Claude 3.5 Sonnet, Claude 3 Haiku, Gemini 1.5, and more.

Install `shekel[litellm]` to track costs across 100+ providers through LiteLLM's unified interface.

Install `shekel[all-models]` for 400+ models via [tokencost](https://github.com/AgentOps-AI/tokencost).

[See full model list →](models.md)
22 changes: 20 additions & 2 deletions docs/installation.md
@@ -3,8 +3,9 @@
## Requirements

- Python 3.9 or higher
- OpenAI SDK (optional) - for OpenAI models
- Anthropic SDK (optional) - for Anthropic models
- OpenAI SDK (optional) — for OpenAI models
- Anthropic SDK (optional) — for Anthropic models
- LiteLLM (optional) — for 100+ providers via a unified interface

## Install Shekel

@@ -34,6 +35,14 @@ If you're using models from both providers:
pip install shekel[all]
```

### LiteLLM (100+ Providers)

For access to OpenAI, Anthropic, Gemini, Cohere, Ollama, Azure, Bedrock, and 90+ more through a unified interface:

```bash
pip install shekel[litellm]
```

### Extended Model Support (400+ Models)

For support of 400+ models via [tokencost](https://github.com/AgentOps-AI/tokencost):
@@ -97,6 +106,7 @@ Shekel has zero required dependencies beyond the Python standard library. The Op
|---------|-----------|---------|
| `openai>=1.0.0` | Optional | Track OpenAI API costs |
| `anthropic>=0.7.0` | Optional | Track Anthropic API costs |
| `litellm>=1.0.0` | Optional | Track costs via LiteLLM (100+ providers) |
| `tokencost>=0.1.0` | Optional | Support 400+ models |
| `click>=8.0.0` | Optional | CLI tools |

@@ -118,6 +128,14 @@ If you see this error, install the Anthropic SDK:
pip install shekel[anthropic]
```

### ImportError: No module named 'litellm'

If you see this error, install LiteLLM:

```bash
pip install shekel[litellm]
```

### Model pricing not found

For models not in shekel's built-in pricing table:
19 changes: 18 additions & 1 deletion docs/integrations/langgraph.md
@@ -8,9 +8,26 @@ Shekel works seamlessly with [LangGraph](https://github.com/langchain-ai/langgra
pip install shekel[openai] "langgraph>=0.2"
```

## Convenience Helper

Shekel provides a `budgeted_graph()` context manager so you don't need to import `budget` directly:

```python
from shekel.integrations.langgraph import budgeted_graph

app = graph.compile()

with budgeted_graph(max_usd=0.50, name="research-graph") as b:
    result = app.invoke({"question": "What is 2+2?", "answer": ""})
    print(f"Answer: {result['answer']}")
    print(f"Cost: ${b.spent:.4f}")
```

It accepts the same keyword arguments as `budget()` (`name`, `warn_at`, `fallback`, `max_llm_calls`, etc.) and yields the active budget object.
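
A helper like this is typically just a thin delegation to `budget()`. The sketch below uses a stand-in for `budget()` so it runs on its own; it illustrates the wrapping pattern, not Shekel's actual implementation:

```python
from contextlib import contextmanager

@contextmanager
def budget(max_usd, **kwargs):
    # Stand-in for shekel.budget so this sketch is self-contained
    class _Budget:
        spent = 0.0
    yield _Budget()

@contextmanager
def budgeted_graph(max_usd, **kwargs):
    # Forward every keyword argument and yield the active budget object
    with budget(max_usd, **kwargs) as b:
        yield b

with budgeted_graph(max_usd=0.50, name="research-graph") as b:
    print(f"spent so far: ${b.spent:.4f}")  # spent so far: $0.0000
```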

## Basic Integration

Wrap your LangGraph execution with a budget context:
You can also use `budget()` directly — they are equivalent:

```python
from langgraph.graph import StateGraph, END
179 changes: 179 additions & 0 deletions docs/integrations/litellm.md
@@ -0,0 +1,179 @@
# LiteLLM Integration

Shekel natively supports [LiteLLM](https://github.com/BerriAI/litellm), the unified gateway that routes to 100+ LLM providers using an OpenAI-compatible interface.

## Installation

```bash
pip install shekel[litellm]
```

Or alongside other extras:

```bash
pip install "shekel[litellm,langfuse]"
```

## Basic Usage

Wrap any `litellm.completion` call in a budget context — no other changes needed:

```python
import litellm
from shekel import budget

with budget(max_usd=0.50) as b:
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

print(f"Cost: ${b.spent:.4f}")
```

## Why LiteLLM + Shekel?

LiteLLM routes to OpenAI, Anthropic, Gemini, Cohere, Ollama, Azure, Bedrock, and 90+ more. Shekel tracks the cost of every call regardless of which provider LiteLLM routes to.

```python
import litellm
from shekel import budget

with budget(max_usd=2.00) as b:
    # OpenAI
    litellm.completion(model="gpt-4o-mini", messages=[...])
    # Anthropic
    litellm.completion(model="claude-3-haiku-20240307", messages=[...])
    # Google Gemini
    litellm.completion(model="gemini/gemini-1.5-flash", messages=[...])

print(f"Combined cost across providers: ${b.spent:.4f}")
```

## Async Support

```python
import asyncio
import litellm
from shekel import budget

async def run():
    async with budget(max_usd=1.00) as b:
        response = await litellm.acompletion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello async!"}],
        )
        print(response.choices[0].message.content)
    print(f"Cost: ${b.spent:.4f}")

asyncio.run(run())
```

## Streaming

```python
import litellm
from shekel import budget

with budget(max_usd=0.50) as b:
    stream = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Count to 5"}],
        stream=True,
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

print(f"\nStreaming cost: ${b.spent:.4f}")
```

## Budget Enforcement

Hard cap and early warnings work exactly as with any other provider:

```python
import litellm
from shekel import budget, BudgetExceededError

try:
    with budget(max_usd=0.10, warn_at=0.8) as b:
        for i in range(100):
            litellm.completion(
                model="gpt-4o",
                messages=[{"role": "user", "content": f"Question {i}"}],
            )
except BudgetExceededError as e:
    print(f"Stopped at ${e.spent:.4f} after {b.call_count} calls")
```

## Fallback Models

Switch to a cheaper LiteLLM-routed model when budget runs low:

```python
import litellm
from shekel import budget

with budget(
    max_usd=1.00,
    fallback={"at_pct": 0.8, "model": "gpt-4o-mini"},
) as b:
    response = litellm.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )

if b.model_switched:
    print(f"Switched to {b.fallback['model']} at ${b.switched_at_usd:.4f}")
```

## How It Works

Shekel's `LiteLLMAdapter` patches the `litellm.completion` and `litellm.acompletion` module-level functions when the first `budget()` context is entered, and restores them when the last one exits.

LiteLLM returns responses in OpenAI-compatible format (`response.usage.prompt_tokens`, `response.usage.completion_tokens`), so token extraction is straightforward regardless of which underlying provider was used.

Model names may include a provider prefix (e.g. `gemini/gemini-1.5-flash`, `anthropic/claude-3-haiku-20240307`). Shekel passes these through to its pricing engine, which falls back to [tokencost](https://github.com/AgentOps-AI/tokencost) for extended model coverage.
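
Normalizing such prefixed names is straightforward. A sketch of the kind of split a pricing lookup might perform (an assumption about the approach, not Shekel's actual code):

```python
def split_model_name(model: str):
    """Split 'provider/model' into its parts; provider is None if absent."""
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name

print(split_model_name("gemini/gemini-1.5-flash"))  # ('gemini', 'gemini-1.5-flash')
print(split_model_name("gpt-4o-mini"))              # (None, 'gpt-4o-mini')
```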

## Extended Model Pricing

For accurate pricing on the full range of providers LiteLLM supports:

```bash
pip install "shekel[litellm,all-models]"
```

This installs `tokencost`, which covers 400+ models including Gemini, Cohere, Mistral, and many more.

## With LangGraph or CrewAI

LiteLLM can serve as the LLM backend for agent frameworks. Shekel tracks costs regardless:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END
import litellm
from shekel import budget

class State(TypedDict):
    question: str
    answer: str

def call_litellm(state: State):
    response = litellm.completion(
        model="gemini/gemini-1.5-flash",
        messages=[{"role": "user", "content": state["question"]}],
    )
    return {"answer": response.choices[0].message.content}

graph = StateGraph(State)
graph.add_node("llm", call_litellm)
graph.set_entry_point("llm")
graph.add_edge("llm", END)
app = graph.compile()

with budget(max_usd=0.50) as b:
    result = app.invoke({"question": "What is 2+2?", "answer": ""})
    print(f"Cost: ${b.spent:.4f}")
```

## Next Steps

- [LangGraph Integration](langgraph.md)
- [Extending Shekel](../extending.md) — add your own provider adapter
- [Supported Models](../models.md)