diff --git a/CLAUDE.md b/CLAUDE.md
index cd489e0..64a2931 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -32,7 +32,7 @@ mintlify install
| Directory | Purpose |
|-----------|---------|
-| `tracing/` | Monitoring & tracing guides, SDK docs (Python + TypeScript), advanced topics (sessions, tagging, signals, OTel) |
+| `tracing/` | Monitoring & tracing guides, SDK docs (Python + TypeScript), advanced topics (sessions, tagging, OTel) |
| `autotune/` | Prompt optimization ("Prompts" in nav), setup, model configs |
| `judges/` | AI evaluation judges, setup, multimodal eval, feedback submission |
| `evaluations/` | Evaluations section (currently placeholder) |
diff --git a/autotune/introduction.mdx b/autotune/introduction.mdx
index c10a81e..3bc41d9 100644
--- a/autotune/introduction.mdx
+++ b/autotune/introduction.mdx
@@ -1,36 +1,47 @@
---
title: "Introduction"
-description: "Run evaluations on models and prompts to find the best variants for your agents"
+description: "Version, track, and optimize every prompt your agent uses"
---
-Prompt optimization is a different approach to the traditional evals experience. Instead of setting up complex eval pipelines, we simply ingest your production traces and let you optimize your prompts based on your feedback.
+Prompts are the instructions that drive your agent's behavior. Small changes in wording can dramatically affect output quality, but without tracking, you have no way to know which version works best -- or even which version is running in production.
+
+ZeroEval Prompts gives you version control for prompts with a single function call. Every change is tracked, every completion is linked to the exact prompt version that produced it, and you can deploy optimized versions without touching code.
+
+## Why track prompts
+
+- **Version history** -- every prompt change creates a new version you can compare and roll back to
+- **Production visibility** -- see exactly which prompt version is running, how often it's called, and what it produces
+- **Feedback loop** -- attach thumbs-up/down feedback to completions, then use it to [optimize prompts](/autotune/prompts/prompts) and [evaluate models](/judges/introduction)
+- **One-click deployments** -- push a winning prompt or model to production without redeploying your app
## How it works
-
- Replace hardcoded prompts with `ze.prompt()` calls in Python or `ze.prompt({...})` in TypeScript
+
+ Swap string literals for `ze.prompt()` calls. Your existing prompt text
+ becomes the fallback content.
-
- Each time you modify your prompt content, a new version is automatically created and tracked
+
+ Each unique prompt string creates a tracked version. Changes in your code
+ produce new versions without any extra work.
-
- ZeroEval automatically tracks all LLM interactions and their outcomes
+
+ When your LLM integration fires, ZeroEval links each completion to the exact
+ prompt version and model that produced it.
-
- Use the UI to run experiments, vote on outputs, and identify the best prompt/model combinations
-
-
- Winning configurations are automatically deployed to your application without code changes
+
+ Review completions, submit feedback, and generate improved prompt variants
+ -- all from real traffic.
+## Get started
+
-
- Learn how to integrate ze.prompt() into your Python or TypeScript codebase
+
+ `ze.prompt()` and `ze.get_prompt()` for Python applications
-
- Run experiments and deploy winning combinations
+
+ `ze.prompt()` for TypeScript and JavaScript applications
-
diff --git a/autotune/prompts/models.mdx b/autotune/prompts/models.mdx
deleted file mode 100644
index 4fa58a3..0000000
--- a/autotune/prompts/models.mdx
+++ /dev/null
@@ -1,10 +0,0 @@
----
-title: "Models"
-description: "Evaluate your agent's performance across multiple models"
----
-
-
-
-ZeroEval lets you evaluate real production traces of specific agent tasks across different models, then ranking them over time. This helps you pick the best model for each part of your agent.
-
-
diff --git a/autotune/prompts/prompts.mdx b/autotune/prompts/prompts.mdx
index 49bd00c..0c4fb6c 100644
--- a/autotune/prompts/prompts.mdx
+++ b/autotune/prompts/prompts.mdx
@@ -5,7 +5,7 @@ description: "Use feedback on production traces to generate and validate better
-ZeroEval derives prompt optimization suggestions directly from feedback on your production traces. By capturing preferences and correctness signals, we provide concrete prompt edits you can test and use for your agents.
+ZeroEval derives prompt optimization suggestions directly from feedback on your production traces. By capturing preferences and corrections, we provide concrete prompt edits you can test and use for your agents.
## Submitting Feedback
diff --git a/autotune/reference.mdx b/autotune/reference.mdx
index 618f786..0148050 100644
--- a/autotune/reference.mdx
+++ b/autotune/reference.mdx
@@ -1,89 +1,250 @@
---
-title: "Reference"
-description: "Parameters and configuration for ze.prompt"
+title: "API Reference"
+description: "REST API for managing prompts, versions, and deployments"
---
-`ze.prompt` creates or fetches versioned prompts from the Prompt Library and returns decorated content for downstream LLM calls.
+Base URL: `https://api.zeroeval.com`
-
-**TypeScript differences**: In TypeScript, `ze.prompt()` is an async function that returns `Promise`. Parameters use camelCase and are passed as an options object: `ze.prompt({ name: "...", content: "..." })`.
-
+All requests require a Bearer token:
-## Parameters
+```
+Authorization: Bearer YOUR_ZEROEVAL_API_KEY
+```
-| Python | TypeScript | Type | Required | Default | Description |
-| --- | --- | --- | --- | --- | --- |
-| `name` | `name` | string | yes | — | Task name associated with the prompt in the library |
-| `content` | `content` | string | no | `None`/`undefined` | Raw prompt content to ensure/create a version by content |
-| `from_` | `from` | string | no | `None`/`undefined` | Either `"latest"`, `"explicit"`, or a 64‑char SHA‑256 hash |
-| `variables` | `variables` | dict/object | no | `None`/`undefined` | Template variables to render `{{variable}}` tokens |
+---
-Notes:
+## Get Prompt
-- In Python, use `from_` (with underscore) as `from` is a reserved keyword. TypeScript uses `from` directly.
-- Exactly one of `content` or `from` must be provided (except when using `from: "explicit"` with `content`).
-- `from="latest"` fetches the latest version bound to the task; otherwise `from` must be a 64‑char hex SHA‑256 hash.
+```
+GET /v1/prompts/{prompt_slug}
+```
-## Behavior
+Fetch the current version of a prompt by its slug.
-- **content provided**: Computes a normalized SHA‑256 hash, ensures a prompt version exists for `name`, and returns decorated content.
-- **from="latest"**: Fetches the latest version for `name` and returns decorated content.
-- **from=**``: Fetches by content hash for `name` and returns decorated content.
+| Query Parameter | Type | Default | Description |
+| --------------- | -------- | ---------- | ----------------------------------------------- |
+| `version` | `int` | — | Fetch a specific version number |
+| `tag` | `string` | `"latest"` | Tag to fetch (`"production"`, `"latest"`, etc.) |
-Decoration adds a compact metadata header used by integrations:
+```bash
+curl https://api.zeroeval.com/v1/prompts/support-bot \
+ -H "Authorization: Bearer $ZEROEVAL_API_KEY"
+```
-- `task`, `prompt_slug`, `prompt_version`, `prompt_version_id`, `variables`, and (when created by content) `content_hash`.
+**Response:** 200
+
+```json
+{
+ "id": "a1b2c3d4-...",
+ "prompt_id": "b2c3d4e5-...",
+ "content": "You are a helpful customer support agent.",
+ "content_hash": "e3b0c44298fc...",
+ "version": 3,
+ "model_id": "gpt-4o",
+ "tag": "production",
+ "is_latest": true,
+ "metadata": {},
+ "created_at": "2025-01-15T10:30:00Z"
+}
+```
-OpenAI integration: when `prompt_version_id` is present, the SDK will automatically patch the `model` parameter to the model bound to that prompt version.
+### Fetch by tag
-## Return Value
+```bash
+curl "https://api.zeroeval.com/v1/prompts/support-bot?tag=production" \
+ -H "Authorization: Bearer $ZEROEVAL_API_KEY"
+```
-- **Python**: `str` - Decorated prompt content ready to pass into LLM clients.
-- **TypeScript**: `Promise` - Async function returning decorated prompt content.
+### Fetch by version number
-## Errors
+```bash
+curl "https://api.zeroeval.com/v1/prompts/support-bot?version=2" \
+ -H "Authorization: Bearer $ZEROEVAL_API_KEY"
+```
-| Python | TypeScript | When |
-| --- | --- | --- |
-| `ValueError` | `Error` | Both `content` and `from` provided (except explicit), or neither; invalid `from` value |
-| `PromptRequestError` | `PromptRequestError` | `from="latest"` but no versions exist for `name` |
-| `PromptNotFoundError` | `PromptNotFoundError` | `from` is a hash that does not exist for `name` |
+---
-## Examples
+## Ensure Prompt Version
-
-```python Python
-import zeroeval as ze
+```
+POST /v1/tasks/{task_name}/prompt/versions/ensure
+```
+
+Create a prompt version if it doesn't already exist (idempotent by content hash). This is what `ze.prompt()` calls under the hood.
-# Create/ensure a version by content
-system = ze.prompt(
- name="support-triage",
- content="You are a helpful assistant for {{product}}.",
- variables={"product": "Acme"},
-)
+**Request body:**
-# Fetch the latest version for this task
-system = ze.prompt(name="support-triage", from_="latest")
+| Field | Type | Required | Description |
+| -------------- | -------- | -------- | ---------------------------------------------- |
+| `content` | `string` | Yes | Prompt content |
+| `content_hash` | `string` | No | SHA-256 hash (computed server-side if omitted) |
+| `model_id` | `string` | No | Model to bind to this version |
+| `metadata` | `object` | No | Additional metadata |
-# Fetch a specific version by content hash
-system = ze.prompt(name="support-triage", from_="c6a7...deadbeef...0123")
+```bash
+curl -X POST https://api.zeroeval.com/v1/tasks/support-bot/prompt/versions/ensure \
+ -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "content": "You are a helpful customer support agent for {{company}}."
+ }'
```
-```typescript TypeScript
-import * as ze from 'zeroeval';
-// Create/ensure a version by content
-const system = await ze.prompt({
- name: "support-triage",
- content: "You are a helpful assistant for {{product}}.",
- variables: { product: "Acme" },
-});
+**Response:** 200
+
+```json
+{
+ "id": "c3d4e5f6-...",
+ "content": "You are a helpful customer support agent for {{company}}.",
+ "content_hash": "a1b2c3d4...",
+ "version": 1,
+ "model_id": null,
+ "created_at": "2025-01-15T10:30:00Z"
+}
+```
+
+---
-// Fetch the latest version for this task
-const system = await ze.prompt({ name: "support-triage", from: "latest" });
+## Get Version by Hash
-// Fetch a specific version by content hash
-const system = await ze.prompt({ name: "support-triage", from: "c6a7...deadbeef...0123" });
```
-
+GET /v1/tasks/{task_name}/prompt/versions/by-hash/{content_hash}
+```
+
+Fetch a specific prompt version by its SHA-256 content hash.
+
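+For example, with a 64-char hash (such as the `content_hash` returned by the ensure endpoint) in `$CONTENT_HASH`:
+
+```bash
+curl https://api.zeroeval.com/v1/tasks/support-bot/prompt/versions/by-hash/$CONTENT_HASH \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
+```
+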
+**Response:** 200 (same schema as ensure)
+
+---
+
+## Get Latest Version
+
+```
+GET /v1/tasks/{task_name}/prompt/latest
+```
+
+Fetch the latest prompt version for a task.
+
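+For example, for the `support-bot` task:
+
+```bash
+curl https://api.zeroeval.com/v1/tasks/support-bot/prompt/latest \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
+```
+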
+**Response:** 200 (same schema as ensure)
+
+---
+
+## Resolve Model for Version
+
+```
+GET /v1/prompt-versions/{version_id}/model
+```
+
+Get the model bound to a specific prompt version. Used by SDK integrations to auto-patch the `model` parameter.
+
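+For example, with the prompt version's UUID in `$VERSION_ID`:
+
+```bash
+curl https://api.zeroeval.com/v1/prompt-versions/$VERSION_ID/model \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
+```
+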
+**Response:** 200
+
+```json
+{
+ "model_id": "gpt-4o",
+ "provider": "openai"
+}
+```
+
+Returns `null` for `model_id` if no model is bound.
+
+---
+
+## Deploy a Version (Pin Tag)
+
+```
+POST /projects/{project_id}/prompts/{prompt_slug}/tags/{tag}:pin
+```
+
+Pin a tag (e.g. `production`) to a specific version number. This is how you deploy a prompt version to production.
+
+**Request body:**
+
+| Field | Type | Required | Description |
+| --------- | ----- | -------- | --------------------- |
+| `version` | `int` | Yes | Version number to pin |
+
+```bash
+curl -X POST https://api.zeroeval.com/projects/$PROJECT_ID/prompts/support-bot/tags/production:pin \
+ -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{"version": 3}'
+```
+
+---
+
+## List Versions
+
+```
+GET /projects/{project_id}/prompts/{prompt_slug}/versions
+```
+
+List all versions of a prompt.
+
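+For example, using your project ID:
+
+```bash
+curl https://api.zeroeval.com/projects/$PROJECT_ID/prompts/support-bot/versions \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
+```
+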
+**Response:** 200
+
+```json
+[
+ {
+ "id": "c3d4e5f6-...",
+ "content": "You are a helpful assistant.",
+ "content_hash": "a1b2c3d4...",
+ "version": 1,
+ "model_id": null,
+ "created_at": "2025-01-10T10:00:00Z"
+ },
+ {
+ "id": "d4e5f6a7-...",
+ "content": "You are a helpful customer support agent.",
+ "content_hash": "b2c3d4e5...",
+ "version": 2,
+ "model_id": "gpt-4o",
+ "created_at": "2025-01-15T10:30:00Z"
+ }
+]
+```
+
+---
+
+## List Tags
+
+```
+GET /projects/{project_id}/prompts/{prompt_slug}/tags
+```
+
+List all tags and which version each is pinned to.
+
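+For example:
+
+```bash
+curl https://api.zeroeval.com/projects/$PROJECT_ID/prompts/support-bot/tags \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY"
+```
+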
+**Response:** 200
+
+```json
+[
+ { "tag": "latest", "version": 2 },
+ { "tag": "production", "version": 1 }
+]
+```
+
+---
+
+## Update Version Model
+
+```
+PATCH /projects/{project_id}/prompts/{prompt_slug}/versions/{version}
+```
+
+Update the model bound to a version.
+
+**Request body:**
+
+| Field | Type | Description |
+| ---------- | -------- | ------------------------ |
+| `model_id` | `string` | Model identifier to bind |
+
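+For example, binding `gpt-4o` to version 2:
+
+```bash
+curl -X PATCH https://api.zeroeval.com/projects/$PROJECT_ID/prompts/support-bot/versions/2 \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model_id": "gpt-4o"}'
+```
+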
+---
+
+## Submit Completion Feedback
+
+```
+POST /v1/prompts/{prompt_slug}/completions/{completion_id}/feedback
+```
+
+See [Feedback API Reference](/feedback/api-reference#completion-feedback) for the full specification.
diff --git a/autotune/sdks/python.mdx b/autotune/sdks/python.mdx
new file mode 100644
index 0000000..ea7bc62
--- /dev/null
+++ b/autotune/sdks/python.mdx
@@ -0,0 +1,178 @@
+---
+title: "Python"
+description: "Track and version prompts in Python with ze.prompt()"
+---
+
+## Installation
+
+```bash
+pip install zeroeval
+```
+
+## Basic Setup
+
+Replace hardcoded prompt strings with `ze.prompt()`. Your existing text becomes the fallback content that's used until an optimized version is available.
+
+```python
+import zeroeval as ze
+from openai import OpenAI
+
+ze.init()
+client = OpenAI()
+
+system_prompt = ze.prompt(
+ name="support-bot",
+ content="You are a helpful customer support agent for {{company}}.",
+ variables={"company": "TechCorp"}
+)
+
+response = client.chat.completions.create(
+ model="gpt-4",
+ messages=[
+ {"role": "system", "content": system_prompt},
+ {"role": "user", "content": "How do I reset my password?"}
+ ]
+)
+```
+
+That's it. Every call to `ze.prompt()` is tracked, versioned, and linked to the completions it produces. You'll see production traces at [ZeroEval → Prompts](https://app.zeroeval.com).
+
+
+ When you provide `content`, ZeroEval automatically uses the latest optimized
+ version from your dashboard if one exists. The `content` parameter serves as a
+ fallback for when no optimized versions are available yet.
+
+
+## Version Control
+
+### Auto-optimization (default)
+
+```python
+prompt = ze.prompt(
+ name="customer-support",
+ content="You are a helpful assistant."
+)
+```
+
+Uses the latest optimized version if one exists, otherwise falls back to the provided content.
+
+### Explicit mode
+
+```python
+prompt = ze.prompt(
+ name="customer-support",
+ from_="explicit",
+ content="You are a helpful assistant."
+)
+```
+
+Always uses the provided content. Useful for debugging or A/B testing a specific version.
+
+### Latest mode
+
+```python
+prompt = ze.prompt(
+ name="customer-support",
+ from_="latest"
+)
+```
+
+Requires an optimized version to exist. Fails with `PromptRequestError` if none is found.
+
+### Pin to a specific version
+
+```python
+prompt = ze.prompt(
+ name="customer-support",
+ from_="a1b2c3d4..." # 64-char SHA-256 hash
+)
+```
+
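+Versions are addressed by a SHA-256 hash of their content. As a rough sketch of how such a hash can be computed locally (this assumes a plain hash over the UTF-8 bytes; the SDK may normalize content before hashing):
+
+```python
+import hashlib
+
+def content_hash(content: str) -> str:
+    # Illustrative only: the SDK may normalize content before hashing.
+    return hashlib.sha256(content.encode("utf-8")).hexdigest()
+
+h = content_hash("You are a helpful assistant.")
+print(len(h))  # 64 hex characters
+```
+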
+## Prompt Library
+
+For more control, use `ze.get_prompt()` to fetch prompts from the Prompt Library with tag-based deployments and caching.
+
+```python
+prompt = ze.get_prompt(
+ "support-triage",
+ tag="production",
+ fallback="You are a helpful assistant.",
+ variables={"product": "Acme"},
+)
+
+print(prompt.content)
+print(prompt.version)
+print(prompt.model)
+```
+
+### Parameters
+
+| Parameter | Type | Default | Description |
+| ----------- | ------- | ---------- | ---------------------------------------------------------- |
+| `slug` | `str` | — | Prompt slug (e.g. `"support-triage"`) |
+| `version` | `int` | `None` | Fetch a specific version number |
+| `tag` | `str` | `"latest"` | Tag to fetch (`"production"`, `"latest"`, etc.) |
+| `fallback` | `str` | `None` | Content to use if the prompt is not found |
+| `variables` | `dict` | `None` | Template variables for `{{var}}` tokens |
+| `task_name` | `str` | `None` | Override the task name for tracing |
+| `render` | `bool` | `True` | Whether to render template variables |
+| `missing` | `str` | `"error"` | What to do with missing variables: `"error"` or `"ignore"` |
+| `use_cache` | `bool` | `True` | Use in-memory cache for repeated fetches |
+| `timeout` | `float` | `None` | Request timeout in seconds |
+
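+The `variables`, `render`, and `missing` behavior described above amounts to simple `{{var}}` substitution. A minimal sketch of those semantics (not the SDK's actual implementation):
+
+```python
+import re
+
+def render(content: str, variables: dict, missing: str = "error") -> str:
+    # Replace each {{var}} token; unknown variables follow the `missing` policy.
+    def sub(match: re.Match) -> str:
+        key = match.group(1)
+        if key in variables:
+            return str(variables[key])
+        if missing == "error":
+            raise KeyError(f"missing template variable: {key}")
+        return match.group(0)  # "ignore": leave the token in place
+    return re.sub(r"\{\{(\w+)\}\}", sub, content)
+
+print(render("You are a helpful assistant for {{product}}.", {"product": "Acme"}))
+# → You are a helpful assistant for Acme.
+```
+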
+### Return value
+
+Returns a `Prompt` object with:
+
+| Field | Type | Description |
+| -------------- | ------ | ------------------------------------ |
+| `content` | `str` | The rendered prompt content |
+| `version` | `int` | Version number |
+| `version_id` | `str` | Version UUID |
+| `tag` | `str` | Tag this version was fetched from |
+| `is_latest` | `bool` | Whether this is the latest version |
+| `model` | `str` | Model bound to this version (if any) |
+| `metadata` | `dict` | Additional metadata |
+| `source` | `str` | `"api"` or `"fallback"` |
+| `content_hash` | `str` | SHA-256 hash of the content |
+
+## Model Deployments
+
+When you deploy a model to a prompt version in the dashboard, the SDK automatically patches the `model` parameter in your LLM calls:
+
+```python
+system_prompt = ze.prompt(
+ name="support-bot",
+ content="You are a helpful customer support agent."
+)
+
+response = client.chat.completions.create(
+ model="gpt-4", # Gets replaced with the deployed model
+ messages=[
+ {"role": "system", "content": system_prompt},
+ {"role": "user", "content": "Hello"}
+ ]
+)
+```
+
+## Sending Feedback
+
+Attach feedback to completions to power prompt optimization:
+
+```python
+ze.send_feedback(
+ prompt_slug="support-bot",
+ completion_id=response.id,
+ thumbs_up=True,
+ reason="Clear and concise response"
+)
+```
+
+| Parameter | Type | Required | Description |
+| ----------------- | ------ | -------- | ------------------------------------------- |
+| `prompt_slug` | `str` | Yes | Prompt name (same as used in `ze.prompt()`) |
+| `completion_id` | `str` | Yes | UUID of the completion |
+| `thumbs_up` | `bool` | Yes | Positive or negative feedback |
+| `reason` | `str` | No | Explanation of the feedback |
+| `expected_output` | `str` | No | What the output should have been |
+| `metadata` | `dict` | No | Additional metadata |
diff --git a/autotune/sdks/typescript.mdx b/autotune/sdks/typescript.mdx
new file mode 100644
index 0000000..2a21310
--- /dev/null
+++ b/autotune/sdks/typescript.mdx
@@ -0,0 +1,151 @@
+---
+title: "TypeScript"
+description: "Track and version prompts in TypeScript with ze.prompt()"
+---
+
+## Installation
+
+```bash
+npm install zeroeval
+```
+
+## Basic Setup
+
+Replace hardcoded prompt strings with `ze.prompt()`. Your existing text becomes the fallback content that's used until an optimized version is available.
+
+```typescript
+import * as ze from "zeroeval";
+import { OpenAI } from "openai";
+
+ze.init();
+const client = ze.wrap(new OpenAI());
+
+const systemPrompt = await ze.prompt({
+ name: "support-bot",
+ content: "You are a helpful customer support agent for {{company}}.",
+ variables: { company: "TechCorp" },
+});
+
+const response = await client.chat.completions.create({
+ model: "gpt-4",
+ messages: [
+ { role: "system", content: systemPrompt },
+ { role: "user", content: "How do I reset my password?" },
+ ],
+});
+```
+
+Every call to `ze.prompt()` is tracked, versioned, and linked to the completions it produces. You'll see production traces at [ZeroEval → Prompts](https://app.zeroeval.com).
+
+
+ When you provide `content`, ZeroEval automatically uses the latest optimized
+ version from your dashboard if one exists. The `content` parameter serves as a
+ fallback for when no optimized versions are available yet.
+
+
+## Version Control
+
+### Auto-optimization (default)
+
+```typescript
+const prompt = await ze.prompt({
+ name: "customer-support",
+ content: "You are a helpful assistant.",
+});
+```
+
+Uses the latest optimized version if one exists, otherwise falls back to the provided content.
+
+### Explicit mode
+
+```typescript
+const prompt = await ze.prompt({
+ name: "customer-support",
+ from: "explicit",
+ content: "You are a helpful assistant.",
+});
+```
+
+Always uses the provided content. Useful for debugging or A/B testing a specific version.
+
+### Latest mode
+
+```typescript
+const prompt = await ze.prompt({
+ name: "customer-support",
+ from: "latest",
+});
+```
+
+Requires an optimized version to exist. Fails with `PromptRequestError` if none is found.
+
+### Pin to a specific version
+
+```typescript
+const prompt = await ze.prompt({
+ name: "customer-support",
+ from: "a1b2c3d4...", // 64-char SHA-256 hash
+});
+```
+
+## Parameters
+
+| Parameter | Type | Required | Default | Description |
+| ----------- | ------------------------ | -------- | ----------- | --------------------------------------------------- |
+| `name` | `string` | Yes | — | Task name for this prompt |
+| `content` | `string` | No | `undefined` | Prompt content (fallback or explicit) |
+| `from` | `string` | No | `undefined` | `"latest"`, `"explicit"`, or a 64-char SHA-256 hash |
+| `variables` | `Record<string, string>` | No | `undefined` | Template variables for `{{var}}` tokens |
+
+### Return value
+
+Returns `Promise<string>` -- a decorated prompt string with metadata that integrations use to link completions to prompt versions and auto-patch models.
+
+### Errors
+
+| Error | When |
+| --------------------- | -------------------------------------------------------------------------- |
+| `Error` | Both `content` and `from` provided (except `from: "explicit"`), or neither |
+| `PromptRequestError` | `from: "latest"` but no versions exist |
+| `PromptNotFoundError` | `from` is a hash that doesn't exist |
+
+## Model Deployments
+
+When you deploy a model to a prompt version in the dashboard, the SDK automatically patches the `model` parameter in your LLM calls:
+
+```typescript
+const systemPrompt = await ze.prompt({
+ name: "support-bot",
+ content: "You are a helpful customer support agent.",
+});
+
+const response = await client.chat.completions.create({
+ model: "gpt-4", // Gets replaced with the deployed model
+ messages: [
+ { role: "system", content: systemPrompt },
+ { role: "user", content: "Hello" },
+ ],
+});
+```
+
+## Sending Feedback
+
+Attach feedback to completions to power prompt optimization:
+
+```typescript
+await ze.sendFeedback({
+ promptSlug: "support-bot",
+ completionId: response.id,
+ thumbsUp: true,
+ reason: "Clear and concise response",
+});
+```
+
+| Parameter | Type | Required | Description |
+| ---------------- | ------------------------- | -------- | ------------------------------------------- |
+| `promptSlug` | `string` | Yes | Prompt name (same as used in `ze.prompt()`) |
+| `completionId` | `string` | Yes | UUID of the completion |
+| `thumbsUp` | `boolean` | Yes | Positive or negative feedback |
+| `reason` | `string` | No | Explanation of the feedback |
+| `expectedOutput` | `string` | No | What the output should have been |
+| `metadata` | `Record<string, any>` | No | Additional metadata |
diff --git a/autotune/setup.mdx b/autotune/setup.mdx
deleted file mode 100644
index 9340831..0000000
--- a/autotune/setup.mdx
+++ /dev/null
@@ -1,233 +0,0 @@
----
-title: "Setup"
-description: "Getting started with autotune"
----
-
-ZeroEval's autotune feature allows you to continuously improve your prompts and automatically deploy the best-performing models. The setup is simple and powerful.
-
-
-
-## Getting started (<5 mins)
-
-
-Replace hardcoded prompts with `ze.prompt()` and include the name of the specific part of your agent that you want to tune.
-
-
-```python Python
-# Before
-prompt = "You are a helpful assistant"
-
-# After - with autotune
-prompt = ze.prompt(
- name="assistant",
- content="You are a helpful assistant"
-)
-```
-```typescript TypeScript
-// Before
-const prompt = "You are a helpful assistant";
-
-// After - with autotune
-const prompt = await ze.prompt({
- name: "assistant",
- content: "You are a helpful assistant"
-});
-```
-
-
-That's it! You'll start seeing production traces in your dashboard for this specific task at [`ZeroEval › Prompts › [task_name]`](https://app.zeroeval.com).
-
-
-**Auto-tune behavior:** When you provide `content`, ZeroEval automatically uses the latest optimized version from your dashboard if one exists. The `content` parameter serves as a fallback for when no optimized versions are available yet. This means you can hardcode a default prompt in your code, but ZeroEval will seamlessly swap in tuned versions without any code changes.
-
-To explicitly use the hardcoded content and bypass auto-optimization, use `from_="explicit"` (Python) or `from: "explicit"` (TypeScript):
-
-
-```python Python
-prompt = ze.prompt(
- name="assistant",
- from_="explicit",
- content="You are a helpful assistant"
-)
-```
-```typescript TypeScript
-const prompt = await ze.prompt({
- name: "assistant",
- from: "explicit",
- content: "You are a helpful assistant"
-});
-```
-
-
-
-## Pushing models to production
-
-Once you see a model that performs well, you can send it to production with a single click, as seen below.
-
-
-
-
-Your specified model gets replaced automatically any time you use the prompt from `ze.prompt()`, as seen below.
-
-
-```python Python
-# You write this
-response = client.chat.completions.create(
- model="gpt-4", # ← Gets replaced!
- messages=[{"role": "system", "content": prompt}]
-)
-```
-```typescript TypeScript
-// You write this
-const response = await openai.chat.completions.create({
- model: "gpt-4", // ← Gets replaced!
- messages: [{ role: "system", content: prompt }]
-});
-```
-
-
-## Example
-
-Here's autotune in action for a simple customer support bot:
-
-
-```python Python
-import zeroeval as ze
-from openai import OpenAI
-
-ze.init()
-client = OpenAI()
-
-# Define your prompt with version tracking
-system_prompt = ze.prompt(
- name="support-bot",
- content="""You are a customer support agent for {{company}}.
- Be helpful, concise, and professional.""",
- variables={"company": "TechCorp"}
-)
-
-# Use it normally - model gets patched automatically
-response = client.chat.completions.create(
- model="gpt-4", # This might run claude-3-sonnet in production!
- messages=[
- {"role": "system", "content": system_prompt},
- {"role": "user", "content": "I need help with my order"}
- ]
-)
-```
-```typescript TypeScript
-import * as ze from 'zeroeval';
-import { OpenAI } from 'openai';
-
-ze.init();
-const client = ze.wrap(new OpenAI());
-
-// Define your prompt with version tracking
-const systemPrompt = await ze.prompt({
- name: "support-bot",
- content: `You are a customer support agent for {{company}}.
- Be helpful, concise, and professional.`,
- variables: { company: "TechCorp" }
-});
-
-// Use it normally - model gets patched automatically
-const response = await client.chat.completions.create({
- model: "gpt-4", // This might run claude-3-sonnet in production!
- messages: [
- { role: "system", content: systemPrompt },
- { role: "user", content: "I need help with my order" }
- ]
-});
-```
-
-
-## Understanding Prompt Versions
-
-ZeroEval automatically manages prompt versions for you. When you use `ze.prompt()` with `content`, the SDK will:
-
-1. **Check for optimized versions**: First, it tries to fetch the latest optimized version from your dashboard
-2. **Fall back to your content**: If no optimized versions exist yet, it uses the `content` you provided
-3. **Create a version**: Your provided content is stored as the initial version for this task
-
-This means you get the best of both worlds: hardcoded fallback prompts in your code, with automatic optimization in production.
-
-
-```python Python
-# This will use the latest optimized version if one exists in your dashboard
-# Otherwise, it uses the content you provide here
-prompt = ze.prompt(
- name="customer-support",
- content="You are a helpful assistant."
-)
-```
-```typescript TypeScript
-// This will use the latest optimized version if one exists in your dashboard
-// Otherwise, it uses the content you provide here
-const prompt = await ze.prompt({
- name: "customer-support",
- content: "You are a helpful assistant."
-});
-```
-
-
-### Explicit version control
-
-If you need more control over which version to use:
-
-
-```python Python
-# Always use the latest optimized version (fails if none exists)
-prompt = ze.prompt(
- name="customer-support",
- from_="latest"
-)
-
-# Always use the hardcoded content (bypass auto-optimization)
-prompt = ze.prompt(
- name="customer-support",
- from_="explicit",
- content="You are a helpful assistant."
-)
-
-# Use a specific version by its content hash
-prompt = ze.prompt(
- name="customer-support",
- from_="a1b2c3d4..." # 64-character SHA-256 hash
-)
-```
-```typescript TypeScript
-// Always use the latest optimized version (fails if none exists)
-const prompt = await ze.prompt({
- name: "customer-support",
- from: "latest"
-});
-
-// Always use the hardcoded content (bypass auto-optimization)
-const prompt = await ze.prompt({
- name: "customer-support",
- from: "explicit",
- content: "You are a helpful assistant."
-});
-
-// Use a specific version by its content hash
-const prompt = await ze.prompt({
- name: "customer-support",
- from: "a1b2c3d4..." // 64-character SHA-256 hash
-});
-```
-
-
-### When to use each mode
-
-| Mode | Use Case | Behavior |
-|------|----------|----------|
-| `content` only | **Recommended for most cases** | Auto-optimization with fallback |
-| `from_="explicit"` (Python) / `from: "explicit"` (TS) | Testing, debugging, or A/B testing specific prompts | Always use hardcoded content |
-| `from_="latest"` (Python) / `from: "latest"` (TS) | Production where optimization is required | Fail if no optimized version exists |
-| `from_=""` (Python) / `from: ""` (TS) | Pinning to specific tested versions | Use exact version by hash |
-
-
-**Best practice**: Use `content` parameter alone for local development and production. ZeroEval will automatically use optimized versions when available. Only use `from_="explicit"` (Python) or `from: "explicit"` (TypeScript) when you specifically need to test or debug the hardcoded content.
-
-
-
diff --git a/docs.json b/docs.json
index 8066922..b1b3789 100644
--- a/docs.json
+++ b/docs.json
@@ -16,7 +16,7 @@
{
"group": "Tracing",
"pages": [
- "tracing/quickstart",
+ "tracing/introduction",
{
"group": "SDKs",
"pages": [
@@ -38,43 +38,49 @@
}
]
},
- {
- "group": "Advanced",
- "pages": [
- "tracing/sessions",
- "tracing/tagging",
- "tracing/signals",
- "tracing/manual-instrumentation",
- "tracing/opentelemetry"
- ]
- },
- "tracing/reference"
+ "tracing/api-reference",
+ "tracing/opentelemetry"
]
},
{
"group": "Prompts",
"pages": [
"autotune/introduction",
- "autotune/setup",
- {
- "group": "Optimization",
- "pages": ["autotune/prompts/models", "autotune/prompts/prompts"]
- },
+ "autotune/sdks/python",
+ "autotune/sdks/typescript",
"autotune/reference"
]
},
{
- "group": "Judges",
+ "group": "Feedback",
"pages": [
- "judges/introduction",
- "judges/setup",
- "judges/multimodal-evaluation",
- "judges/submit-feedback",
- "judges/pull-evaluations"
+ "feedback/introduction",
+ {
+ "group": "Human Feedback",
+ "pages": [
+ "feedback/human-feedback",
+ "feedback/python",
+ "feedback/typescript",
+ "feedback/api-reference"
+ ]
+ },
+ {
+ "group": "AI Feedback (Judges)",
+ "pages": [
+ "judges/introduction",
+ "judges/calibration",
+ "judges/multimodal-evaluation",
+ "judges/pull-evaluations"
+ ]
+ }
]
},
{
- "group": "Integrations",
+ "group": "Optimization",
+ "pages": ["autotune/prompts/prompts"]
+ },
+ {
+ "group": "Helpers",
"pages": [
"integrations/introduction",
"integrations/skills",
diff --git a/feedback/api-reference.mdx b/feedback/api-reference.mdx
new file mode 100644
index 0000000..338c16d
--- /dev/null
+++ b/feedback/api-reference.mdx
@@ -0,0 +1,171 @@
+---
+title: "API Reference"
+description: "REST API for submitting and retrieving feedback"
+---
+
+Base URL: `https://api.zeroeval.com`
+
+All requests require a Bearer token:
+
+```
+Authorization: Bearer YOUR_ZEROEVAL_API_KEY
+```
+
+---
+
+## Completion Feedback
+
+```
+POST /v1/prompts/{prompt_slug}/completions/{completion_id}/feedback
+```
+
+Submit structured feedback for a specific LLM completion. This feedback powers prompt optimization.
+
+**Request body:**
+
+| Field | Type | Required | Description |
+| ------------------- | -------- | -------- | ---------------------------------- |
+| `thumbs_up` | `bool` | Yes | Positive or negative feedback |
+| `reason` | `string` | No | Explanation of the feedback |
+| `expected_output` | `string` | No | What the output should have been |
+| `metadata` | `object` | No | Additional metadata |
+| `judge_id` | `string` | No | Judge automation ID |
+| `expected_score` | `float` | No | Expected score (for scored judges) |
+| `score_direction` | `string` | No | `"too_high"` or `"too_low"` |
+| `criteria_feedback` | `object` | No | Per-criterion feedback |
+
+```bash
+curl -X POST https://api.zeroeval.com/v1/prompts/support-bot/completions/550e8400-.../feedback \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "thumbs_up": false,
+    "reason": "Response was too vague",
+    "expected_output": "Should provide specific steps"
+  }'
+```
+
+**Response:** 200
+
+```json
+{
+  "id": "fb123e45-...",
+  "completion_id": "550e8400-...",
+  "prompt_id": "a1b2c3d4-...",
+  "thumbs_up": false,
+  "reason": "Response was too vague",
+  "expected_output": "Should provide specific steps",
+  "created_at": "2025-01-15T10:30:00Z"
+}
+```
+
+<Note>
+  If feedback already exists for the same completion from the same user, it will
+  be updated with the new values.
+</Note>
+
+---
+
+## Unified Entity Feedback
+
+```
+GET /projects/{project_id}/feedback/{entity_type}/{entity_id}
+```
+
+Retrieve all feedback -- human reviews and judge evaluations -- for a span, trace, or session in a single response.
+
+| Path Parameter | Description |
+| -------------- | ---------------------------------- |
+| `project_id` | UUID of the project |
+| `entity_type` | `span`, `trace`, or `session` |
+| `entity_id` | UUID of the entity |
+
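As a sketch of calling this endpoint from Python using only the standard library (`feedback_url` and `get_entity_feedback` are illustrative helpers, not part of the SDK):

```python
import json
import urllib.request

BASE_URL = "https://api.zeroeval.com"

def feedback_url(project_id: str, entity_type: str, entity_id: str) -> str:
    # entity_type must be one of the documented values
    if entity_type not in {"span", "trace", "session"}:
        raise ValueError(f"unsupported entity_type: {entity_type}")
    return f"{BASE_URL}/projects/{project_id}/feedback/{entity_type}/{entity_id}"

def get_entity_feedback(project_id: str, entity_type: str,
                        entity_id: str, api_key: str) -> dict:
    # Authenticated GET; a real integration would add retries and error handling.
    req = urllib.request.Request(
        feedback_url(project_id, entity_type, entity_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```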
+**Response:** 200
+
+```json
+{
+  "entity_type": "span",
+  "entity_id": "550e8400-...",
+  "summary": {
+    "total": 3,
+    "human_feedback_count": 1,
+    "judge_evaluation_count": 2
+  },
+  "items": [
+    {
+      "kind": "human_feedback",
+      "id": "fb123e45-...",
+      "span_id": "550e8400-...",
+      "thumbs_up": true,
+      "reason": "Clear and helpful",
+      "created_at": "2025-01-15T10:30:00Z",
+      "created_by": {
+        "id": "user-123",
+        "email": "reviewer@example.com",
+        "name": "Alice"
+      },
+      "source_type": "human"
+    },
+    {
+      "kind": "judge_evaluation",
+      "id": "je456f78-...",
+      "span_id": "550e8400-...",
+      "automation_id": "judge-abc-...",
+      "judge_name": "Helpfulness",
+      "evaluation_result": true,
+      "evaluation_reason": "Response directly answers the question with clear steps.",
+      "confidence_score": 0.92,
+      "model_used": "gemini-3-flash-preview",
+      "evaluation_duration_ms": 1200,
+      "score": 8.5,
+      "evaluation_type": "scored",
+      "score_min": 0,
+      "score_max": 10,
+      "pass_threshold": 7.0,
+      "criteria_scores": {
+        "clarity": { "score": 9, "reason": "Well-structured response" },
+        "accuracy": { "score": 8, "reason": "Correct information provided" }
+      },
+      "created_at": "2025-01-15T10:31:00Z"
+    }
+  ]
+}
+```
+
+### Response fields
+
+**`summary`** -- aggregate counts for fast display:
+
+| Field | Type | Description |
+| -------------------------- | ----- | ---------------------------------- |
+| `total` | `int` | Total feedback items |
+| `human_feedback_count` | `int` | Number of human review items |
+| `judge_evaluation_count` | `int` | Number of judge evaluation items |
+
+**`items[]`** -- each item has a `kind` field (`human_feedback` or `judge_evaluation`) that determines which fields are present:
+
+| Field (human_feedback) | Type | Description |
+| ----------------------- | -------- | ------------------------------------ |
+| `thumbs_up` | `bool` | Positive or negative |
+| `reason` | `string` | Reviewer's explanation |
+| `expected_output` | `string` | Corrected output (if provided) |
+| `created_by` | `object` | User who submitted the feedback |
+| `source_type` | `string` | `"human"` or `"judge"` |
+
+| Field (judge_evaluation) | Type | Description |
+| -------------------------- | -------- | -------------------------------------------- |
+| `automation_id` | `string` | Judge automation UUID |
+| `judge_name` | `string` | Display name of the judge |
+| `evaluation_result` | `bool` | Whether the output passed |
+| `evaluation_reason` | `string` | Judge's reasoning |
+| `confidence_score` | `float` | Judge confidence (0-1) |
+| `model_used` | `string` | Model used for the evaluation |
+| `score` | `float` | Score value (scored evaluations only) |
+| `evaluation_type` | `string` | `"binary"` or `"scored"` |
+| `score_min` / `score_max` | `float` | Score range (scored evaluations only) |
+| `pass_threshold` | `float` | Threshold for pass/fail |
+| `criteria_scores` | `object` | Per-criterion scores and reasons |
+
+<Note>
+  For traces and sessions, feedback is aggregated from all descendant spans.
+</Note>
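Because the fields present depend on `kind`, client code typically branches on it. A minimal sketch (`summarize_feedback` is an illustrative helper; field names follow the tables above):

```python
def summarize_feedback(items: list[dict]) -> dict:
    """Tally unified feedback items by kind, mirroring the `summary` object."""
    counts = {"human_feedback": 0, "judge_evaluation": 0}
    notes = []
    for item in items:
        kind = item["kind"]
        counts[kind] += 1
        if kind == "human_feedback":
            # Human items carry thumbs_up and an optional reason
            notes.append(("human", item["thumbs_up"], item.get("reason")))
        else:
            # Judge items carry evaluation_result and an optional reason
            notes.append(("judge", item["evaluation_result"], item.get("evaluation_reason")))
    return {"total": len(items), "counts": counts, "notes": notes}
```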
diff --git a/feedback/human-feedback.mdx b/feedback/human-feedback.mdx
new file mode 100644
index 0000000..b9fa5a7
--- /dev/null
+++ b/feedback/human-feedback.mdx
@@ -0,0 +1,103 @@
+---
+title: "Introduction"
+description: "Collect feedback from reviewers in the dashboard or from end users via the API"
+---
+
+Human feedback captures what automated metrics can't -- whether the agent's response actually helped the user. You can collect it from internal reviewers using the ZeroEval dashboard, or from end users in your own application via the API.
+
+## Dashboard Review
+
+Reviewers can browse completions directly in the ZeroEval console and submit feedback without writing any code.
+
+1. Navigate to **Prompts → [your task]** in the dashboard
+2. Open the **Suggestions** tab to see incoming completions
+3. Review each output and provide thumbs-up or thumbs-down
+4. Optionally add a reason and the expected output
+
+Dashboard feedback is linked to the exact prompt version and model that produced the completion, so you can track quality per version over time.
+
+## In-App Feedback
+
+Add like/dislike buttons, star ratings, or other feedback controls directly in your application. Use the SDK or REST API to send feedback tied to specific completions.
+
+### Thumbs-up / Thumbs-down
+
+<CodeGroup>
+
+```python Python
+import zeroeval as ze
+
+ze.send_feedback(
+    prompt_slug="support-bot",
+    completion_id=response.id,
+    thumbs_up=user_clicked_thumbs_up,
+    reason="User found the answer helpful"
+)
+```
+
+```typescript TypeScript
+await ze.sendFeedback({
+  promptSlug: "support-bot",
+  completionId: response.id,
+  thumbsUp: userClickedThumbsUp,
+  reason: "User found the answer helpful",
+});
+```
+
+```bash cURL
+curl -X POST "https://api.zeroeval.com/v1/prompts/support-bot/completions/$COMPLETION_ID/feedback" \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "thumbs_up": true,
+    "reason": "User found the answer helpful"
+  }'
+```
+
+</CodeGroup>
+
+
+### With expected output
+
+When a user corrects the agent, capture what the response should have been:
+
+<CodeGroup>
+
+```python Python
+ze.send_feedback(
+    prompt_slug="support-bot",
+    completion_id=response.id,
+    thumbs_up=False,
+    reason="Wrong link provided",
+    expected_output="The password reset page is at https://app.example.com/reset"
+)
+```
+
+```typescript TypeScript
+await ze.sendFeedback({
+  promptSlug: "support-bot",
+  completionId: response.id,
+  thumbsUp: false,
+  reason: "Wrong link provided",
+  expectedOutput: "The password reset page is at https://app.example.com/reset",
+});
+```
+
+</CodeGroup>
+
+
+Expected outputs are used during [prompt optimization](/autotune/prompts/prompts) to generate better prompt variants.
+
+## Feedback Links
+
+For collecting feedback from users who don't have a ZeroEval account (e.g. customers, external reviewers), create a feedback link that anyone can use:
+
+```bash
+curl -X POST https://api.zeroeval.com/feedback-links \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt_slug": "support-bot",
+    "link_type": "prompt_completion"
+  }'
+```
+
+Share the returned URL with reviewers. They can submit feedback without needing API access or a ZeroEval account.
diff --git a/feedback/introduction.mdx b/feedback/introduction.mdx
new file mode 100644
index 0000000..b004331
--- /dev/null
+++ b/feedback/introduction.mdx
@@ -0,0 +1,52 @@
+---
+title: "Introduction"
+description: "Attach human and AI feedback to your agent interactions to drive quality improvements"
+---
+
+Your agent produces thousands of outputs, but without feedback you can't tell which ones are good. Feedback closes the loop -- it connects real-world quality judgments to the traces, spans, and completions your agent generates.
+
+ZeroEval supports two kinds of feedback:
+
+- **Human feedback** -- thumbs-up/down, star ratings, corrections, and expected outputs submitted by users or reviewers
+- **AI feedback** -- automated evaluations from calibrated judges that score outputs against criteria you define
+
+Both feed into the same system. Feedback attached to completions powers [prompt optimization](/autotune/introduction). You can also retrieve unified feedback -- combining human reviews and judge evaluations -- for any span, trace, or session via the [Feedback API](/feedback/api-reference#unified-entity-feedback).
+
+## How feedback flows
+
+
+1. Your agent runs and ZeroEval captures the full trace -- inputs, outputs,
+   model, prompt version.
+2. Humans review outputs in the dashboard or your app submits feedback
+   programmatically. Judges evaluate outputs automatically based on your
+   criteria.
+3. Feedback appears on spans, traces, and completions in the console. Filter by
+   thumbs-up rate, judge scores, or tags to find patterns.
+4. Use feedback to optimize prompts, compare models, calibrate judges, and
+   catch regressions before users do.
+
+## Get started
+
+<CardGroup cols={2}>
+  <Card title="Human Feedback" href="/feedback/human-feedback">
+    Collect feedback from reviewers in the dashboard or from end users via
+    like/dislike buttons, ratings, and corrections.
+  </Card>
+  <Card title="AI Feedback (Judges)" href="/judges/introduction">
+    Configure AI evaluators that automatically score every completion against
+    criteria you define. Includes built-in judges for common failures.
+  </Card>
+</CardGroup>
diff --git a/feedback/python.mdx b/feedback/python.mdx
new file mode 100644
index 0000000..10918b1
--- /dev/null
+++ b/feedback/python.mdx
@@ -0,0 +1,65 @@
+---
+title: "Python"
+description: "Submit completion feedback from Python to power prompt optimization"
+---
+
+## Completion Feedback
+
+Attach structured feedback to a specific LLM completion to power prompt optimization.
+
+### `send_feedback()`
+
+```python
+ze.send_feedback(
+    prompt_slug="support-bot",
+    completion_id="550e8400-e29b-41d4-a716-446655440000",
+    thumbs_up=False,
+    reason="Response was too verbose",
+    expected_output="A concise 2-3 sentence response"
+)
+```
+
+| Parameter | Type | Required | Description |
+| ------------------- | ------- | -------- | --------------------------------------------------------------------------------- |
+| `prompt_slug` | `str` | Yes | Prompt name (same as `ze.prompt(name=...)`) |
+| `completion_id` | `str` | Yes | UUID of the completion span |
+| `thumbs_up` | `bool` | Yes | Positive or negative feedback |
+| `reason` | `str` | No | Explanation of the feedback |
+| `expected_output` | `str` | No | What the output should have been |
+| `metadata` | `dict` | No | Additional metadata |
+| `judge_id` | `str` | No | Judge automation ID (for judge feedback) |
+| `expected_score` | `float` | No | Expected score (for scored judges) |
+| `score_direction` | `str` | No | `"too_high"` or `"too_low"` |
+| `criteria_feedback` | `dict` | No | Per-criterion feedback: `{"criterion": {"expected_score": 4.0, "reason": "..."}}` |
+
+### End-to-end example
+
+```python
+import zeroeval as ze
+from openai import OpenAI
+
+ze.init()
+client = OpenAI()
+
+system_prompt = ze.prompt(
+    name="support-bot",
+    content="You are a helpful customer support agent."
+)
+
+response = client.chat.completions.create(
+    model="gpt-4",
+    messages=[
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": "How do I reset my password?"}
+    ]
+)
+
+# evaluate_response is a placeholder for your own quality check
+is_good = evaluate_response(response.choices[0].message.content)
+
+ze.send_feedback(
+    prompt_slug="support-bot",
+    completion_id=response.id,
+    thumbs_up=is_good,
+    reason="Clear instructions" if is_good else "Missing reset link"
+)
+```
diff --git a/feedback/typescript.mdx b/feedback/typescript.mdx
new file mode 100644
index 0000000..9e7cddf
--- /dev/null
+++ b/feedback/typescript.mdx
@@ -0,0 +1,64 @@
+---
+title: "TypeScript"
+description: "Submit completion feedback from TypeScript to power prompt optimization"
+---
+
+## Completion Feedback
+
+Attach structured feedback to a specific LLM completion to power prompt optimization.
+
+### `sendFeedback()`
+
+```typescript
+await ze.sendFeedback({
+  promptSlug: "support-bot",
+  completionId: "550e8400-e29b-41d4-a716-446655440000",
+  thumbsUp: false,
+  reason: "Response was too verbose",
+  expectedOutput: "A concise 2-3 sentence response",
+});
+```
+
+| Parameter | Type | Required | Description |
+| ---------------- | ------------------------- | -------- | ------------------------------------------------ |
+| `promptSlug` | `string` | Yes | Prompt name (same as `ze.prompt({ name: ... })`) |
+| `completionId` | `string` | Yes | UUID of the completion span |
+| `thumbsUp` | `boolean` | Yes | Positive or negative feedback |
+| `reason` | `string` | No | Explanation of the feedback |
+| `expectedOutput` | `string` | No | What the output should have been |
+| `metadata`       | `Record<string, unknown>` | No       | Additional metadata                              |
+| `judgeId` | `string` | No | Judge automation ID (for judge feedback) |
+| `expectedScore` | `number` | No | Expected score (for scored judges) |
+| `scoreDirection` | `'too_high' \| 'too_low'` | No | Score direction for scored judges |
+
+### End-to-end example
+
+```typescript
+import * as ze from "zeroeval";
+import { OpenAI } from "openai";
+
+ze.init();
+const client = ze.wrap(new OpenAI());
+
+const systemPrompt = await ze.prompt({
+  name: "support-bot",
+  content: "You are a helpful customer support agent.",
+});
+
+const response = await client.chat.completions.create({
+  model: "gpt-4",
+  messages: [
+    { role: "system", content: systemPrompt },
+    { role: "user", content: "How do I reset my password?" },
+  ],
+});
+
+// evaluateResponse is a placeholder for your own quality check
+const isGood = evaluateResponse(response.choices[0].message.content);
+
+await ze.sendFeedback({
+  promptSlug: "support-bot",
+  completionId: response.id,
+  thumbsUp: isGood,
+  reason: isGood ? "Clear instructions" : "Missing reset link",
+});
+```
diff --git a/judges/calibration.mdx b/judges/calibration.mdx
new file mode 100644
index 0000000..4ded9f2
--- /dev/null
+++ b/judges/calibration.mdx
@@ -0,0 +1,199 @@
+---
+title: "Calibration"
+description: "Correct judge evaluations to improve accuracy over time"
+---
+
+Judges get better the more you correct them. Each time you mark an evaluation as right or wrong, that correction is stored and used to refine future scoring. This is calibration.
+
+## Calibrating in the dashboard
+
+For each evaluated item in the console, you can mark the judge's assessment as correct or incorrect and optionally provide the expected answer.
+
+
+
+
+
+## Calibrating programmatically
+
+Submit corrections via the SDK or REST API. This is useful for bulk calibration from automated pipelines, custom review workflows, or external labeling tools.
+
+### Finding the right IDs
+
+Judge evaluations involve two related spans:
+
+| ID | Description |
+| ---------------------- | -------------------------------------------------- |
+| **Source Span ID** | The original LLM call that was evaluated |
+| **Judge Call Span ID** | The span created when the judge ran its evaluation |
+
+Where to find each identifier:
+
+| ID | Where to Find It |
+| ------------- | ----------------------------------------------------------------- |
+| **Task Slug** | In the judge settings, or the URL when editing the judge's prompt |
+| **Span ID** | In the evaluation modal, or via `get_judge_evaluations()` |
+| **Judge ID** | In the URL when viewing a judge (`/judges/{judge_id}`) |
+
+<Tip>
+  The easiest way to get the correct IDs: open a judge evaluation in the
+  dashboard, expand "SDK Integration", and click "Copy" to get pre-filled code.
+</Tip>
+
+
+### Binary judges
+
+Mark a judge evaluation as correct or incorrect:
+
+<CodeGroup>
+
+```python Python
+import zeroeval as ze
+
+ze.send_feedback(
+    prompt_slug="your-judge-task-slug",
+    completion_id="span-id-here",
+    thumbs_up=True,
+    reason="Judge correctly identified the issue",
+    judge_id="automation-id-here",
+)
+```
+
+```bash cURL
+curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "thumbs_up": true,
+    "reason": "Judge correctly identified the issue",
+    "judge_id": "automation-uuid-here"
+  }'
+```
+
+</CodeGroup>
+
+
+### Scored judges
+
+For judges using scored rubrics, provide the expected score and direction:
+
+<CodeGroup>
+
+```python Python
+ze.send_feedback(
+    prompt_slug="quality-scorer",
+    completion_id="span-id-here",
+    thumbs_up=False,
+    judge_id="automation-id-here",
+    expected_score=3.5,
+    score_direction="too_high",
+    reason="Score should have been lower due to grammar issues",
+)
+```
+
+```bash cURL
+curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "thumbs_up": false,
+    "judge_id": "automation-uuid-here",
+    "expected_score": 3.5,
+    "score_direction": "too_high",
+    "reason": "Score should have been lower"
+  }'
+```
+
+</CodeGroup>
+
+
+### Per-criterion feedback
+
+For scored judges with multiple criteria, correct individual criterion scores:
+
+<CodeGroup>
+
+```python Python
+ze.send_feedback(
+    prompt_slug="quality-scorer",
+    completion_id="span-id-here",
+    thumbs_up=False,
+    judge_id="automation-id-here",
+    reason="Criterion-level score adjustments",
+    criteria_feedback={
+        "CTA_text": {
+            "expected_score": 4.0,
+            "reason": "CTA is clear and prominent"
+        },
+        "CX-004": {
+            "expected_score": 1.0,
+            "reason": "Required phone number is missing"
+        }
+    }
+)
+```
+
+```bash cURL
+curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
+  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "thumbs_up": false,
+    "judge_id": "automation-uuid-here",
+    "criteria_feedback": {
+      "CTA_text": {"expected_score": 4.0, "reason": "CTA is clear and visible"},
+      "CX-004": {"expected_score": 1.0, "reason": "Phone number is missing"}
+    }
+  }'
+```
+
+</CodeGroup>
+
+
+To discover valid criterion keys before sending per-criterion feedback:
+
+```python
+criteria = ze.get_judge_criteria(
+    project_id="your-project-id",
+    judge_id="automation-id-here",
+)
+
+for c in criteria["criteria"]:
+    print(c["key"], c.get("description"))
+```
+
+### Parameters
+
+| Parameter | Type | Required | Description |
+| ------------------- | ------- | -------- | ------------------------------------------------ |
+| `prompt_slug` | `str` | Yes | Task slug associated with the judge |
+| `completion_id` | `str` | Yes | Span ID being evaluated |
+| `thumbs_up` | `bool` | Yes | `True` if judge was correct, `False` if wrong |
+| `reason` | `str` | No | Explanation of the correction |
+| `judge_id` | `str` | Yes | Judge automation ID |
+| `expected_score` | `float` | No | Expected score (scored judges only) |
+| `score_direction` | `str` | No | `"too_high"` or `"too_low"` (scored judges only) |
+| `criteria_feedback` | `dict` | No | Per-criterion corrections (scored judges only) |
+
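The scored-only constraints above can be checked client-side before submitting. A sketch (`validate_judge_feedback` is an illustrative helper, not part of the SDK):

```python
def validate_judge_feedback(payload: dict, evaluation_type: str) -> list[str]:
    """Client-side sanity checks mirroring the parameter rules above."""
    errors = []
    # All four of these are required for judge feedback
    for field in ("prompt_slug", "completion_id", "thumbs_up", "judge_id"):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    # Score-related fields are only valid for scored judges
    if evaluation_type != "scored":
        for field in ("expected_score", "score_direction", "criteria_feedback"):
            if field in payload:
                errors.append(f"{field} is only valid for scored judges")
    if payload.get("score_direction") not in (None, "too_high", "too_low"):
        errors.append("score_direction must be 'too_high' or 'too_low'")
    return errors
```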
+## Bulk calibration
+
+Iterate through evaluations and submit corrections programmatically:
+
+```python
+evaluations = ze.get_judge_evaluations(
+    project_id="your-project-id",
+    judge_id="your-judge-id",
+    limit=100,
+)
+
+# your_review_logic is a placeholder for your own review criteria
+for evaluation in evaluations["evaluations"]:
+    is_correct = your_review_logic(evaluation)
+
+    ze.send_feedback(
+        prompt_slug="your-judge-task-slug",
+        completion_id=evaluation["span_id"],
+        thumbs_up=is_correct,
+        reason="Automated review",
+        judge_id="your-judge-id",
+    )
+```
diff --git a/judges/introduction.mdx b/judges/introduction.mdx
index e1184d4..e174511 100644
--- a/judges/introduction.mdx
+++ b/judges/introduction.mdx
@@ -1,20 +1,85 @@
---
title: "Introduction"
-description: "Continuously evaluate your production traffic with judges that learn over time"
+description: "AI evaluators that give automated feedback on your agent's interactions"
---
-
+Judges are AI evaluators that give feedback on your agent's interactions -- individual messages, full conversations, or entire workflows. They evaluate spans, traces, and sessions against criteria you define, scoring outputs automatically so you don't have to review every response manually.
-Calibrated LLM judges are AI evaluators that watch your traces, sessions, or spans and score outputs according to criteria you define. They get better over time the more you refine and correct their evaluations.
+Judges calibrate over time. The more you correct their evaluations, the more accurate they become at catching the issues that matter to your use case.
-## When to use
+## Built-in judges
-Use a judge when you want consistent, scalable evaluation of:
+ZeroEval ships with built-in judges that detect common failure patterns without any configuration:
-- Hallucinations, safety/policy violations
-- Response quality (helpfulness, tone, structure)
-- Latency, cost, and error patterns tied to specific criteria
+- **User corrections** -- detects when a user rephrases or corrects the agent's output
+- **User frustration** -- identifies signs of dissatisfaction, confusion, or repeated requests
+- **Task failures** -- catches when the agent fails to complete the user's request
+- **Hallucinations** -- flags responses that contain fabricated or unsupported claims
+- **Safety violations** -- detects harmful, biased, or policy-violating content
+
+## Suggested judges
+
+Based on your production traffic, ZeroEval can suggest judges that would be useful for your specific use case. As traces flow in, we analyze patterns in your agent's behavior and recommend evaluation criteria tailored to the failures and edge cases we observe.
+
+## Custom judges
+
+Create your own judges to evaluate anything specific to your domain:
+
+- Response quality (helpfulness, tone, structure, completeness)
+- Domain accuracy (legal, medical, financial correctness)
+- Format compliance (JSON output, specific templates, length constraints)
+- Business rules (pricing accuracy, policy adherence, SLA compliance)
+
+## Creating a judge
+
+
+1. Go to [Monitoring → Judges → New Judge](https://app.zeroeval.com/monitoring/judges).
+   Choose between a **binary** judge (pass/fail) or a **scored** judge (rubric
+   with multiple criteria).
+2. Specify what the judge should look for in your agent's output. Tweak the
+   prompt until it matches the quality bar you're aiming for.
+3. Historical and future traces are scored automatically. Results appear in the
+   dashboard immediately.
+
+
+
+## Calibration
+
+AI judges are powerful but imperfect. Out of the box, a judge might:
+
+- **Miss nuance** -- flag a correct but unconventional answer as wrong
+- **Be too lenient** -- let low-quality responses pass because they're grammatically correct
+- **Misinterpret context** -- score domain-specific content poorly because it lacks your business context
+
+This is expected. A judge's first evaluations are a starting point, not a finished product.
+
+The fix is simple: **correct the judge when it's wrong**. Each correction teaches the judge what "good" and "bad" look like for your specific use case. Over time, the judge converges on your quality bar.
+
+```mermaid
+flowchart LR
+ A[Judge evaluates] --> B[You review]
+ B -->|Correct| C[Reinforced]
+ B -->|Wrong| D[Corrected]
+ C --> E[Judge improves]
+ D --> E
+ E --> A
+```
+
+<Card title="Calibration" href="/judges/calibration">
+  Learn how to correct judge evaluations in the dashboard or programmatically
+  via SDK to improve accuracy over time.
+</Card>
+
-Not sure where to start? The [create-judge skill](/integrations/skills) can help you pick an evaluation type, write a template, and set up criteria from inside your coding agent.
+<Tip>
+  Using Cursor, Claude Code, or another coding agent? The [`create-judge`
+  skill](/integrations/skills) can help you pick an evaluation type, write the
+  template, and create the judge without leaving your editor.
+</Tip>
diff --git a/judges/setup.mdx b/judges/setup.mdx
deleted file mode 100644
index b96f01a..0000000
--- a/judges/setup.mdx
+++ /dev/null
@@ -1,26 +0,0 @@
----
-title: "Setup"
-description: "Create and calibrate an AI judge in minutes"
----
-
-
-
-## Creating a judge (<5 mins)
-
-1. Go to [Monitoring → Judges → New Judge](https://app.zeroeval.com/monitoring/judges).
-2. Specify the criteria that you want to evaluate from your production traffic.
-3. Tweak the prompt of the judge until it matches what you are looking for!
-
-That's it! Historical and future traces will be scored automatically and shown in the dashboard.
-
-
-## Calibrating your judge
-
-For each evaluated item you have the option to mark it as correct or incorrect. This is automatically stored and used to improve the judge over time.
-
-
-
-
-Using Cursor, Claude Code, or another coding agent? The [`create-judge` skill](/integrations/skills) can help you pick an evaluation type, write the template, and create the judge without leaving your editor.
-
-
diff --git a/judges/submit-feedback.mdx b/judges/submit-feedback.mdx
deleted file mode 100644
index 7a9796d..0000000
--- a/judges/submit-feedback.mdx
+++ /dev/null
@@ -1,265 +0,0 @@
----
-title: "Submitting Feedback"
-description: "Programmatically submit feedback for judge evaluations via SDK"
----
-
-## Overview
-
-When calibrating judges, you can submit feedback programmatically using the SDK.
-This is useful for:
-
-- Bulk feedback submission from automated pipelines
-- Integration with custom review workflows
-- Syncing feedback from external labeling tools
-
-Your existing `send_feedback` integrations remain valid. Criterion-level feedback is an optional extension for scored judges.
-
-## Important: Using the Correct IDs
-
-Judge evaluations involve two related spans:
-
-| ID | Description |
-|---|---|
-| **Source Span ID** | The original LLM call that was evaluated |
-| **Judge Call Span ID** | The span created when the judge ran its evaluation |
-
-When submitting feedback, always include the `judge_id` parameter to ensure
-feedback is correctly associated with the judge evaluation.
-
-## Python SDK
-
-### From the UI (Recommended)
-
-The easiest way to get the correct IDs is from the Judge Evaluation modal:
-
-1. Open a judge evaluation in the dashboard
-2. Expand the "SDK Integration" section
-3. Click "Copy" to copy the pre-filled Python code
-4. Paste and customize the generated code
-
-### Manual Submission
-
-```python
-from zeroeval import ZeroEval
-
-client = ZeroEval()
-
-# Submit feedback for a judge evaluation
-client.send_feedback(
- prompt_slug="your-judge-task-slug", # The task/prompt associated with the judge
- completion_id="span-id-here", # The span ID from the evaluation
- thumbs_up=True, # True = correct, False = incorrect
- reason="Optional explanation",
- judge_id="automation-id-here", # Required for judge feedback
-)
-```
-
-### Parameters
-
-| Parameter | Type | Required | Description |
-|---|---|---|---|
-| `prompt_slug` | str | Yes | The task slug associated with the judge |
-| `completion_id` | str | Yes | The span ID being evaluated |
-| `thumbs_up` | bool | Yes | `True` if judge was correct, `False` if wrong |
-| `reason` | str | No | Explanation of the feedback |
-| `judge_id` | str | Yes* | The judge automation ID (*required for judge feedback) |
-| `expected_score` | float | No | For scored judges: the expected score value |
-| `score_direction` | str | No | For scored judges: `"too_high"` or `"too_low"` |
-| `criteria_feedback` | dict | No | For scored judges: per-criterion expected score/reason map |
-
-
- `expected_score` and `score_direction` are only valid for scored judges
- (judges with `evaluation_type: "scored"`). The API will return a 400 error
- if these fields are provided for binary judges.
-
-
-### Step 1: Discover Available Criteria (Scored Judges)
-
-Before sending `criteria_feedback`, fetch valid criterion keys for the judge.
-
-```python
-from zeroeval import ZeroEval
-
-client = ZeroEval()
-
-criteria = client.get_judge_criteria(
- project_id="your-project-id",
- judge_id="automation-id-here",
-)
-
-print(criteria["evaluation_type"]) # "scored" or "binary"
-print(criteria["criteria"]) # [{"key": "...", "label": "...", "description": "..."}]
-```
-
-```bash
-curl -X GET "https://api.zeroeval.com/projects/{project_id}/judges/{judge_id}/criteria" \
- -H "Authorization: Bearer $ZEROEVAL_API_KEY"
-```
-
-### Step 2: Score-Based Feedback (General Score)
-
-For judges using scored rubrics (not binary pass/fail), you can provide additional
-feedback about the overall expected score:
-
-```python
-from zeroeval import ZeroEval
-
-client = ZeroEval()
-
-# Submit feedback for a scored judge evaluation
-client.send_feedback(
- prompt_slug="quality-scorer",
- completion_id="span-id-here",
- thumbs_up=False, # The judge was incorrect
- judge_id="automation-id-here",
- expected_score=3.5, # What the score should have been
- score_direction="too_high", # The judge scored too high
- reason="Score should have been lower due to grammar issues",
-)
-```
-
-### Step 3: Score-Based Feedback (Per-Criterion)
-
-For scored judges, you can send corrections for specific criteria:
-
-```python
-from zeroeval import ZeroEval
-
-client = ZeroEval()
-
-client.send_feedback(
- prompt_slug="quality-scorer",
- completion_id="span-id-here",
- thumbs_up=False,
- judge_id="automation-id-here",
- reason="Criterion-level score adjustments",
- criteria_feedback={
- "CTA_text": {
- "expected_score": 4.0,
- "reason": "CTA is clear and prominent"
- },
- "CX-004": {
- "expected_score": 1.0,
- "reason": "Required phone number is missing"
- }
- }
-)
-```
-
-## REST API
-
-### Binary Judge Feedback
-
-```bash
-curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
- -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
- -H "Content-Type: application/json" \
- -d '{
- "thumbs_up": true,
- "reason": "Judge correctly identified the issue",
- "judge_id": "automation-uuid-here"
- }'
-```
-
-### Scored Judge Feedback
-
-For scored judges, include `expected_score` and `score_direction`:
-
-```bash
-curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
- -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
- -H "Content-Type: application/json" \
- -d '{
- "thumbs_up": false,
- "reason": "Score should have been lower",
- "judge_id": "automation-uuid-here",
- "expected_score": 3.5,
- "score_direction": "too_high"
- }'
-```
-
-### Scored Judge Feedback (Criterion-Level)
-
-```bash
-curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
- -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
- -H "Content-Type: application/json" \
- -d '{
- "thumbs_up": false,
- "judge_id": "automation-uuid-here",
- "reason": "Criterion-level corrections",
- "criteria_feedback": {
- "CTA_text": {
- "expected_score": 4.0,
- "reason": "CTA is clear and visible"
- },
- "CX-004": {
- "expected_score": 1.0,
- "reason": "Phone number is missing"
- }
- }
- }'
-```
-
-## Criteria Payload Shape
-
-`criteria_feedback` uses this shape:
-
-```json
-{
- "criteria_feedback": {
- "criterion_key": {
- "expected_score": 4.0,
- "reason": "Optional explanation"
- }
- }
-}
-```
-
-Validation rules:
-- `judge_id` is required when sending `criteria_feedback`
-- `criteria_feedback` is allowed only for scored judges (`evaluation_type: "scored"`)
-
-## Finding Your IDs
-
-| ID | Where to Find It |
-|---|---|
-| **Task Slug** | In the judge settings, or the URL when editing the judge's prompt |
-| **Span ID** | In the evaluation modal, or via `get_judge_evaluations()` response |
-| **Judge ID** | In the URL when viewing a judge (`/judges/{judge_id}`) |
-
-## Bulk Feedback Submission
-
-To submit feedback on multiple evaluations, pull them with `get_judge_evaluations()` and iterate over the results:
-
-```python
-from zeroeval import ZeroEval
-
-client = ZeroEval()
-
-# Get evaluations to review
-evaluations = client.get_judge_evaluations(
- project_id="your-project-id",
- judge_id="your-judge-id",
- limit=100,
-)
-
-# Submit feedback for each
-for evaluation in evaluations["evaluations"]:
-    # Your logic to determine whether the judge's verdict was correct
-    is_correct = your_review_logic(evaluation)
-
-    client.send_feedback(
-        prompt_slug="your-judge-task-slug",
-        completion_id=evaluation["span_id"],
-        thumbs_up=is_correct,
-        reason="Automated review",
-        judge_id="your-judge-id",
-    )
-```
-
-## Related
-
-- [Pulling Evaluations](/judges/pull-evaluations) - Retrieve judge evaluations programmatically
-- [Python SDK Reference](/tracing/sdks/python/reference) - Full SDK API reference
-- [Judge Setup](/judges/setup) - Configure and deploy judges
diff --git a/tracing/api-reference.mdx b/tracing/api-reference.mdx
new file mode 100644
index 0000000..bf8690e
--- /dev/null
+++ b/tracing/api-reference.mdx
@@ -0,0 +1,349 @@
+---
+title: API Reference
+description: REST API for ingesting and querying spans, traces, and sessions
+---
+
+Base URL: `https://api.zeroeval.com`
+
+All requests require a Bearer token in the `Authorization` header:
+
+```
+Authorization: Bearer YOUR_ZEROEVAL_API_KEY
+```
+
+Get an API key from [Settings → API Keys](https://app.zeroeval.com/settings?section=api-keys).
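
As a quick smoke test, any endpoint below can be called with just that header. A minimal sketch using only the Python standard library (the `limit` value is arbitrary):

```python
import json
import os
import urllib.request

# Read the key from the environment; the fallback string is a placeholder.
API_KEY = os.environ.get("ZEROEVAL_API_KEY", "YOUR_ZEROEVAL_API_KEY")

def list_sessions(limit: int = 5) -> list:
    """Fetch the most recent sessions to verify the API key works."""
    req = urllib.request.Request(
        f"https://api.zeroeval.com/sessions?limit={limit}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# list_sessions() returns SessionRead[] on success, or raises
# urllib.error.HTTPError (401) if the key is invalid.
```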
+
+---
+
+## Spans
+
+### Ingest Spans
+
+```
+POST /spans
+```
+
+Send one or more spans. Traces and sessions referenced by the spans are auto-created if they don't exist.
+
+**Request body:** `SpanCreate[]`
+
+| Field | Type | Required | Default | Description |
+| ---------------- | --------------- | -------- | --------------- | ------------------------------------------------------------ |
+| `trace_id` | `string (UUID)` | Yes | — | Trace this span belongs to |
+| `name` | `string` | Yes | — | Descriptive name |
+| `started_at` | `ISO 8601` | Yes | — | When the span started |
+| `id` | `string (UUID)` | No | auto-generated | Client-provided span ID |
+| `kind` | `string` | No | `"generic"` | `generic`, `llm`, `tts`, `http`, `database`, `vector_store` |
+| `status` | `string` | No | `null` | `unset`, `ok`, `error` |
+| `ended_at` | `ISO 8601` | No | `null` | When the span completed |
+| `duration_ms` | `float` | No | `null` | Duration in milliseconds |
+| `cost` | `float` | No | auto-calculated | Cost (auto-calculated for LLM spans) |
+| `parent_span_id` | `string (UUID)` | No | `null` | Parent span for nesting |
+| `session_id` | `string (UUID)` | No | `null` | Session to associate with |
+| `session` | `object` | No | `null` | `{"id": "...", "name": "..."}` — alternative to `session_id` |
+| `input_data` | `string` | No | `null` | Input data (JSON string for messages) |
+| `output_data` | `string` | No | `null` | Output/response text |
+| `attributes` | `object` | No | `{}` | Arbitrary key-value attributes |
+| `tags` | `object` | No | `{}` | Key-value tags for filtering |
+| `trace_tags` | `object` | No | `{}` | Tags applied to the parent trace |
+| `session_tags` | `object` | No | `{}` | Tags applied to the parent session |
+| `error_code` | `string` | No | `null` | Error code or exception class |
+| `error_message` | `string` | No | `null` | Error description |
+| `error_stack` | `string` | No | `null` | Stack trace |
+| `code_filepath` | `string` | No | `null` | Source file path |
+| `code_lineno` | `int` | No | `null` | Source line number |
+
+
+
+```bash cURL
+curl -X POST https://api.zeroeval.com/spans \
+ -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '[{
+ "trace_id": "550e8400-e29b-41d4-a716-446655440001",
+ "name": "chat_completion",
+ "kind": "llm",
+ "started_at": "2025-01-15T10:30:00Z",
+ "ended_at": "2025-01-15T10:30:02Z",
+ "status": "ok",
+ "attributes": {
+ "provider": "openai",
+ "model": "gpt-4o",
+ "inputTokens": 150,
+ "outputTokens": 230
+ },
+ "input_data": "[{\"role\": \"user\", \"content\": \"What is the capital of France?\"}]",
+ "output_data": "The capital of France is Paris.",
+ "tags": {"environment": "production"}
+ }]'
+```
+
+```python Python
+import requests, json, uuid
+from datetime import datetime, timezone
+
+requests.post(
+ "https://api.zeroeval.com/spans",
+ headers={"Authorization": f"Bearer {API_KEY}"},
+ json=[{
+ "trace_id": str(uuid.uuid4()),
+ "name": "chat_completion",
+ "kind": "llm",
+ "started_at": datetime.now(timezone.utc).isoformat(),
+ "ended_at": datetime.now(timezone.utc).isoformat(),
+ "status": "ok",
+ "attributes": {
+ "provider": "openai",
+ "model": "gpt-4o",
+ "inputTokens": 150,
+ "outputTokens": 230
+ },
+ "input_data": json.dumps([{"role": "user", "content": "Hello"}]),
+ "output_data": "Hi there!"
+ }]
+)
+```
+
+
+
+**Response:** `SpanRead[]` (200)
+
+#### LLM Cost Calculation
+
+For automatic cost calculation on LLM spans, set `kind` to `"llm"` and include these attributes:
+
+| Attribute | Required | Description |
+| -------------- | -------- | --------------------------------------- |
+| `provider` | Yes | `"openai"`, `"gemini"`, `"anthropic"` |
+| `model` | Yes | `"gpt-4o"`, `"claude-3-5-sonnet"`, etc. |
+| `inputTokens` | Yes | Number of input tokens |
+| `outputTokens` | Yes | Number of output tokens |
+
+Cost is calculated as: `(inputTokens × inputPrice + outputTokens × outputPrice) / 1,000,000`
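
The same arithmetic in Python, for reference (the per-million-token prices here are placeholders, not actual provider pricing):

```python
def llm_span_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Mirror the server-side formula; prices are per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# e.g. 150 input + 230 output tokens at hypothetical $2.50 / $10.00 per 1M tokens
cost = llm_span_cost(150, 230, 2.50, 10.00)
# (150 * 2.50 + 230 * 10.00) / 1_000_000 = 0.002675
```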
+
+---
+
+### Query Spans
+
+```
+GET /spans
+```
+
+Retrieve spans with filtering, sorting, and pagination.
+
+| Parameter | Type | Default | Description |
+| ---------------- | ---------- | ------- | ------------------------------------------------------------- |
+| `id` | `string` | — | Filter by span ID |
+| `trace_id` | `string` | — | Filter by trace ID |
+| `parent_span_id` | `string` | — | Filter by parent; `"null"` for root spans |
+| `span_kind` | `string` | — | `llm`, `generic`, `tts`, `http`, etc. |
+| `created_before` | `ISO 8601` | — | Upper bound on created_at |
+| `created_after` | `ISO 8601` | — | Lower bound on created_at |
+| `names` | `string` | — | Comma-separated span names |
+| `error_codes` | `string` | — | Comma-separated error codes |
+| `has_error` | `bool` | — | `true` or `false` |
+| `duration_min` | `float` | — | Minimum duration (ms) |
+| `duration_max` | `float` | — | Maximum duration (ms) |
+| `cost_min` | `float` | — | Minimum cost |
+| `cost_max` | `float` | — | Maximum cost |
+| `sort_by` | `string` | — | `started_at`, `name`, `status`, `duration_ms`, `cost`, `kind` |
+| `sort_order` | `string` | `desc` | `asc` or `desc` |
+| `limit` | `int` | `5000` | 1–10000 |
+| `offset` | `int` | `0` | Pagination offset |
+
+Any unrecognized query parameter is treated as a **tag filter** (e.g. `?environment=production`).
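
For example, to fetch failed LLM spans tagged `environment=production`, the tag filter is passed like any other query parameter. A standard-library sketch (the filter values are illustrative):

```python
import json
import urllib.parse
import urllib.request

def build_spans_url(**params) -> str:
    """Compose a GET /spans URL; unrecognized keys act as tag filters."""
    return "https://api.zeroeval.com/spans?" + urllib.parse.urlencode(params)

url = build_spans_url(
    span_kind="llm",
    has_error="true",
    created_after="2025-01-15T09:30:00Z",
    environment="production",  # not a known parameter -> treated as a tag filter
    limit=100,
)

def fetch_spans(url: str, api_key: str) -> list:
    """Execute the query with a Bearer token; returns SpanRead[]."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```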
+
+**Response:** `SpanRead[]` (200)
+
+---
+
+### Get Span Attachments
+
+```
+GET /spans/{span_id}/attachments
+```
+
+Returns all attachments (images, screenshots) for a span with presigned URLs.
+
+**Response:**
+
+```json
+{
+ "attachments": [
+ {
+ "type": "image",
+ "url": "https://...",
+ "label": "Homepage screenshot"
+ }
+ ],
+ "count": 1
+}
+```
+
+---
+
+## Traces
+
+### Create Traces
+
+```
+POST /traces
+```
+
+Create one or more traces. Traces are also auto-created when ingesting spans.
+
+**Request body:** `TraceCreate[]`
+
+| Field | Type | Required | Default | Description |
+| ------------- | --------------- | -------- | -------------- | ------------------------ |
+| `session_id` | `string (UUID)` | Yes | — | Parent session |
+| `name` | `string` | Yes | — | Trace name |
+| `started_at` | `ISO 8601` | Yes | — | Start time |
+| `id` | `string (UUID)` | No | auto-generated | Client-provided trace ID |
+| `status` | `string` | No | `null` | `unset`, `ok`, `error` |
+| `ended_at` | `ISO 8601` | No | `null` | End time |
+| `duration_ms` | `float` | No | `null` | Duration in milliseconds |
+| `cost` | `float` | No | `null` | Total cost |
+| `attributes` | `object` | No | `{}` | Arbitrary attributes |
+| `tags` | `object` | No | `{}` | Key-value tags |
+
+**Response:** `TraceRead[]` (200)
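
For instance, a minimal `TraceCreate` payload can be assembled like this (a Python sketch; the helper name and tag values are illustrative):

```python
import json
import uuid
from datetime import datetime, timezone

def make_trace(session_id: str, name: str) -> dict:
    """Build a TraceCreate record with the three required fields."""
    return {
        "id": str(uuid.uuid4()),  # optional; the server generates one if omitted
        "session_id": session_id,
        "name": name,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "tags": {"environment": "production"},
    }

# POST this JSON array to https://api.zeroeval.com/traces with your Bearer token.
body = json.dumps([make_trace(str(uuid.uuid4()), "checkout_flow")])
```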
+
+---
+
+### Query Traces
+
+```
+GET /traces
+```
+
+| Parameter | Type | Default | Description |
+| ---------------- | ---------- | ------- | ------------------------------------------------------------------- |
+| `id` | `string` | — | Filter by trace ID |
+| `session_id` | `string` | — | Filter by session ID |
+| `status` | `string` | — | Filter by status |
+| `created_before` | `ISO 8601` | — | Upper bound on created_at |
+| `created_after` | `ISO 8601` | — | Lower bound on created_at |
+| `names` | `string` | — | Comma-separated trace names |
+| `has_error` | `bool` | — | `true` or `false` |
+| `duration_min` | `float` | — | Minimum duration (ms) |
+| `duration_max` | `float` | — | Maximum duration (ms) |
+| `cost_min` | `float` | — | Minimum cost |
+| `cost_max` | `float` | — | Maximum cost |
+| `sort_by` | `string` | — | `started_at`, `name`, `status`, `span_count`, `duration_ms`, `cost` |
+| `sort_order` | `string` | `desc` | `asc` or `desc` |
+| `limit` | `int` | `50` | 1–500 |
+| `offset` | `int` | `0` | Pagination offset |
+
+Tag filters work the same as for spans.
+
+**Response:** `TraceRead[]` (200)
+
+Each trace includes:
+
+| Field | Type | Description |
+| ------------------ | ---------- | ------------------------- |
+| `id` | `string` | Trace ID |
+| `session_id` | `string` | Parent session |
+| `name` | `string` | Trace name |
+| `status` | `string` | `unset`, `ok`, `error` |
+| `started_at` | `ISO 8601` | Start time |
+| `ended_at` | `ISO 8601` | End time |
+| `duration_ms` | `float` | Duration |
+| `cost` | `float` | Total cost |
+| `span_count` | `int` | Number of spans |
+| `root_span_input` | `string` | Input from the root span |
+| `root_span_output` | `string` | Output from the root span |
+| `tags` | `object` | Tags |
+
+---
+
+## Sessions
+
+### Create Sessions
+
+```
+POST /sessions
+```
+
+Create one or more sessions. Sessions are also auto-created when ingesting spans with a `session_id` or `session` field.
+
+**Request body:** `SessionCreate[]`
+
+| Field | Type | Required | Default | Description |
+| ------------ | --------------- | -------- | -------------- | --------------------------- |
+| `project_id` | `string` | Yes | — | Project ID |
+| `name` | `string` | No | `null` | Human-readable session name |
+| `id` | `string (UUID)` | No | auto-generated | Client-provided session ID |
+| `attributes` | `object` | No | `{}` | Arbitrary attributes |
+| `tags` | `object` | No | `{}` | Key-value tags |
+
+**Response:** `SessionRead[]` (200)
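
Since spans auto-create their session, an explicit `POST /sessions` is mainly useful for naming a session up front. A sketch of the two payloads (the IDs, names, and `project_id` placeholder are illustrative):

```python
import uuid
from datetime import datetime, timezone

session_id = str(uuid.uuid4())

# POST [session] to /sessions first (optional, gives the session a name)...
session = {
    "project_id": "your-project-id",  # placeholder
    "id": session_id,
    "name": "Support chat #42",
}

# ...then POST [span] to /spans; session_id links the span (and its trace)
# to the named session.
span = {
    "trace_id": str(uuid.uuid4()),
    "name": "classify_intent",
    "started_at": datetime.now(timezone.utc).isoformat(),
    "session_id": session_id,
}
```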
+
+---
+
+### Query Sessions
+
+```
+GET /sessions
+```
+
+| Parameter | Type | Default | Description |
+| ---------------- | ---------- | ------- | ------------------------------------------------------- |
+| `id` | `string` | — | Filter by session ID |
+| `created_before` | `ISO 8601` | — | Upper bound on created_at |
+| `created_after` | `ISO 8601` | — | Lower bound on created_at |
+| `names` | `string` | — | Comma-separated session names |
+| `has_error` | `bool` | — | `true` or `false` |
+| `duration_min` | `float` | — | Minimum duration |
+| `duration_max` | `float` | — | Maximum duration |
+| `cost_min` | `float` | — | Minimum cost |
+| `cost_max` | `float` | — | Maximum cost |
+| `sort_by` | `string` | — | `created_at`, `name`, `trace_count`, `cost`, `duration` |
+| `sort_order` | `string` | `desc` | `asc` or `desc` |
+| `limit` | `int` | `50` | 1–500 |
+| `offset` | `int` | `0` | Pagination offset |
+
+Tag filters work the same as for spans.
+
+**Response:** `SessionRead[]` (200)
+
+Each session includes:
+
+| Field | Type | Description |
+| ------------- | -------- | ---------------- |
+| `id` | `string` | Session ID |
+| `project_id` | `string` | Project ID |
+| `name` | `string` | Session name |
+| `trace_count` | `int` | Number of traces |
+| `error_count` | `int` | Number of errors |
+| `cost` | `float` | Total cost |
+| `duration` | `float` | Total duration |
+| `tags` | `object` | Tags |
+
+---
+
+## Feedback
+
+To retrieve feedback (human reviews and judge evaluations) for spans, traces, or sessions, use the [Feedback API](/feedback/api-reference#unified-entity-feedback).
+
+---
+
+## OpenTelemetry (OTLP)
+
+### Ingest OTLP Traces
+
+```
+POST /v1/traces
+```
+
+Accepts standard OpenTelemetry Protocol (OTLP) trace data in JSON or protobuf format.
+
+**Headers:**
+
+| Header | Value |
+| --------------- | ---------------------------------------------- |
+| `Authorization` | `Bearer YOUR_ZEROEVAL_API_KEY` |
+| `Content-Type` | `application/json` or `application/x-protobuf` |
+
+This endpoint is compatible with the OpenTelemetry Collector's `otlphttp` exporter. See [OpenTelemetry](/tracing/opentelemetry) for collector configuration.
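
For reference, a minimal OTLP/JSON body follows the proto3 JSON mapping: camelCase keys, hex-encoded IDs, and nanosecond timestamps as strings. A sketch with illustrative span values:

```python
import json
import os
import time

now_ns = time.time_ns()

otlp_body = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "my-agent"}}
        ]},
        "scopeSpans": [{
            "spans": [{
                "traceId": os.urandom(16).hex(),  # 32 hex chars
                "spanId": os.urandom(8).hex(),    # 16 hex chars
                "name": "chat_completion",
                "kind": 1,                        # SPAN_KIND_INTERNAL
                "startTimeUnixNano": str(now_ns - 1_000_000_000),
                "endTimeUnixNano": str(now_ns),
            }]
        }],
    }]
}

# POST body to https://api.zeroeval.com/v1/traces with
# Content-Type: application/json and your Bearer token.
body = json.dumps(otlp_body)
```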
diff --git a/tracing/introduction.mdx b/tracing/introduction.mdx
new file mode 100644
index 0000000..9b55260
--- /dev/null
+++ b/tracing/introduction.mdx
@@ -0,0 +1,65 @@
+---
+title: Introduction
+description: Capture every step your AI agent takes so you can debug, evaluate, and optimize
+---
+
+Your agent makes dozens of decisions every run -- retrieving context, calling models, executing tools, generating responses. Without observability, failures are invisible, regressions go unnoticed, and optimization is guesswork.
+
+ZeroEval Tracing captures the full execution graph of your AI system so you can:
+
+- **Debug** failed runs by inspecting the exact inputs, outputs, and errors at every step
+- **Evaluate** output quality at scale with [calibrated judges](/judges/introduction) that score your traces automatically
+- **Optimize** prompts and models by comparing versions against real production data with [prompt optimization](/autotune/introduction)
+- **Monitor** cost, latency, and error rates across sessions, traces, and spans
+
+## How it works
+
+1. Add a few lines to your application. The SDK automatically captures LLM calls, or you can create custom spans for any operation.
+2. Every agent run becomes a trace -- a tree of spans showing what happened, in what order, with full inputs and outputs.
+3. Group related traces into sessions and tag them with metadata for filtering. Attach [human feedback](/feedback/human-feedback) or let [judges](/judges/introduction) evaluate outputs automatically.
+4. Use your traced data to run judges, optimize prompts, and build evaluations -- all from the same production data.
+
+## Get started
+
+Create an API key from [Settings → API Keys](https://app.zeroeval.com/settings?section=api-keys), then pick your integration path:
+
+- Decorators and context managers for Python apps. Auto-instruments OpenAI, LangChain, Gemini, and more.
+- Wrapper functions for Node.js and Bun. Auto-instruments OpenAI and Vercel AI SDK.
+- Send spans, traces, and sessions directly over HTTP from any language.
+- Route OTLP traces from any OpenTelemetry-instrumented app to ZeroEval.
+
+
+
+Using Cursor, Claude Code, or another coding agent? The [`zeroeval-install`
+skill](/integrations/skills) can handle SDK setup, first trace, and prompt
+migration for you.
+
diff --git a/tracing/manual-instrumentation.mdx b/tracing/manual-instrumentation.mdx
deleted file mode 100644
index 67186c3..0000000
--- a/tracing/manual-instrumentation.mdx
+++ /dev/null
@@ -1,1497 +0,0 @@
----
-title: Manual Instrumentation
-description: Create spans manually for LLM calls and custom operations
----
-
-This guide covers how to manually instrument your code to create spans, particularly for LLM operations. You'll learn how to use both the SDK and direct API calls to send trace data to ZeroEval.
-
-## SDK Manual Instrumentation
-
-### Basic LLM Span with SDK
-
-The simplest way to create an LLM span is using the SDK's span decorator or context manager:
-
-
-```python Python (Decorator)
-import zeroeval as ze
-import openai
-
-client = openai.OpenAI()
-
-@ze.span(name="chat_completion", kind="llm")
-def generate_response(messages: list) -> str:
- """Create an LLM span with automatic input/output capture"""
- response = client.chat.completions.create(
- model="gpt-4",
- messages=messages,
- temperature=0.7
- )
-
- # The SDK automatically captures function arguments as input
- # and return values as output
- return response.choices[0].message.content
-```
-
-```python Python (Context Manager)
-import zeroeval as ze
-import openai
-
-client = openai.OpenAI()
-
-def generate_response(messages: list) -> str:
- """Create an LLM span with manual control"""
- with ze.span(name="chat_completion", kind="llm") as span:
- # Set input data
- span.set_io(input_data=str(messages))
-
- # Make the API call
- response = client.chat.completions.create(
- model="gpt-4",
- messages=messages,
- temperature=0.7
- )
-
- # Set output data
- span.set_io(output_data=response.choices[0].message.content)
-
- # Add LLM-specific attributes
- span.set_attributes({
- "llm.model": "gpt-4",
- "llm.provider": "openai",
- "llm.input_tokens": response.usage.prompt_tokens,
- "llm.output_tokens": response.usage.completion_tokens,
- "llm.total_tokens": response.usage.total_tokens,
- "llm.temperature": 0.7
- })
-
- return response.choices[0].message.content
-```
-
-
-### Advanced LLM Span with Metrics
-
-For production use, capture comprehensive metrics for better observability:
-
-```python
-import zeroeval as ze
-import openai
-import time
-import json
-
-@ze.span(name="chat_completion_advanced", kind="llm")
-def generate_with_metrics(messages: list, **kwargs):
- """Create a comprehensive LLM span with all metrics"""
-
- # Get the current span to add attributes
- span = ze.get_current_span()
-
- # Track timing
- start_time = time.time()
- first_token_time = None
-
- # Prepare the request
- model = kwargs.get("model", "gpt-4")
- temperature = kwargs.get("temperature", 0.7)
- max_tokens = kwargs.get("max_tokens", None)
-
- # Set pre-request attributes
- span.set_attributes({
- "llm.model": model,
- "llm.provider": "openai",
- "llm.temperature": temperature,
- "llm.max_tokens": max_tokens,
- "llm.streaming": kwargs.get("stream", False)
- })
-
- # Store input messages in the expected format
- span.set_io(input_data=json.dumps([
- {"role": msg["role"], "content": msg["content"]}
- for msg in messages
- ]))
-
- try:
- client = openai.OpenAI()
-
- # Handle streaming responses
- if kwargs.get("stream", False):
- stream = client.chat.completions.create(
- model=model,
- messages=messages,
- temperature=temperature,
- max_tokens=max_tokens,
- stream=True
- )
-
- full_response = ""
- tokens = 0
-
- for chunk in stream:
- if chunk.choices[0].delta.content:
- if first_token_time is None:
- first_token_time = time.time()
- ttft_ms = (first_token_time - start_time) * 1000
- span.set_attributes({"llm.ttft_ms": ttft_ms})
-
- full_response += chunk.choices[0].delta.content
- tokens += 1
-
- # Calculate throughput
- total_time = time.time() - start_time
- span.set_attributes({
- "llm.output_tokens": tokens,
- "llm.throughput_tokens_per_sec": tokens / total_time if total_time > 0 else 0,
- "llm.duration_ms": total_time * 1000
- })
-
- span.set_io(output_data=full_response)
- return full_response
-
- else:
- # Non-streaming response
- response = client.chat.completions.create(
- model=model,
- messages=messages,
- temperature=temperature,
- max_tokens=max_tokens
- )
-
- # Capture all response metadata
- span.set_attributes({
- "llm.input_tokens": response.usage.prompt_tokens,
- "llm.output_tokens": response.usage.completion_tokens,
- "llm.total_tokens": response.usage.total_tokens,
- "llm.finish_reason": response.choices[0].finish_reason,
- "llm.system_fingerprint": response.system_fingerprint,
- "llm.response_id": response.id,
- "llm.duration_ms": (time.time() - start_time) * 1000
- })
-
- content = response.choices[0].message.content
- span.set_io(output_data=content)
-
- return content
-
- except Exception as e:
- # Capture error details
- span.set_status("error")
- span.set_attributes({
- "error.type": type(e).__name__,
- "error.message": str(e)
- })
- raise
-```
-
-## Provider-Specific Manual Instrumentation
-
-If you call the OpenAI or Gemini APIs directly instead of using the SDK's automatic instrumentation, the guides below show how to instrument those calls with cost calculation and conversation formatting.
-
-### OpenAI API Manual Instrumentation
-
-When calling the OpenAI API directly (using `requests`, `httpx`, or similar), you'll want to capture all the metrics that the automatic integration would provide:
-
-
-```python Python (OpenAI Direct API)
-import requests
-import json
-import time
-import uuid
-from datetime import datetime, timezone
-
-class OpenAITracer:
-    def __init__(self, api_key: str, zeroeval_api_key: str):
-        self.openai_api_key = api_key
-        self.zeroeval_api_key = zeroeval_api_key
-        self.zeroeval_url = "https://api.zeroeval.com/api/v1/spans"
-
- def chat_completion_with_tracing(self, messages: list, model: str = "gpt-4o", **kwargs):
- """Make OpenAI API call with full ZeroEval instrumentation"""
-
- # Generate span identifiers
- trace_id = str(uuid.uuid4())
- span_id = str(uuid.uuid4())
-
- # Track timing
- start_time = time.time()
-
- # Prepare OpenAI request
- openai_payload = {
- "model": model,
- "messages": messages,
- **kwargs # temperature, max_tokens, etc.
- }
-
- # Add stream_options for token usage in streaming calls
- is_streaming = kwargs.get("stream", False)
- if is_streaming and "stream_options" not in kwargs:
- openai_payload["stream_options"] = {"include_usage": True}
-
- try:
- # Make the OpenAI API call
- response = requests.post(
- "https://api.openai.com/v1/chat/completions",
- headers={
- "Authorization": f"Bearer {self.openai_api_key}",
- "Content-Type": "application/json"
- },
- json=openai_payload,
- stream=is_streaming
- )
- response.raise_for_status()
-
- end_time = time.time()
- duration_ms = (end_time - start_time) * 1000
-
- if is_streaming:
- # Handle streaming response
- full_response = ""
- input_tokens = 0
- output_tokens = 0
- finish_reason = None
- response_id = None
- system_fingerprint = None
- first_token_time = None
-
- for line in response.iter_lines():
- if line:
- line = line.decode('utf-8')
- if line.startswith('data: '):
- data_str = line[6:]
- if data_str == '[DONE]':
- break
-
- try:
- data = json.loads(data_str)
-
- # Capture first token timing
- if data.get('choices') and data['choices'][0].get('delta', {}).get('content'):
- if first_token_time is None:
- first_token_time = time.time()
- full_response += data['choices'][0]['delta']['content']
-
- # Capture final metadata
- if 'usage' in data:
- input_tokens = data['usage']['prompt_tokens']
- output_tokens = data['usage']['completion_tokens']
-
- if data.get('choices') and data['choices'][0].get('finish_reason'):
- finish_reason = data['choices'][0]['finish_reason']
-
- if 'id' in data:
- response_id = data['id']
-
- if 'system_fingerprint' in data:
- system_fingerprint = data['system_fingerprint']
-
- except json.JSONDecodeError:
- continue
-
- # Send ZeroEval span for streaming
- self._send_span(
- span_id=span_id,
- trace_id=trace_id,
- model=model,
- messages=messages,
- response_text=full_response,
- input_tokens=input_tokens,
- output_tokens=output_tokens,
- duration_ms=duration_ms,
- start_time=start_time,
- finish_reason=finish_reason,
- response_id=response_id,
- system_fingerprint=system_fingerprint,
- streaming=True,
- first_token_time=first_token_time,
- **kwargs
- )
-
- return full_response
-
- else:
- # Handle non-streaming response
- response_data = response.json()
-
- # Extract response details
- content = response_data['choices'][0]['message']['content']
- usage = response_data.get('usage', {})
-
- # Send ZeroEval span
- self._send_span(
- span_id=span_id,
- trace_id=trace_id,
- model=model,
- messages=messages,
- response_text=content,
- input_tokens=usage.get('prompt_tokens', 0),
- output_tokens=usage.get('completion_tokens', 0),
- duration_ms=duration_ms,
- start_time=start_time,
- finish_reason=response_data['choices'][0].get('finish_reason'),
- response_id=response_data.get('id'),
- system_fingerprint=response_data.get('system_fingerprint'),
- streaming=False,
- **kwargs
- )
-
- return content
-
- except Exception as e:
- # Send error span
- end_time = time.time()
- duration_ms = (end_time - start_time) * 1000
-
- self._send_error_span(
- span_id=span_id,
- trace_id=trace_id,
- model=model,
- messages=messages,
- duration_ms=duration_ms,
- start_time=start_time,
- error=e,
- **kwargs
- )
- raise
-
- def _send_span(self, span_id: str, trace_id: str, model: str, messages: list,
- response_text: str, input_tokens: int, output_tokens: int,
- duration_ms: float, start_time: float, finish_reason: str = None,
- response_id: str = None, system_fingerprint: str = None,
- streaming: bool = False, first_token_time: float = None, **kwargs):
- """Send successful span to ZeroEval"""
-
- # Calculate throughput metrics
- throughput = output_tokens / (duration_ms / 1000) if duration_ms > 0 else 0
- ttft_ms = None
- if streaming and first_token_time:
- ttft_ms = (first_token_time - start_time) * 1000
-
- # Prepare span attributes following ZeroEval's expected format
- attributes = {
- # Core LLM attributes (these are used for cost calculation)
- "provider": "openai", # Key for cost calculation
- "model": model, # Key for cost calculation
- "inputTokens": input_tokens, # Key for cost calculation
- "outputTokens": output_tokens, # Key for cost calculation
-
- # OpenAI-specific attributes
- "temperature": kwargs.get("temperature"),
- "max_tokens": kwargs.get("max_tokens"),
- "top_p": kwargs.get("top_p"),
- "frequency_penalty": kwargs.get("frequency_penalty"),
- "presence_penalty": kwargs.get("presence_penalty"),
- "streaming": streaming,
- "finish_reason": finish_reason,
- "response_id": response_id,
- "system_fingerprint": system_fingerprint,
-
- # Performance metrics
- "throughput": throughput,
- "duration_ms": duration_ms,
- }
-
-        if ttft_ms is not None:
-            attributes["ttft_ms"] = ttft_ms
-
- # Clean up None values
- attributes = {k: v for k, v in attributes.items() if v is not None}
-
- # Format messages for good conversation display
- formatted_messages = self._format_messages_for_display(messages)
-
- span_data = {
- "id": span_id,
- "trace_id": trace_id,
- "name": f"{model}_completion",
- "kind": "llm", # Critical: must be "llm" for cost calculation
- "started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
- "ended_at": datetime.fromtimestamp(start_time + duration_ms/1000, timezone.utc).isoformat(),
- "status": "ok",
- "attributes": attributes,
- "input_data": json.dumps(formatted_messages),
- "output_data": response_text,
- "tags": {
- "provider": "openai",
- "model": model,
- "streaming": str(streaming).lower()
- }
- }
-
- # Send to ZeroEval
- response = requests.post(
- self.zeroeval_url,
- headers={
- "Authorization": f"Bearer {self.zeroeval_api_key}",
- "Content-Type": "application/json"
- },
- json=[span_data]
- )
-
- if response.status_code != 200:
- print(f"Warning: Failed to send span to ZeroEval: {response.text}")
-
- def _send_error_span(self, span_id: str, trace_id: str, model: str,
- messages: list, duration_ms: float, start_time: float,
- error: Exception, **kwargs):
- """Send error span to ZeroEval"""
-
- attributes = {
- "provider": "openai",
- "model": model,
- "temperature": kwargs.get("temperature"),
- "max_tokens": kwargs.get("max_tokens"),
- "streaming": kwargs.get("stream", False),
- "error_type": type(error).__name__,
- "error_message": str(error),
- "duration_ms": duration_ms,
- }
-
- # Clean up None values
- attributes = {k: v for k, v in attributes.items() if v is not None}
-
- formatted_messages = self._format_messages_for_display(messages)
-
- span_data = {
- "id": span_id,
- "trace_id": trace_id,
- "name": f"{model}_completion",
- "kind": "llm",
- "started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
- "ended_at": datetime.fromtimestamp(start_time + duration_ms/1000, timezone.utc).isoformat(),
- "status": "error",
- "attributes": attributes,
- "input_data": json.dumps(formatted_messages),
- "output_data": "",
- "error_message": str(error),
- "tags": {
- "provider": "openai",
- "model": model,
- "error": "true"
- }
- }
-
- requests.post(
- self.zeroeval_url,
- headers={
- "Authorization": f"Bearer {self.zeroeval_api_key}",
- "Content-Type": "application/json"
- },
- json=[span_data]
- )
-
- def _format_messages_for_display(self, messages: list) -> list:
- """Format messages for optimal display in ZeroEval UI"""
- formatted = []
- for msg in messages:
- # Handle both dict and object formats
- if hasattr(msg, 'role'):
- role = msg.role
- content = msg.content
- else:
- role = msg.get('role', 'user')
- content = msg.get('content', '')
-
- # Handle multimodal content
- if isinstance(content, list):
- # Extract text parts for display
- text_parts = []
- for part in content:
- if isinstance(part, dict) and part.get('type') == 'text':
- text_parts.append(part['text'])
- elif isinstance(part, str):
- text_parts.append(part)
- content = '\n'.join(text_parts) if text_parts else '[Multimodal content]'
-
- formatted.append({
- "role": role,
- "content": content
- })
-
- return formatted
-
-# Usage example
-tracer = OpenAITracer(
-    api_key="your-openai-api-key",
-    zeroeval_api_key="your-zeroeval-api-key",
-)
-
-# Non-streaming call
-response = tracer.chat_completion_with_tracing(
-    [{"role": "user", "content": "What is the capital of France?"}],
-    model="gpt-4o",
-    temperature=0.7,
-)
-
-# Streaming call
-response = tracer.chat_completion_with_tracing(
-    [{"role": "user", "content": "Write a short story"}],
-    model="gpt-4o",
-    stream=True,
-    temperature=0.9,
-)
-```
-
-
-
-### Gemini API Manual Instrumentation
-
-Gemini has a different API structure with `contents` instead of `messages` and different parameter names. Here's how to instrument Gemini API calls:
-
-
-```python Python (Gemini Direct API)
-import requests
-import json
-import time
-import uuid
-from datetime import datetime, timezone
-
-class GeminiTracer:
-    def __init__(self, api_key: str, zeroeval_api_key: str):
-        self.gemini_api_key = api_key
-        self.zeroeval_api_key = zeroeval_api_key
-        self.zeroeval_url = "https://api.zeroeval.com/api/v1/spans"
-
- def generate_content_with_tracing(self, messages: list, model: str = "gemini-1.5-flash", **kwargs):
- """Make Gemini API call with full ZeroEval instrumentation"""
-
- trace_id = str(uuid.uuid4())
- span_id = str(uuid.uuid4())
- start_time = time.time()
-
- # Convert OpenAI-style messages to Gemini contents format
- contents, system_instruction = self._convert_messages_to_contents(messages)
-
- # Prepare Gemini request payload
- gemini_payload = {
- "contents": contents
- }
-
- # Add generation config
- generation_config = {}
- if kwargs.get("temperature") is not None:
- generation_config["temperature"] = kwargs["temperature"]
- if kwargs.get("max_tokens"):
- generation_config["maxOutputTokens"] = kwargs["max_tokens"]
- if kwargs.get("top_p") is not None:
- generation_config["topP"] = kwargs["top_p"]
- if kwargs.get("top_k") is not None:
- generation_config["topK"] = kwargs["top_k"]
- if kwargs.get("stop"):
- stop = kwargs["stop"]
- generation_config["stopSequences"] = stop if isinstance(stop, list) else [stop]
-
- if generation_config:
- gemini_payload["generationConfig"] = generation_config
-
- # Add system instruction if present
- if system_instruction:
- gemini_payload["systemInstruction"] = {"parts": [{"text": system_instruction}]}
-
- # Add tools if provided
- if kwargs.get("tools"):
- gemini_payload["tools"] = kwargs["tools"]
- if kwargs.get("tool_choice"):
- gemini_payload["toolConfig"] = {
- "functionCallingConfig": {"mode": kwargs["tool_choice"]}
- }
-
- # Choose endpoint based on streaming
- is_streaming = kwargs.get("stream", False)
- endpoint = "streamGenerateContent" if is_streaming else "generateContent"
- url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:{endpoint}"
-
- try:
- response = requests.post(
- url,
- headers={
- "x-goog-api-key": self.gemini_api_key,
- "Content-Type": "application/json"
- },
- json=gemini_payload,
- stream=is_streaming
- )
- response.raise_for_status()
-
- end_time = time.time()
- duration_ms = (end_time - start_time) * 1000
-
- if is_streaming:
- # Handle streaming response
- full_response = ""
- input_tokens = 0
- output_tokens = 0
- finish_reason = None
- model_version = None
- first_token_time = None
-
- for line in response.iter_lines():
- if line:
- try:
- # Gemini streaming sends JSON objects separated by newlines
- data = json.loads(line.decode('utf-8'))
-
- if 'candidates' in data and data['candidates']:
- candidate = data['candidates'][0]
-
- # Extract content
- if 'content' in candidate and 'parts' in candidate['content']:
- for part in candidate['content']['parts']:
- if 'text' in part:
- if first_token_time is None:
- first_token_time = time.time()
- full_response += part['text']
-
- # Extract finish reason
- if 'finishReason' in candidate:
- finish_reason = candidate['finishReason']
-
- # Extract usage metadata (usually in final chunk)
- if 'usageMetadata' in data:
- usage = data['usageMetadata']
- input_tokens = usage.get('promptTokenCount', 0)
- output_tokens = usage.get('candidatesTokenCount', 0)
-
- # Extract model version
- if 'modelVersion' in data:
- model_version = data['modelVersion']
-
- except json.JSONDecodeError:
- continue
-
- self._send_span(
- span_id=span_id, trace_id=trace_id, model=model,
- original_messages=messages, response_text=full_response,
- input_tokens=input_tokens, output_tokens=output_tokens,
- duration_ms=duration_ms, start_time=start_time,
- finish_reason=finish_reason, model_version=model_version,
- streaming=True, first_token_time=first_token_time,
- **kwargs
- )
-
- return full_response
-
- else:
- # Handle non-streaming response
- response_data = response.json()
-
- # Extract response content
- content = ""
- if 'candidates' in response_data and response_data['candidates']:
- candidate = response_data['candidates'][0]
- if 'content' in candidate and 'parts' in candidate['content']:
- content_parts = []
- for part in candidate['content']['parts']:
- if 'text' in part:
- content_parts.append(part['text'])
- content = ''.join(content_parts)
-
- # Extract usage
- usage = response_data.get('usageMetadata', {})
- input_tokens = usage.get('promptTokenCount', 0)
- output_tokens = usage.get('candidatesTokenCount', 0)
-
- # Extract other metadata
- finish_reason = None
- if 'candidates' in response_data and response_data['candidates']:
- finish_reason = response_data['candidates'][0].get('finishReason')
-
- model_version = response_data.get('modelVersion')
-
- self._send_span(
- span_id=span_id, trace_id=trace_id, model=model,
- original_messages=messages, response_text=content,
- input_tokens=input_tokens, output_tokens=output_tokens,
- duration_ms=duration_ms, start_time=start_time,
- finish_reason=finish_reason, model_version=model_version,
- streaming=False, **kwargs
- )
-
- return content
-
- except Exception as e:
- end_time = time.time()
- duration_ms = (end_time - start_time) * 1000
-
- self._send_error_span(
- span_id=span_id, trace_id=trace_id, model=model,
- original_messages=messages, duration_ms=duration_ms,
- start_time=start_time, error=e, **kwargs
- )
- raise
-
- def _convert_messages_to_contents(self, messages: list) -> tuple:
- """Convert OpenAI-style messages to Gemini contents format"""
- contents = []
- system_instruction = None
-
- for msg in messages:
- role = msg.get('role', 'user') if isinstance(msg, dict) else msg.role
- content = msg.get('content', '') if isinstance(msg, dict) else msg.content
-
- if role == 'system':
- # Collect system instructions
- if system_instruction:
- system_instruction += f"\n{content}"
- else:
- system_instruction = content
- continue
-
- # Convert content to parts
- if isinstance(content, list):
- # Handle multimodal content
- parts = []
- for item in content:
- if isinstance(item, dict) and item.get('type') == 'text':
- parts.append({"text": item['text']})
- # Add support for images, etc. if needed
- else:
- parts = [{"text": str(content)}]
-
- # Convert role
- gemini_role = "user" if role == "user" else "model"
- contents.append({"role": gemini_role, "parts": parts})
-
- return contents, system_instruction
-
- def _send_span(self, span_id: str, trace_id: str, model: str,
- original_messages: list, response_text: str,
- input_tokens: int, output_tokens: int, duration_ms: float,
- start_time: float, finish_reason: str = None,
- model_version: str = None, streaming: bool = False,
- first_token_time: float = None, **kwargs):
- """Send successful span to ZeroEval"""
-
- # Calculate performance metrics
- throughput = output_tokens / (duration_ms / 1000) if duration_ms > 0 else 0
- ttft_ms = None
- if streaming and first_token_time:
- ttft_ms = (first_token_time - start_time) * 1000
-
- # Prepare attributes following ZeroEval's expected format
- attributes = {
- # Core attributes for cost calculation (use provider naming)
- "provider": "gemini", # Key for cost calculation
- "model": model, # Key for cost calculation
- "inputTokens": input_tokens, # Key for cost calculation
- "outputTokens": output_tokens, # Key for cost calculation
-
- # Gemini-specific attributes
- "temperature": kwargs.get("temperature"),
- "max_tokens": kwargs.get("max_tokens"), # maxOutputTokens
- "top_p": kwargs.get("top_p"),
- "top_k": kwargs.get("top_k"),
- "stop_sequences": kwargs.get("stop"),
- "streaming": streaming,
- "finish_reason": finish_reason,
- "model_version": model_version,
-
- # Performance metrics
- "throughput": throughput,
- "duration_ms": duration_ms,
- }
-
- if ttft_ms:
- attributes["ttft_ms"] = ttft_ms
-
- # Include tool information if present
- if kwargs.get("tools"):
- attributes["tools_count"] = len(kwargs["tools"])
- attributes["tool_choice"] = kwargs.get("tool_choice")
-
- # Clean up None values
- attributes = {k: v for k, v in attributes.items() if v is not None}
-
- # Format original messages for display (convert back to OpenAI format for consistency)
- formatted_messages = self._format_messages_for_display(original_messages)
-
- span_data = {
- "id": span_id,
- "trace_id": trace_id,
- "name": f"{model}_completion",
- "kind": "llm", # Critical: must be "llm" for cost calculation
- "started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
- "ended_at": datetime.fromtimestamp(start_time + duration_ms/1000, timezone.utc).isoformat(),
- "status": "ok",
- "attributes": attributes,
- "input_data": json.dumps(formatted_messages),
- "output_data": response_text,
- "tags": {
- "provider": "gemini",
- "model": model,
- "streaming": str(streaming).lower()
- }
- }
-
- # Send to ZeroEval
- response = requests.post(
- self.zeroeval_url,
- headers={
- "Authorization": f"Bearer {self.zeroeval_api_key}",
- "Content-Type": "application/json"
- },
- json=[span_data]
- )
-
- if response.status_code != 200:
- print(f"Warning: Failed to send span to ZeroEval: {response.text}")
-
- def _send_error_span(self, span_id: str, trace_id: str, model: str,
- original_messages: list, duration_ms: float,
- start_time: float, error: Exception, **kwargs):
- """Send error span to ZeroEval"""
-
- attributes = {
- "provider": "gemini",
- "model": model,
- "temperature": kwargs.get("temperature"),
- "max_tokens": kwargs.get("max_tokens"),
- "streaming": kwargs.get("stream", False),
- "error_type": type(error).__name__,
- "error_message": str(error),
- "duration_ms": duration_ms,
- }
-
- # Clean up None values
- attributes = {k: v for k, v in attributes.items() if v is not None}
-
- formatted_messages = self._format_messages_for_display(original_messages)
-
- span_data = {
- "id": span_id,
- "trace_id": trace_id,
- "name": f"{model}_completion",
- "kind": "llm",
- "started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
- "ended_at": datetime.fromtimestamp(start_time + duration_ms/1000, timezone.utc).isoformat(),
- "status": "error",
- "attributes": attributes,
- "input_data": json.dumps(formatted_messages),
- "output_data": "",
- "error_message": str(error),
- "tags": {
- "provider": "gemini",
- "model": model,
- "error": "true"
- }
- }
-
- requests.post(
- self.zeroeval_url,
- headers={
- "Authorization": f"Bearer {self.zeroeval_api_key}",
- "Content-Type": "application/json"
- },
- json=[span_data]
- )
-
- def _format_messages_for_display(self, messages: list) -> list:
- """Format messages for optimal display in ZeroEval UI"""
- formatted = []
- for msg in messages:
- if hasattr(msg, 'role'):
- role = msg.role
- content = msg.content
- else:
- role = msg.get('role', 'user')
- content = msg.get('content', '')
-
- # Handle multimodal content
- if isinstance(content, list):
- text_parts = []
- for part in content:
- if isinstance(part, dict) and part.get('type') == 'text':
- text_parts.append(part['text'])
- elif isinstance(part, str):
- text_parts.append(part)
- content = '\n'.join(text_parts) if text_parts else '[Multimodal content]'
-
- formatted.append({
- "role": role,
- "content": content
- })
-
- return formatted
-
-# Usage example
-
-tracer = GeminiTracer(
-    api_key="your-gemini-api-key",
-    zeroeval_api_key="your-zeroeval-api-key"
-)
-
-# Non-streaming call
-
-response = tracer.generate_content_with_tracing([
-    {"role": "user", "content": "What is the capital of France?"}
-], model="gemini-1.5-flash", temperature=0.7)
-
-# Streaming call
-
-response = tracer.generate_content_with_tracing([
-    {"role": "system", "content": "You are a helpful assistant."},
-    {"role": "user", "content": "Write a short story"}
-], model="gemini-1.5-flash", stream=True, temperature=0.9)
-
-```
-
-
-
-### Key Attributes for Cost Calculation
-
-For accurate cost calculation, ZeroEval requires these specific attributes in your span:
-
-| Attribute | Required | Description | Example Values |
-|-----------|----------|-------------|---------------|
-| `provider` | ✅ | Provider identifier for pricing lookup | `"openai"`, `"gemini"`, `"anthropic"` |
-| `model` | ✅ | Model identifier for pricing lookup | `"gpt-4o"`, `"gemini-1.5-flash"` |
-| `inputTokens` | ✅ | Number of input tokens consumed | `150` |
-| `outputTokens` | ✅ | Number of output tokens generated | `75` |
-| `kind` | ✅ | Must be set to `"llm"` | `"llm"` |
-
-**Cost Calculation Process:**
-
-1. ZeroEval looks up pricing in the `provider_models` table using `provider` and `model`
-2. Calculates: `(inputTokens × inputPrice + outputTokens × outputPrice) / 1,000,000`
-3. Stores the result in the span's `cost` field
-4. Cost is displayed in cents, automatically converted to dollars in the UI
-
-**Current Supported Models for Cost Calculation:**
-
-- **OpenAI**: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo`
-- **Gemini**: `gemini-1.5-flash`, `gemini-1.5-pro`, `gemini-1.0-pro`
-- **Anthropic**: `claude-3-5-sonnet`, `claude-3-haiku`, `claude-3-opus`
-
-If your model isn't listed, the cost will be `0` and you'll see a warning in the logs. Contact support to add pricing for new models.
-
-### Conversation Formatting Best Practices
-
-To ensure your conversations display properly in the ZeroEval UI, follow these formatting guidelines:
-
-
-```python Python Message Formatting
-def format_messages_for_zeroeval(messages: list) -> list:
- """Format messages for optimal display in ZeroEval UI"""
- formatted = []
-
- for msg in messages:
- # Handle both dict and object formats
- if hasattr(msg, 'role'):
- role = msg.role
- content = msg.content
- else:
- role = msg.get('role', 'user')
- content = msg.get('content', '')
-
- # Standardize role names
- if role in ['assistant', 'bot', 'ai']:
- role = 'assistant'
- elif role in ['human', 'user']:
- role = 'user'
- elif role == 'system':
- role = 'system'
-
- # Handle multimodal content - extract text for display
- if isinstance(content, list):
- text_parts = []
- for part in content:
- if isinstance(part, dict):
- if part.get('type') == 'text':
- text_parts.append(part['text'])
- elif part.get('type') == 'image_url':
- text_parts.append(f"[Image: {part.get('image_url', {}).get('url', 'Unknown')}]")
- elif isinstance(part, str):
- text_parts.append(part)
-
- # Join text parts with newlines for readability
- content = '\n'.join(text_parts) if text_parts else '[Multimodal content]'
-
- # Ensure content is a string
- if not isinstance(content, str):
- content = str(content)
-
- # Trim excessive whitespace but preserve meaningful formatting
- content = content.strip()
-
- formatted.append({
- "role": role,
- "content": content
- })
-
- return formatted
-
-# Usage in span creation
-span_data = {
- "input_data": json.dumps(format_messages_for_zeroeval(original_messages)),
- "output_data": response_text.strip(), # Clean response text too
- # ... other fields
-}
-```
-
-
-
-**Key Formatting Rules:**
-
-1. **Standardize Role Names**: Use `"user"`, `"assistant"`, and `"system"` consistently
-2. **Handle Multimodal Content**: Extract text content and add descriptive placeholders for non-text elements
-3. **Clean Whitespace**: Trim excessive whitespace while preserving intentional formatting
-4. **Ensure String Types**: Convert all content to strings to avoid serialization issues
-5. **Preserve Conversation Flow**: Maintain the original message order and context
-
-**UI Display Features:**
-
-- **Message Bubbles**: Conversations appear as chat bubbles with clear role distinction
-- **Token Counts**: Hover over messages to see token usage breakdown
-- **Copy Functionality**: Users can copy individual messages or entire conversations
-- **Search**: Well-formatted messages are easily searchable within traces
-- **Export**: Clean formatting ensures readable exports to various formats
-
-**Common Formatting Issues to Avoid:**
-
-- ❌ Mixed role naming (`bot` vs `assistant`)
-- ❌ Nested objects in content fields
-- ❌ Excessive line breaks or whitespace
-- ❌ Empty or null content fields
-- ❌ Non-string data types in content
-
-**Pro Tips:**
-
-- Keep system messages concise but informative
-- Use consistent formatting across your application
-- Include relevant context in message content for better debugging
-- Consider truncating very long messages (>10k characters) with ellipsis
-
-### Creating Child Spans
-
-Create nested spans to track sub-operations within an LLM call:
-
-```python
-import zeroeval as ze
-
-@ze.span(name="rag_pipeline", kind="generic")
-def answer_with_context(question: str) -> str:
- # Retrieval step
- with ze.span(name="retrieve_context", kind="vector_store") as retrieval_span:
- context = vector_db.search(question, k=5)
- retrieval_span.set_attributes({
- "vector_store.query": question,
- "vector_store.k": 5,
- "vector_store.results": len(context)
- })
-
- # LLM generation step
- with ze.span(name="generate_answer", kind="llm") as llm_span:
- messages = [
- {"role": "system", "content": f"Context: {context}"},
- {"role": "user", "content": question}
- ]
-
- response = generate_response(messages)
-
- llm_span.set_attributes({
- "llm.model": "gpt-4",
- "llm.context_length": len(str(context))
- })
-
- return response
-```
-
-## Direct API Instrumentation
-
-If you prefer to send spans directly to the API without using an SDK, here's how to do it:
-
-### API Authentication
-
-First, obtain an API key from your [Settings → API Keys](https://app.zeroeval.com/settings?section=api-keys) page.
-
-Include the API key in your request headers:
-
-```bash
-Authorization: Bearer YOUR_API_KEY
-```
-
-### Basic Span Creation
-
-Send a POST request to `/api/v1/spans` with your span data:
-
-
-```bash cURL
-curl -X POST https://api.zeroeval.com/api/v1/spans \
- -H "Authorization: Bearer YOUR_API_KEY" \
- -H "Content-Type: application/json" \
- -d '[{
- "id": "550e8400-e29b-41d4-a716-446655440000",
- "trace_id": "550e8400-e29b-41d4-a716-446655440001",
- "name": "chat_completion",
- "kind": "llm",
- "started_at": "2024-01-15T10:30:00Z",
- "ended_at": "2024-01-15T10:30:02Z",
- "status": "ok",
- "attributes": {
- "llm.model": "gpt-4",
- "llm.provider": "openai",
- "llm.temperature": 0.7,
- "llm.input_tokens": 150,
- "llm.output_tokens": 230,
- "llm.total_tokens": 380
- },
- "input_data": "[{\"role\": \"user\", \"content\": \"What is the capital of France?\"}]",
- "output_data": "The capital of France is Paris."
- }]'
-```
-
-```python Python (Requests)
-import requests
-import json
-from datetime import datetime, timezone
-import uuid
-
-def send_llm_span(messages, response_text, model="gpt-4", tokens=None):
- """Send an LLM span directly to the ZeroEval API"""
-
- # Generate IDs
- span_id = str(uuid.uuid4())
- trace_id = str(uuid.uuid4())
-
- # Prepare the span data
- span_data = {
- "id": span_id,
- "trace_id": trace_id,
- "name": "chat_completion",
- "kind": "llm",
- "started_at": datetime.now(timezone.utc).isoformat(),
- "ended_at": datetime.now(timezone.utc).isoformat(),
- "status": "ok",
- "attributes": {
- "llm.model": model,
- "llm.provider": "openai",
- "llm.temperature": 0.7
- },
- "input_data": json.dumps(messages),
- "output_data": response_text
- }
-
- # Add token counts if provided
- if tokens:
- span_data["attributes"].update({
- "llm.input_tokens": tokens.get("prompt_tokens"),
- "llm.output_tokens": tokens.get("completion_tokens"),
- "llm.total_tokens": tokens.get("total_tokens")
- })
-
- # Send to API
- response = requests.post(
- "https://api.zeroeval.com/api/v1/spans",
- headers={
-            "Authorization": "Bearer YOUR_API_KEY",
- "Content-Type": "application/json"
- },
- json=[span_data] # Note: API expects an array
- )
-
- if response.status_code == 200:
- return response.json()
- else:
- raise Exception(f"Failed to send span: {response.text}")
-```
-
-
-
-### Complete LLM Span with Session
-
-Create a full trace with session context:
-
-```python
-import requests
-import json
-from datetime import datetime, timezone
-import uuid
-import time
-
-class ZeroEvalClient:
- def __init__(self, api_key: str):
- self.api_key = api_key
- self.base_url = "https://api.zeroeval.com/api/v1"
- self.session_id = str(uuid.uuid4())
-
- def create_llm_span(
- self,
- messages: list,
- response: dict,
- model: str = "gpt-4",
- trace_id: str = None,
- parent_span_id: str = None,
- start_time: float = None,
- end_time: float = None
- ):
- """Create a comprehensive LLM span with all metadata"""
-
- if not trace_id:
- trace_id = str(uuid.uuid4())
-
- if not start_time:
- start_time = time.time()
- if not end_time:
- end_time = time.time()
-
- span_id = str(uuid.uuid4())
-
- # Calculate duration
- duration_ms = (end_time - start_time) * 1000
-
- # Prepare comprehensive span data
- span_data = {
- "id": span_id,
- "trace_id": trace_id,
- "parent_span_id": parent_span_id,
- "name": f"{model}_completion",
- "kind": "llm",
- "started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
- "ended_at": datetime.fromtimestamp(end_time, timezone.utc).isoformat(),
- "duration_ms": duration_ms,
- "status": "ok",
-
- # Session context
- "session": {
- "id": self.session_id,
- "name": "API Client Session"
- },
-
- # Core attributes
- "attributes": {
- "llm.model": model,
- "llm.provider": "openai",
- "llm.temperature": 0.7,
- "llm.max_tokens": 1000,
- "llm.streaming": False,
-
- # Token metrics
- "llm.input_tokens": response.get("usage", {}).get("prompt_tokens"),
- "llm.output_tokens": response.get("usage", {}).get("completion_tokens"),
- "llm.total_tokens": response.get("usage", {}).get("total_tokens"),
-
- # Performance metrics
- "llm.duration_ms": duration_ms,
- "llm.throughput_tokens_per_sec": (
- response.get("usage", {}).get("completion_tokens", 0) /
- (duration_ms / 1000) if duration_ms > 0 else 0
- ),
-
- # Response metadata
- "llm.finish_reason": response.get("choices", [{}])[0].get("finish_reason"),
- "llm.response_id": response.get("id"),
- "llm.system_fingerprint": response.get("system_fingerprint")
- },
-
- # Tags for filtering
- "tags": {
- "environment": "production",
- "version": "1.0.0",
- "user_id": "user_123"
- },
-
- # Input/Output
- "input_data": json.dumps(messages),
- "output_data": response.get("choices", [{}])[0].get("message", {}).get("content", ""),
-
- # Cost calculation (optional - will be calculated server-side if not provided)
- "cost": self.calculate_cost(
- model,
- response.get("usage", {}).get("prompt_tokens", 0),
- response.get("usage", {}).get("completion_tokens", 0)
- )
- }
-
- # Send the span
- response = requests.post(
- f"{self.base_url}/spans",
- headers={
- "Authorization": f"Bearer {self.api_key}",
- "Content-Type": "application/json"
- },
- json=[span_data]
- )
-
- if response.status_code != 200:
- raise Exception(f"Failed to send span: {response.text}")
-
- return span_id
-
- def calculate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
- """Calculate cost based on model and token usage"""
- # Example pricing (adjust based on actual pricing)
- pricing = {
- "gpt-4": {"input": 0.03 / 1000, "output": 0.06 / 1000},
- "gpt-3.5-turbo": {"input": 0.001 / 1000, "output": 0.002 / 1000}
- }
-
- if model in pricing:
- input_cost = input_tokens * pricing[model]["input"]
- output_cost = output_tokens * pricing[model]["output"]
- return input_cost + output_cost
-
- return 0.0
-```
-
-## Span Schema Reference
-
-### Required Fields
-
-| Field | Type | Description |
-| ------------ | ----------------- | ------------------------------- |
-| `trace_id` | string (UUID) | Unique identifier for the trace |
-| `name` | string | Descriptive name for the span |
-| `started_at` | ISO 8601 datetime | When the span started |
-
-### Recommended Fields for LLM Spans
-
-| Field | Type | Description |
-| ------------- | ----------------- | ------------------------------------------------------- |
-| `id` | string (UUID) | Unique span identifier (auto-generated if not provided) |
-| `kind` | string | Set to `"llm"` for LLM spans |
-| `ended_at` | ISO 8601 datetime | When the span completed |
-| `status` | string | `"ok"`, `"error"`, or `"unset"` |
-| `input_data` | string | JSON string of input messages |
-| `output_data` | string | Generated text response |
-| `duration_ms` | number | Total duration in milliseconds |
-| `cost` | number | Calculated cost (auto-calculated if not provided) |
-
-### LLM-Specific Attributes
-
-Store these in the `attributes` field:
-
-| Attribute | Type | Description |
-| ------------------------------- | ------- | -------------------------------------------- |
-| `llm.model` | string | Model identifier (e.g., "gpt-4", "claude-3") |
-| `llm.provider` | string | Provider name (e.g., "openai", "anthropic") |
-| `llm.temperature` | number | Temperature parameter |
-| `llm.max_tokens` | number | Maximum tokens limit |
-| `llm.input_tokens` | number | Number of input tokens |
-| `llm.output_tokens` | number | Number of output tokens |
-| `llm.total_tokens` | number | Total tokens used |
-| `llm.streaming` | boolean | Whether response was streamed |
-| `llm.ttft_ms` | number | Time to first token (streaming only) |
-| `llm.throughput_tokens_per_sec` | number | Token generation rate |
-| `llm.finish_reason` | string | Why generation stopped |
-| `llm.response_id` | string | Provider's response ID |
-| `llm.system_fingerprint` | string | Model version identifier |
-
-### Optional Context Fields
-
-| Field | Type | Description |
-| ---------------- | ------------- | --------------------------------------------- |
-| `parent_span_id` | string (UUID) | Parent span for nested operations |
-| `session` | object | Session context with `id` and optional `name` |
-| `tags` | object | Key-value pairs for filtering |
-| `signals` | object | Custom signals for alerting |
-| `error_message` | string | Error description if status is "error" |
-| `error_stack` | string | Stack trace for debugging |
-
-## Best Practices
-
-1. **Always set the `kind` field**: Use `"llm"` for LLM spans to enable specialized features like embeddings and cost tracking.
-
-2. **Include token counts**: These are essential for cost calculation and performance monitoring.
-
-3. **Capture timing metrics**: For streaming responses, track TTFT (time to first token) and throughput.
-
-4. **Use consistent naming**: Follow a pattern like `{model}_completion` or `{provider}_{operation}`.
-
-5. **Add context with tags**: Use tags for environment, version, user ID, etc., to enable powerful filtering.
-
-6. **Handle errors gracefully**: Set status to "error" and include error details in attributes.
-
-7. **Link related spans**: Use `parent_span_id` to create hierarchical traces for complex workflows.
-
-8. **Batch span submissions**: When sending multiple spans, include them in a single API call as an array.
-
-## Examples
-
-### Multi-Step LLM Pipeline
-
-Here's a complete example of tracking a RAG (Retrieval-Augmented Generation) pipeline:
-
-```python
-import zeroeval as ze
-import time
-import json
-
-@ze.span(name="rag_query", kind="generic")
-def rag_pipeline(user_query: str) -> dict:
- trace_id = ze.get_current_trace()
-
- # Step 1: Query embedding
- with ze.span(name="embed_query", kind="llm") as embed_span:
- start = time.time()
- embedding = create_embedding(user_query)
- embed_span.set_attributes({
- "llm.model": "text-embedding-3-small",
- "llm.provider": "openai",
- "llm.input_tokens": len(user_query.split()),
- "llm.duration_ms": (time.time() - start) * 1000
- })
-
- # Step 2: Vector search
- with ze.span(name="vector_search", kind="vector_store") as search_span:
- results = vector_db.similarity_search(embedding, k=5)
- search_span.set_attributes({
- "vector_store.index": "knowledge_base",
- "vector_store.k": 5,
- "vector_store.results_count": len(results)
- })
-
- # Step 3: Rerank results
- with ze.span(name="rerank_results", kind="llm") as rerank_span:
- reranked = rerank_documents(user_query, results)
- rerank_span.set_attributes({
- "llm.model": "rerank-english-v2.0",
- "llm.provider": "cohere",
- "rerank.input_documents": len(results),
- "rerank.output_documents": len(reranked)
- })
-
- # Step 4: Generate response
- with ze.span(name="generate_response", kind="llm") as gen_span:
- context = "\n".join([doc.content for doc in reranked[:3]])
- messages = [
- {"role": "system", "content": f"Use this context to answer: {context}"},
- {"role": "user", "content": user_query}
- ]
-
- response = generate_with_metrics(messages, model="gpt-4")
-
- gen_span.set_attributes({
- "llm.context_documents": 3,
- "llm.context_length": len(context)
- })
-
- return {
- "answer": response,
- "sources": [doc.metadata for doc in reranked[:3]],
- "trace_id": trace_id
- }
-```
-
-This comprehensive instrumentation provides full visibility into your LLM operations, enabling you to monitor performance, track costs, and debug issues effectively.
-
-## Next Steps
-
-- Complete guide to environment variables, initialization parameters, and runtime configuration options.
-- Automatic instrumentation for popular LLM libraries without manual code changes.
-
-For automatic instrumentation of popular LLM libraries, check out our [SDK
-integrations](/tracing/sdks/python/integrations) which handle all of this
-automatically.
-
diff --git a/tracing/opentelemetry.mdx b/tracing/opentelemetry.mdx
index 7bcead6..348c5b9 100644
--- a/tracing/opentelemetry.mdx
+++ b/tracing/opentelemetry.mdx
@@ -1,19 +1,47 @@
---
title: OpenTelemetry
-description: Send traces to ZeroEval using the OpenTelemetry collector
+description: Send traces to ZeroEval via the OpenTelemetry Protocol (OTLP)
---
-ZeroEval provides native support for the OpenTelemetry Protocol (OTLP), allowing you to send traces from any OpenTelemetry-instrumented application directly to ZeroEval's API. This guide shows you how to configure the OpenTelemetry collector to export traces to ZeroEval.
+ZeroEval accepts standard OTLP trace data at `POST /v1/traces`, so any OpenTelemetry-instrumented application can export directly -- through a collector, or straight from your app using an OTLP exporter.
+## When to use OTLP
-## Prerequisites
+Use this integration when:
-- A ZeroEval API key (get one from your [workspace settings](https://app.zeroeval.com/settings/api-keys))
-- OpenTelemetry collector installed ([installation guide](https://opentelemetry.io/docs/collector/getting-started/))
+- Your application already uses OpenTelemetry for instrumentation
+- You want to fan out traces to multiple backends (ZeroEval + Datadog, Jaeger, etc.)
+- You're running infrastructure you can't modify but can route through a collector
+- You prefer a vendor-neutral instrumentation layer
-## Configuration
+
+ If you're starting fresh, the [Python SDK](/tracing/sdks/python/setup) or
+ [TypeScript SDK](/tracing/sdks/typescript/setup) provide a simpler setup with
+ automatic LLM instrumentation.
+
-Create a collector configuration file (`otel-collector-config.yaml`):
+## Endpoint Reference
+
+```
+POST https://api.zeroeval.com/v1/traces
+```
+
+| Header | Value |
+| --------------- | ---------------------------------------------- |
+| `Authorization` | `Bearer YOUR_ZEROEVAL_API_KEY` |
+| `Content-Type` | `application/json` or `application/x-protobuf` |
+
+The endpoint accepts the standard `ExportTraceServiceRequest` payload defined in the [OTLP specification](https://opentelemetry.io/docs/specs/otlp/). Spans are converted to ZeroEval's internal format -- trace IDs, parent-child relationships, attributes, and status are all preserved.
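+
+If you'd rather not run a collector at all, most OpenTelemetry SDKs can export straight to this endpoint using the standard OTLP environment variables (the variable names below come from the OpenTelemetry specification, not ZeroEval; some SDKs require URL-encoding the space in the header value as `Bearer%20...`):
+
+```bash
+export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.zeroeval.com/v1/traces"
+export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer YOUR_ZEROEVAL_API_KEY"
+```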
+
+---
+
+## Option 1: OpenTelemetry Collector
+
+Route traces through a collector when you need batching, processing, or multi-destination fan-out.
+
+### Collector Configuration
+
+Create `otel-collector-config.yaml`:
```yaml
receivers:
@@ -29,73 +57,41 @@ processors:
timeout: 1s
send_batch_size: 1024
- # ZeroEval-specific attributes
- attributes:
- actions:
- - key: deployment.environment
- value: "production" # or staging, development, etc.
- action: upsert
-
exporters:
otlphttp:
endpoint: https://api.zeroeval.com
headers:
- Authorization: "Bearer YOUR_ZEROEVAL_API_KEY"
+ Authorization: "Bearer ${env:ZEROEVAL_API_KEY}"
traces_endpoint: https://api.zeroeval.com/v1/traces
service:
pipelines:
traces:
receivers: [otlp]
- processors: [batch, attributes]
+ processors: [batch]
exporters: [otlphttp]
```
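+
+Before deploying, you can sanity-check the file with the collector's built-in `validate` command (a sketch; assumes a recent `otel/opentelemetry-collector-contrib` release that ships the subcommand):
+
+```bash
+docker run --rm \
+  -v "$PWD/otel-collector-config.yaml:/etc/otel-collector-config.yaml" \
+  otel/opentelemetry-collector-contrib:latest \
+  validate --config=/etc/otel-collector-config.yaml
+```
+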
-## Docker Deployment
-
-For containerized deployments, use this Docker Compose configuration:
+### Run with Docker Compose
```yaml
-version: '3.8'
-
services:
otel-collector:
image: otel/opentelemetry-collector-contrib:latest
- container_name: otel-collector
command: ["--config=/etc/otel-collector-config.yaml"]
volumes:
- ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
ports:
- - "4317:4317" # OTLP gRPC receiver
- - "4318:4318" # OTLP HTTP receiver
- - "8888:8888" # Prometheus metrics
+ - "4317:4317"
+ - "4318:4318"
environment:
- ZEROEVAL_API_KEY=${ZEROEVAL_API_KEY}
restart: unless-stopped
```
-## Environment-based Configuration
-
-To avoid hardcoding sensitive information, use environment variables:
-
-```yaml
-exporters:
- otlphttp:
- endpoint: https://api.zeroeval.com
- headers:
- Authorization: "Bearer ${env:ZEROEVAL_API_KEY}"
- traces_endpoint: https://api.zeroeval.com/v1/traces
-```
-
-Then set the environment variable:
-
-```bash
-export ZEROEVAL_API_KEY="your-api-key-here"
-```
-
-## Kubernetes Deployment
+### Run with Kubernetes
-For Kubernetes environments, use this ConfigMap and Deployment:
+
```yaml
apiVersion: v1
@@ -103,7 +99,7 @@ kind: ConfigMap
metadata:
name: otel-collector-config
data:
- otel-collector-config.yaml: |
+ config.yaml: |
receivers:
otlp:
protocols:
@@ -111,32 +107,35 @@ data:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
-
processors:
batch:
timeout: 1s
-
k8sattributes:
extract:
metadata:
- k8s.namespace.name
- k8s.deployment.name
- k8s.pod.name
-
exporters:
otlphttp:
endpoint: https://api.zeroeval.com
headers:
Authorization: "Bearer ${env:ZEROEVAL_API_KEY}"
traces_endpoint: https://api.zeroeval.com/v1/traces
-
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch, k8sattributes]
exporters: [otlphttp]
-
+---
+apiVersion: v1
+kind: Secret
+metadata:
+ name: zeroeval-secret
+type: Opaque
+stringData:
+ api-key: "YOUR_ZEROEVAL_API_KEY"
---
apiVersion: apps/v1
kind: Deployment
@@ -153,29 +152,28 @@ spec:
app: otel-collector
spec:
containers:
- - name: otel-collector
- image: otel/opentelemetry-collector-contrib:latest
- args: ["--config=/etc/otel-collector-config.yaml"]
- env:
- - name: ZEROEVAL_API_KEY
- valueFrom:
- secretKeyRef:
- name: zeroeval-secret
- key: api-key
- ports:
- - containerPort: 4317
- name: otlp-grpc
- - containerPort: 4318
- name: otlp-http
- volumeMounts:
- - name: config
- mountPath: /etc/otel-collector-config.yaml
- subPath: otel-collector-config.yaml
+ - name: otel-collector
+ image: otel/opentelemetry-collector-contrib:latest
+ args: ["--config=/etc/config.yaml"]
+ env:
+ - name: ZEROEVAL_API_KEY
+ valueFrom:
+ secretKeyRef:
+ name: zeroeval-secret
+ key: api-key
+ ports:
+ - containerPort: 4317
+ name: otlp-grpc
+ - containerPort: 4318
+ name: otlp-http
+ volumeMounts:
+ - name: config
+ mountPath: /etc/config.yaml
+ subPath: config.yaml
volumes:
- - name: config
- configMap:
- name: otel-collector-config
-
+ - name: config
+ configMap:
+ name: otel-collector-config
---
apiVersion: v1
kind: Service
@@ -185,10 +183,115 @@ spec:
selector:
app: otel-collector
ports:
- - name: otlp-grpc
- port: 4317
- targetPort: 4317
- - name: otlp-http
- port: 4318
- targetPort: 4318
-```
\ No newline at end of file
+ - name: otlp-grpc
+ port: 4317
+ - name: otlp-http
+ port: 4318
+```
+
+---
+
+## Option 2: Direct from Python
+
+Export OTLP traces directly from your Python application without a collector. Use the `ZeroEvalOTLPProvider` included in the SDK, or configure a standard `OTLPSpanExporter`.
+
+### Using ZeroEvalOTLPProvider
+
+```python
+from opentelemetry import trace
+from zeroeval.providers import ZeroEvalOTLPProvider
+
+provider = ZeroEvalOTLPProvider(
+ api_key="YOUR_ZEROEVAL_API_KEY",
+ service_name="my-service"
+)
+trace.set_tracer_provider(provider)
+
+tracer = trace.get_tracer("my-service")
+
+with tracer.start_as_current_span("process_request") as span:
+ span.set_attribute("user.id", "12345")
+ result = do_work()
+ span.set_attribute("result.status", "ok")
+```
+
+### Using standard OTLPSpanExporter
+
+```python
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
+
+exporter = OTLPSpanExporter(
+ endpoint="https://api.zeroeval.com/v1/traces",
+ headers={"Authorization": "Bearer YOUR_ZEROEVAL_API_KEY"}
+)
+
+provider = TracerProvider()
+provider.add_span_processor(BatchSpanProcessor(exporter))
+trace.set_tracer_provider(provider)
+```
+
+---
+
+## Option 3: Direct from Node.js
+
+```typescript
+import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
+import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
+import { BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
+
+const exporter = new OTLPTraceExporter({
+ url: "https://api.zeroeval.com/v1/traces",
+ headers: {
+ Authorization: "Bearer YOUR_ZEROEVAL_API_KEY",
+ },
+});
+
+const provider = new NodeTracerProvider({
+  spanProcessors: [new BatchSpanProcessor(exporter)],
+});
+provider.register();
+```
+
+---
+
+## Attribute Mapping
+
+ZeroEval maps standard OpenTelemetry span attributes to its internal format:
+
+| OTLP Attribute | ZeroEval Field | Notes |
+| ----------------------- | ---------------------- | ------------------------------------------ |
+| `span.name` | `name` | Span name |
+| `span.kind` | `kind` | Mapped to `generic`, `llm`, etc. |
+| `span.status` | `status` | `ok`, `error`, `unset` |
+| `span.start_time` | `started_at` | Nanosecond timestamp converted to ISO 8601 |
+| `span.end_time` | `ended_at` | Nanosecond timestamp converted to ISO 8601 |
+| `span.trace_id` | `trace_id` | Hex-encoded trace ID |
+| `span.parent_span_id` | `parent_span_id` | Hex-encoded span ID |
+| `span.attributes.*` | `attributes` | All attributes preserved |
+| `resource.service.name` | Service identification | Used for grouping |
+
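The nanosecond-to-ISO-8601 conversion noted in the table amounts to the following (a sketch of the equivalent logic, not ZeroEval's actual implementation):

```python
from datetime import datetime, timezone

def nanos_to_iso8601(nanos: int) -> str:
    """Convert an OTLP nanosecond Unix timestamp to an ISO 8601 string (UTC)."""
    return datetime.fromtimestamp(nanos / 1e9, tz=timezone.utc).isoformat()

print(nanos_to_iso8601(1_700_000_000_000_000_000))  # 2023-11-14T22:13:20+00:00
```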
+### LLM Spans
+
+To get LLM-specific features (cost calculation, token tracking), set these attributes on your spans:
+
+| Attribute | Description |
+| ------------------------------------------------------- | ------------------------------------------- |
+| `llm.provider` or `gen_ai.system` | Provider name (`openai`, `anthropic`, etc.) |
+| `llm.model` or `gen_ai.request.model` | Model identifier |
+| `llm.input_tokens` or `gen_ai.usage.prompt_tokens` | Input token count |
+| `llm.output_tokens` or `gen_ai.usage.completion_tokens` | Output token count |
+
+### Sessions
+
+Attach session context to your OTLP spans via attributes:
+
+| Attribute | Description |
+| ----------------------- | ------------------------------ |
+| `zeroeval.session.id` | Session ID for grouping traces |
+| `zeroeval.session.name` | Human-readable session name |
+
+The `ZeroEvalOTLPProvider` stamps these automatically when you configure a session via environment variables (`ZEROEVAL_SESSION_ID`, `ZEROEVAL_SESSION_NAME`).
diff --git a/tracing/quickstart.mdx b/tracing/quickstart.mdx
deleted file mode 100644
index 2814d10..0000000
--- a/tracing/quickstart.mdx
+++ /dev/null
@@ -1,25 +0,0 @@
----
-title: Quickstart
-description: Get started with tracing and observability in ZeroEval
----
-
-### Get your API key
-
-Create an API key from your [Settings → API Keys](https://app.zeroeval.com/settings?section=api-keys) page.
-
-### Install the SDK
-
-Get started with one of our SDKs:
-
-
-
- For Python applications using frameworks like FastAPI, Django, or Flask
-
-
- For TypeScript and JavaScript applications using Node.js or Bun
-
-
-
-
-Using Cursor, Claude Code, or another coding agent? The [`zeroeval-install` skill](/integrations/skills) can handle SDK setup, first trace, and prompt migration for you.
-
diff --git a/tracing/reference.mdx b/tracing/reference.mdx
deleted file mode 100644
index 1fde022..0000000
--- a/tracing/reference.mdx
+++ /dev/null
@@ -1,138 +0,0 @@
----
-title: Reference
-description: Environment variables and configuration parameters for the ZeroEval tracer
----
-
-Configure the ZeroEval tracer through environment variables, initialization parameters, or runtime methods.
-
-## Environment Variables
-
-Set before importing ZeroEval to configure default behavior.
-
-| Variable | Type | Default | Description |
-| -------------------------------- | ------- | ---------------------------- | --------------------------------------- |
-| `ZEROEVAL_API_KEY` | string | `""` | API key for authentication |
-| `ZEROEVAL_API_URL` | string | `"https://api.zeroeval.com"` | API endpoint URL |
-| `ZEROEVAL_WORKSPACE_NAME` | string | `"Personal Workspace"` | Workspace name |
-| `ZEROEVAL_SESSION_ID` | string | auto-generated | Session ID for grouping traces |
-| `ZEROEVAL_SESSION_NAME` | string | `""` | Human-readable session name |
-| `ZEROEVAL_SAMPLING_RATE` | float | `"1.0"` | Sampling rate (0.0-1.0) |
-| `ZEROEVAL_DISABLED_INTEGRATIONS` | string | `""` | Comma-separated integrations to disable |
-| `ZEROEVAL_DEBUG` | boolean | `"false"` | Enable debug logging |
-
-**Activation:** Set environment variables before importing the SDK.
-
-```bash
-export ZEROEVAL_API_KEY="ze_1234567890abcdef"
-export ZEROEVAL_SAMPLING_RATE="0.1"
-export ZEROEVAL_DEBUG="true"
-```
-
-## Initialization Parameters
-
-Configure via `ze.init()` - overrides environment variables.
-
-| Parameter | Type | Default | Description |
-| ----------------------- | -------------- | ---------------------------- | -------------------------------- |
-| `api_key` | string | `None` | API key for authentication |
-| `workspace_name` | string | `"Personal Workspace"` | Workspace name |
-| `debug` | boolean | `False` | Enable debug logging with colors |
-| `api_url` | string | `"https://api.zeroeval.com"` | API endpoint URL |
-| `disabled_integrations` | list[str] | `None` | Integrations to disable |
-| `enabled_integrations` | list[str] | `None` | Only enable these integrations |
-| `setup_otlp` | boolean | `True` | Setup OpenTelemetry OTLP export |
-| `service_name` | string | `"zeroeval-app"` | OTLP service name |
-| `tags` | dict[str, str] | `None` | Global tags for all spans |
-| `sampling_rate` | float | `None` | Sampling rate (0.0-1.0) |
-
-**Activation:** Pass parameters to `ze.init()`.
-
-```python
-ze.init(
- api_key="ze_1234567890abcdef",
- sampling_rate=0.1,
- disabled_integrations=["langchain"],
- debug=True
-)
-```
-
-## Runtime Configuration
-
-Configure after initialization via `ze.tracer.configure()`.
-
-| Parameter | Type | Default | Description |
-| ---------------------- | --------------- | ------- | ------------------------------------ |
-| `flush_interval` | float | `1.0` | Flush frequency in seconds |
-| `max_spans` | int | `20` | Buffer size before forced flush |
-| `collect_code_details` | boolean | `True` | Capture code details in spans |
-| `integrations` | dict[str, bool] | `{}` | Enable/disable specific integrations |
-| `sampling_rate` | float | `None` | Sampling rate (0.0-1.0) |
-
-**Activation:** Call `ze.tracer.configure()` anytime after initialization.
-
-```python
-ze.tracer.configure(
- flush_interval=0.5,
- max_spans=100,
- sampling_rate=0.05,
- integrations={"openai": True, "langchain": False}
-)
-```
-
-## Available Integrations
-
-| Integration | User-Friendly Name | Auto-Instruments |
-| ---------------------- | ------------------ | -------------------- |
-| `OpenAIIntegration` | `"openai"` | OpenAI client calls |
-| `GeminiIntegration` | `"gemini"` | Google Gemini calls |
-| `LangChainIntegration` | `"langchain"` | LangChain components |
-| `LangGraphIntegration` | `"langgraph"` | LangGraph workflows |
-| `HttpxIntegration` | `"httpx"` | HTTPX requests |
-| `VocodeIntegration` | `"vocode"` | Vocode voice SDK |
-
-**Control via:**
-
-- Environment: `ZEROEVAL_DISABLED_INTEGRATIONS="langchain,langgraph"`
-- Init: `disabled_integrations=["langchain"]` or `enabled_integrations=["openai"]`
-- Runtime: `ze.tracer.configure(integrations={"langchain": False})`
-
-## Configuration Examples
-
-### Production Setup
-
-```python
-# High-volume production with sampling
-ze.init(
- api_key="your_key",
- sampling_rate=0.05, # 5% sampling
- debug=False,
- disabled_integrations=["langchain"]
-)
-
-ze.tracer.configure(
- flush_interval=0.5, # Faster flushes
- max_spans=100 # Larger buffer
-)
-```
-
-### Development Setup
-
-```python
-# Full tracing with debug info
-ze.init(
- api_key="your_key",
- debug=True, # Colored logs
- sampling_rate=1.0 # Capture everything
-)
-```
-
-### Memory-Optimized Setup
-
-```python
-# Minimize memory usage
-ze.tracer.configure(
- max_spans=5, # Small buffer
- collect_code_details=False, # No code capture
- flush_interval=2.0 # Less frequent flushes
-)
-```
diff --git a/tracing/sdks/python/reference.mdx b/tracing/sdks/python/reference.mdx
index 0bcb821..a945b59 100644
--- a/tracing/sdks/python/reference.mdx
+++ b/tracing/sdks/python/reference.mdx
@@ -1,6 +1,6 @@
---
-title: 'Reference'
-description: 'Complete API reference for the Python SDK'
+title: "Reference"
+description: "Complete API reference for the Python SDK"
---
## Installation
@@ -17,26 +17,43 @@ Initializes the ZeroEval SDK. Must be called before using any other SDK features
```python
def init(
- api_key: str = None,
- workspace_name: str = "Personal Workspace",
+ api_key: str = None,
+ workspace_name: str = "Personal Organization",
+ organization_name: str = None,
debug: bool = False,
- api_url: str = "https://api.zeroeval.com"
+ api_url: str = None,
+ disabled_integrations: list[str] = None,
+ enabled_integrations: list[str] = None,
+ setup_otlp: bool = True,
+ service_name: str = "zeroeval-app",
+ tags: dict[str, str] = None,
+ sampling_rate: float = None
) -> None
```
-**Parameters:**
-- `api_key` (str, optional): Your ZeroEval API key. If not provided, uses `ZEROEVAL_API_KEY` environment variable
-- `workspace_name` (str, optional): The name of your workspace. Defaults to `"Personal Workspace"`
-- `debug` (bool, optional): If True, enables detailed logging for debugging. Can also be enabled by setting `ZEROEVAL_DEBUG=true` environment variable
-- `api_url` (str, optional): The URL of the ZeroEval API. Defaults to `"https://api.zeroeval.com"`
+| Parameter | Type | Default | Description |
+| ----------------------- | ---------------- | ---------------------------- | ------------------------------------------------- |
+| `api_key` | `str` | `None` | API key. Falls back to `ZEROEVAL_API_KEY` env var |
+| `workspace_name` | `str` | `"Personal Organization"` | Deprecated -- use `organization_name` |
+| `organization_name` | `str` | `None` | Organization name |
+| `debug` | `bool` | `False` | Enable debug logging with colors |
+| `api_url` | `str` | `"https://api.zeroeval.com"` | API endpoint URL |
+| `disabled_integrations` | `list[str]` | `None` | Integrations to disable (e.g. `["langchain"]`) |
+| `enabled_integrations` | `list[str]` | `None` | Only enable these integrations |
+| `setup_otlp` | `bool` | `True` | Configure OpenTelemetry OTLP export |
+| `service_name` | `str` | `"zeroeval-app"` | OTLP service name |
+| `tags` | `dict[str, str]` | `None` | Global tags applied to all spans |
+| `sampling_rate` | `float` | `None` | Sampling rate 0.0-1.0 (1.0 = sample all) |
**Example:**
+
```python
import zeroeval as ze
ze.init(
api_key="your-api-key",
- workspace_name="My Workspace",
+ sampling_rate=0.1,
+ disabled_integrations=["langchain"],
debug=True
)
```
@@ -60,6 +77,7 @@ Decorator and context manager for creating spans around code blocks.
```
**Parameters:**
+
- `name` (str): Name of the span
- `session_id` (str, optional): **Deprecated** - Use `session` parameter instead
- `session` (Union[str, dict], optional): Session information. Can be:
@@ -71,6 +89,7 @@ Decorator and context manager for creating spans around code blocks.
- `tags` (dict, optional): Tags to attach to the span
**Usage as Decorator:**
+
```python
import zeroeval as ze
@@ -91,6 +110,7 @@ def user_action():
```
**Usage as Context Manager:**
+
```python
import zeroeval as ze
@@ -111,10 +131,12 @@ Decorator that attaches dataset and model information to a function.
```
**Parameters:**
+
- `dataset` (Dataset, optional): Dataset to use for the experiment
- `model` (str, optional): Model identifier
**Example:**
+
```python
import zeroeval as ze
@@ -143,11 +165,13 @@ Dataset(
```
**Parameters:**
+
- `name` (str): The name of the dataset
- `data` (list[dict]): A list of dictionaries containing the data
- `description` (str, optional): A description of the dataset
**Example:**
+
```python
dataset = Dataset(
name="Capitals",
@@ -170,6 +194,7 @@ def push(self, create_new_version: bool = False) -> Dataset
```
**Parameters:**
+
- `self`: The Dataset instance
- `create_new_version` (bool, optional): For backward compatibility. This parameter is no longer needed as new versions are automatically created when a dataset name already exists. Defaults to False
@@ -189,6 +214,7 @@ def pull(
```
**Parameters:**
+
- `cls`: The Dataset class itself (automatically provided when using `@classmethod`)
- `dataset_name` (str): The name of the dataset to pull from the backend
- `version_number` (int, optional): Specific version number to pull. If not provided, pulls the latest version
@@ -204,6 +230,7 @@ def add_rows(self, new_rows: list[dict[str, Any]]) -> None
```
**Parameters:**
+
- `self`: The Dataset instance
- `new_rows` (list[dict]): A list of dictionaries representing the rows to add
@@ -221,6 +248,7 @@ def add_image(
```
**Parameters:**
+
- `self`: The Dataset instance
- `row_index` (int): Index of the row to update (0-based)
- `column_name` (str): Name of the column to add the image to
@@ -240,6 +268,7 @@ def add_audio(
```
**Parameters:**
+
- `self`: The Dataset instance
- `row_index` (int): Index of the row to update (0-based)
- `column_name` (str): Name of the column to add the audio to
@@ -260,6 +289,7 @@ def add_media_url(
```
**Parameters:**
+
- `self`: The Dataset instance
- `row_index` (int): Index of the row to update (0-based)
- `column_name` (str): Name of the column to add the media URL to
@@ -323,6 +353,7 @@ Experiment(
```
**Parameters:**
+
- `dataset` (Dataset): The dataset to run the experiment on
- `task` (Callable): Function that processes each row and returns output
- `evaluators` (list[Callable], optional): List of evaluator functions that take (row, output) and return evaluation result
@@ -330,6 +361,7 @@ Experiment(
- `description` (str, optional): Description of the experiment. Defaults to task function's docstring
**Example:**
+
```python
import zeroeval as ze
@@ -375,6 +407,7 @@ def run(
```
**Parameters:**
+
- `self`: The Experiment instance
- `subset` (list[dict], optional): Subset of dataset rows to run the experiment on. If None, runs on entire dataset
@@ -393,6 +426,7 @@ def run_task(
```
**Parameters:**
+
- `self`: The Experiment instance
- `subset` (list[dict], optional): Subset of dataset rows to run the task on. If None, runs on entire dataset
- `raise_on_error` (bool, optional): If True, raises exceptions encountered during task execution. If False, captures errors. Defaults to False
@@ -412,6 +446,7 @@ def run_evaluators(
```
**Parameters:**
+
- `self`: The Experiment instance
- `evaluators` (list[Callable], optional): List of evaluator functions to run. If None, uses evaluators from the Experiment instance
- `results` (list[ExperimentResult], optional): List of results to evaluate. If None, uses results from the Experiment instance
@@ -437,6 +472,7 @@ def set_io(
```
**Parameters:**
+
- `self`: The Span instance
- `input_data` (str, optional): Input data to attach to the span. Will be converted to string if not already
- `output_data` (str, optional): Output data to attach to the span. Will be converted to string if not already
@@ -450,6 +486,7 @@ def set_tags(self, tags: dict[str, str]) -> None
```
**Parameters:**
+
- `self`: The Span instance
- `tags` (dict[str, str]): Dictionary of tags to set on the span
@@ -462,6 +499,7 @@ def set_attributes(self, attributes: dict[str, Any]) -> None
```
**Parameters:**
+
- `self`: The Span instance
- `attributes` (dict[str, Any]): Dictionary of attributes to set on the span
@@ -479,6 +517,7 @@ def set_error(
```
**Parameters:**
+
- `self`: The Span instance
- `code` (str): Error code or exception class name
- `message` (str): Error message
@@ -500,6 +539,7 @@ def add_screenshot(
```
**Parameters:**
+
- `self`: The Span instance
- `base64_data` (str): Base64 encoded image data. Accepts raw base64 or data URL format (`data:image/png;base64,...`)
- `viewport` (str, optional): Viewport type - `"desktop"`, `"mobile"`, or `"tablet"`. Defaults to `"desktop"`
@@ -508,6 +548,7 @@ def add_screenshot(
- `label` (str, optional): Human-readable description of the screenshot
**Example:**
+
```python
import zeroeval as ze
@@ -520,7 +561,7 @@ with ze.span(name="browser_test", tags={"test": "visual"}) as span:
height=1080,
label="Homepage - Desktop"
)
-
+
# Also capture mobile view
span.add_screenshot(
base64_data=mobile_screenshot_base64,
@@ -529,7 +570,7 @@ with ze.span(name="browser_test", tags={"test": "visual"}) as span:
height=812,
label="Homepage - iPhone"
)
-
+
span.set_io(
input_data="Navigate to homepage",
output_data="Captured viewport screenshots"
@@ -550,25 +591,27 @@ def add_image(
```
**Parameters:**
+
- `self`: The Span instance
- `base64_data` (str): Base64 encoded image data. Accepts raw base64 or data URL format
- `label` (str, optional): Human-readable description of the image
- `metadata` (dict, optional): Additional metadata to store with the image
**Example:**
+
```python
import zeroeval as ze
with ze.span(name="chart_generation") as span:
# Generate a chart and attach it
chart_base64 = generate_chart(data)
-
+
span.add_image(
base64_data=chart_base64,
label="Monthly Revenue Chart",
metadata={"chart_type": "bar", "data_points": 12}
)
-
+
span.set_io(
input_data="Generate revenue chart for Q4",
output_data="Chart generated with 12 data points"
@@ -580,6 +623,7 @@ with ze.span(name="chart_generation") as span:
If your images are already hosted externally, you can pass an HTTPS URL instead of base64 data. ZeroEval will download, validate, and copy the image into its own storage during ingestion.
Supported URL sources:
+
- **S3 presigned URLs** (`*.amazonaws.com` with valid authentication parameters)
- **CDN URLs** from trusted domains
@@ -634,7 +678,9 @@ with ze.span(name="product_image_check") as span:
```
-Images attached to spans can be evaluated by LLM judges configured for multimodal evaluation. See the [Multimodal Evaluation](/judges/multimodal-evaluation) guide for setup instructions.
+ Images attached to spans can be evaluated by LLM judges configured for
+ multimodal evaluation. See the [Multimodal
+ Evaluation](/judges/multimodal-evaluation) guide for setup instructions.
## Context Functions
@@ -681,12 +727,14 @@ def set_tag(
```
**Parameters:**
+
- `target`: The target to set tags on
- `Span`: Sets tags on the specific span
- `str`: Sets tags on the trace (if valid trace ID) or session (if valid session ID)
- `tags` (dict[str, str]): Dictionary of tags to set
**Example:**
+
```python
import zeroeval as ze
@@ -701,44 +749,6 @@ if trace_id:
ze.set_tag(trace_id, {"version": "1.5"})
```
-### `set_signal()`
-
-Send a signal to a span, trace, or session.
-
-```python
-def set_signal(
- target: Union[Span, str],
- signals: dict[str, Union[str, bool, int, float]]
-) -> bool
-```
-
-**Parameters:**
-- `target`: The entity to attach signals to
- - `Span`: Sends signals to the specific span
- - `str`: Sends signals to the trace (if active trace ID) or session
-- `signals` (dict): Dictionary of signal names to values
-
-**Returns:** True if signals were sent successfully, False otherwise
-
-**Example:**
-```python
-import zeroeval as ze
-
-# Send signals to current span
-current_span = ze.get_current_span()
-if current_span:
- ze.set_signal(current_span, {
- "accuracy": 0.95,
- "is_successful": True,
- "error_count": 0
- })
-
-# Send signals to trace
-trace_id = ze.get_current_trace()
-if trace_id:
- ze.set_signal(trace_id, {"model_score": 0.85})
-```
-
## Judge Feedback APIs
### `send_feedback()`
@@ -762,6 +772,7 @@ def send_feedback(
```
**Notes:**
+
- Existing usage without `criteria_feedback` is unchanged.
- `criteria_feedback` is optional and supported for scored judges.
- `judge_id` is required when sending `expected_score`, `score_direction`, or `criteria_feedback`.
@@ -778,6 +789,7 @@ def get_judge_criteria(
```
**Returns:**
+
- `judge_id`
- `evaluation_type`
- `score_min`, `score_max`, `pass_threshold`
@@ -805,9 +817,97 @@ zeroeval setup
## Environment Variables
-The SDK uses the following environment variables:
+Set before importing ZeroEval to configure default behavior.
+
+| Variable | Type | Default | Description |
+| -------------------------------- | ------- | ---------------------------- | --------------------------------------- |
+| `ZEROEVAL_API_KEY` | string | `""` | API key for authentication |
+| `ZEROEVAL_API_URL` | string | `"https://api.zeroeval.com"` | API endpoint URL |
+| `ZEROEVAL_WORKSPACE_NAME` | string | `"Personal Workspace"` | Workspace name |
+| `ZEROEVAL_SESSION_ID` | string | auto-generated | Session ID for grouping traces |
+| `ZEROEVAL_SESSION_NAME` | string | `""` | Human-readable session name |
+| `ZEROEVAL_SAMPLING_RATE` | float | `"1.0"` | Sampling rate (0.0-1.0) |
+| `ZEROEVAL_DISABLED_INTEGRATIONS` | string | `""` | Comma-separated integrations to disable |
+| `ZEROEVAL_DEBUG` | boolean | `"false"` | Enable debug logging |
+
+```bash
+export ZEROEVAL_API_KEY="ze_1234567890abcdef"
+export ZEROEVAL_SAMPLING_RATE="0.1"
+export ZEROEVAL_DEBUG="true"
+```
+
+## Runtime Configuration
+
+Configure after initialization via `ze.tracer.configure()`.
+
+| Parameter | Type | Default | Description |
+| ---------------------- | ----------------- | ------- | ------------------------------------ |
+| `flush_interval` | `float` | `1.0` | Flush frequency in seconds |
+| `max_spans` | `int` | `20` | Buffer size before forced flush |
+| `collect_code_details` | `bool` | `True` | Capture code details in spans |
+| `integrations` | `dict[str, bool]` | `{}` | Enable/disable specific integrations |
+| `sampling_rate` | `float` | `None` | Sampling rate (0.0-1.0) |
+
+```python
+ze.tracer.configure(
+ flush_interval=0.5,
+ max_spans=100,
+ sampling_rate=0.05,
+ integrations={"openai": True, "langchain": False}
+)
+```
+
+## Available Integrations
-- `ZEROEVAL_API_KEY`: Your ZeroEval API key
-- `ZEROEVAL_API_URL`: API endpoint URL (defaults to `https://api.zeroeval.com`)
-- `ZEROEVAL_DEBUG`: Set to `true` to enable debug logging
-- `ZEROEVAL_DISABLED_INTEGRATIONS`: Comma-separated list of integrations to disable
\ No newline at end of file
+| Integration | Name | Auto-Instruments |
+| ---------------------- | ------------- | -------------------- |
+| `OpenAIIntegration` | `"openai"` | OpenAI client calls |
+| `GeminiIntegration` | `"gemini"` | Google Gemini calls |
+| `LangChainIntegration` | `"langchain"` | LangChain components |
+| `LangGraphIntegration` | `"langgraph"` | LangGraph workflows |
+| `HttpxIntegration` | `"httpx"` | HTTPX requests |
+| `VocodeIntegration` | `"vocode"` | Vocode voice SDK |
+
+Control integrations via:
+
+- **Environment:** `ZEROEVAL_DISABLED_INTEGRATIONS="langchain,langgraph"`
+- **Init:** `disabled_integrations=["langchain"]` or `enabled_integrations=["openai"]`
+- **Runtime:** `ze.tracer.configure(integrations={"langchain": False})`
+
+## Configuration Examples
+
+### Production
+
+```python
+ze.init(
+ api_key="your_key",
+ sampling_rate=0.05,
+ debug=False,
+ disabled_integrations=["langchain"]
+)
+
+ze.tracer.configure(
+ flush_interval=0.5,
+ max_spans=100
+)
+```
+
+### Development
+
+```python
+ze.init(
+ api_key="your_key",
+ debug=True,
+ sampling_rate=1.0
+)
+```
+
+### Memory-Optimized
+
+```python
+ze.tracer.configure(
+ max_spans=5,
+ collect_code_details=False,
+ flush_interval=2.0
+)
+```
diff --git a/tracing/sdks/python/setup.mdx b/tracing/sdks/python/setup.mdx
index 84c6a91..c11f0e6 100644
--- a/tracing/sdks/python/setup.mdx
+++ b/tracing/sdks/python/setup.mdx
@@ -113,6 +113,146 @@ trace_id = ze.get_current_trace()
session_id = ze.get_current_session()
```
+## Sessions
+
+Sessions group related spans together, making it easier to track complex workflows, user interactions, or multi-step processes.
+
+### Basic Session
+
+Provide a session ID to associate spans with a session:
+
+```python
+import uuid
+import zeroeval as ze
+
+session_id = str(uuid.uuid4())
+
+@ze.span(name="process_request", session=session_id)
+def process_request(data):
+ return transform_data(data)
+```
+
+### Named Sessions
+
+For better organization in the dashboard, provide both an ID and a name:
+
+```python
+@ze.span(
+ name="user_interaction",
+ session={
+ "id": session_id,
+ "name": "Customer Support Chat - User #12345"
+ }
+)
+def handle_support_chat(user_id, message):
+ return generate_response(message)
+```
+
+### Session Inheritance
+
+Child spans automatically inherit the session from their parent:
+
+```python
+session_info = {
+ "id": str(uuid.uuid4()),
+ "name": "Order Processing Pipeline"
+}
+
+@ze.span(name="process_order", session=session_info)
+def process_order(order_id):
+ validate_order(order_id)
+ charge_payment(order_id)
+ fulfill_order(order_id)
+
+@ze.span(name="validate_order")
+def validate_order(order_id):
+ return check_inventory(order_id)
+
+@ze.span(name="charge_payment")
+def charge_payment(order_id):
+ return process_payment(order_id)
+```
+
+### Context Manager Sessions
+
+```python
+session_info = {
+ "id": str(uuid.uuid4()),
+ "name": "Data Pipeline Run"
+}
+
+with ze.span(name="etl_pipeline", session=session_info) as pipeline_span:
+ with ze.span(name="extract_data") as extract_span:
+ raw_data = fetch_from_source()
+ extract_span.set_io(output_data=f"Extracted {len(raw_data)} records")
+
+ with ze.span(name="transform_data") as transform_span:
+ clean_data = transform_records(raw_data)
+ transform_span.set_io(
+ input_data=f"{len(raw_data)} raw records",
+ output_data=f"{len(clean_data)} clean records"
+ )
+```
+
+## Tags
+
+Tags are key-value pairs attached to spans, traces, or sessions. They power the facet filters in the console so you can slice your telemetry by user, plan, model, tenant, or anything else.
+
+### Tag Once, Inherit Everywhere
+
+Tags on the first span automatically flow down to all child spans:
+
+```python
+@ze.span(
+ name="handle_request",
+ tags={
+ "user_id": "42",
+ "tenant": "acme-corp",
+ "plan": "enterprise"
+ }
+)
+def handle_request():
+ with ze.span(name="fetch_data"):
+ ...
+
+ with ze.span(name="process", tags={"stage": "post"}):
+ ...
+```
+
+### Tag a Single Span
+
+Tags provided on a specific span stay only on that span -- they are not copied to siblings or parents:
+
+```python
+@ze.span(name="top_level")
+def top_level():
+ with ze.span(name="db_call", tags={"table": "customers", "operation": "SELECT"}):
+ query_database()
+
+ with ze.span(name="render"):
+ render_template()
+```
+
+### Granular Tagging
+
+Add tags at the span, trace, or session level after creation:
+
+```python
+with ze.span(name="root_invoke", session=session_info, tags={"run": "invoke"}):
+ current_span = ze.get_current_span()
+ ze.set_tag(current_span, {"phase": "pre-run"})
+
+ current_trace = ze.get_current_trace()
+ ze.set_tag(current_trace, {"run_mode": "invoke"})
+
+ current_session = ze.get_current_session()
+ ze.set_tag(current_session, {"env": "local"})
+```
+
+## Feedback
+
+To attach human or programmatic feedback to completions, see [Human Feedback](/feedback/human-feedback) and the [Feedback SDK docs](/feedback/python). For automated quality evaluations, see [Judges](/judges/introduction).
+
## CLI Tooling
The Python SDK includes helpful CLI commands:
diff --git a/tracing/sdks/typescript/reference.mdx b/tracing/sdks/typescript/reference.mdx
index 6c9d727..ec85d01 100644
--- a/tracing/sdks/typescript/reference.mdx
+++ b/tracing/sdks/typescript/reference.mdx
@@ -1,6 +1,6 @@
---
-title: 'Reference'
-description: 'Complete API reference for the TypeScript SDK'
+title: "Reference"
+description: "Complete API reference for the TypeScript SDK"
---
## Installation
@@ -16,29 +16,30 @@ npm install zeroeval
Initializes the ZeroEval SDK. Must be called before using any other SDK features.
```typescript
-function init(opts?: InitOptions): void
+function init(opts?: InitOptions): void;
```
#### Parameters
-| Option | Type | Default | Description |
-| --- | --- | --- | --- |
-| `apiKey` | `string` | `ZEROEVAL_API_KEY` env | Your ZeroEval API key |
-| `apiUrl` | `string` | `https://api.zeroeval.com` | Custom API URL |
-| `flushInterval` | `number` | `10` | Interval in seconds to flush spans |
-| `maxSpans` | `number` | `100` | Maximum spans to buffer before flushing |
-| `collectCodeDetails` | `boolean` | `true` | Capture source code context |
-| `integrations` | `Record` | — | Enable/disable specific integrations |
-| `debug` | `boolean` | `false` | Enable debug logging |
+| Option | Type | Default | Description |
+| -------------------- | ------------------------- | -------------------------- | --------------------------------------- |
+| `apiKey` | `string` | `ZEROEVAL_API_KEY` env | Your ZeroEval API key |
+| `apiUrl` | `string` | `https://api.zeroeval.com` | Custom API URL |
+| `workspaceName` | `string` | `"Personal Organization"` | Workspace/organization name |
+| `flushInterval` | `number` | `10` | Interval in seconds to flush spans |
+| `maxSpans` | `number` | `100` | Maximum spans to buffer before flushing |
+| `collectCodeDetails` | `boolean` | `true` | Capture source code context |
+| `integrations`       | `Record<string, boolean>` | —                          | Enable/disable specific integrations    |
+| `debug` | `boolean` | `false` | Enable debug logging |
#### Example
```typescript
-import * as ze from 'zeroeval';
+import * as ze from "zeroeval";
ze.init({
- apiKey: 'your-api-key',
- debug: true
+ apiKey: "your-api-key",
+ debug: true,
});
```
@@ -51,7 +52,7 @@ ze.init({
Wraps a supported AI client to automatically trace all API calls.
```typescript
-function wrap(client: T): WrappedClient
+function wrap<T>(client: T): WrappedClient<T>;
```
#### Supported Clients
@@ -63,14 +64,14 @@ function wrap(client: T): WrappedClient
```typescript
// OpenAI
-import { OpenAI } from 'openai';
-import * as ze from 'zeroeval';
+import { OpenAI } from "openai";
+import * as ze from "zeroeval";
const openai = ze.wrap(new OpenAI());
// Vercel AI SDK
-import * as ai from 'ai';
-import * as ze from 'zeroeval';
+import * as ai from "ai";
+import * as ze from "zeroeval";
const wrappedAI = ze.wrap(ai);
```
@@ -86,34 +87,31 @@ Wraps a function execution in a span, automatically capturing timing and errors.
```typescript
function withSpan<T>(
opts: SpanOptions,
- fn: () => Promise | T
-): Promise | T
+  fn: () => Promise<T> | T,
+): Promise<T> | T;
```
#### SpanOptions
-| Option | Type | Required | Description |
-| --- | --- | --- | --- |
-| `name` | `string` | Yes | Name of the span |
-| `sessionId` | `string` | No | Session ID to associate with the span |
-| `sessionName` | `string` | No | Human-readable session name |
-| `tags` | `Record` | No | Tags to attach to the span |
-| `attributes` | `Record` | No | Additional attributes |
-| `inputData` | `any` | No | Manual input data override |
-| `outputData` | `any` | No | Manual output data override |
+| Option | Type | Required | Description |
+| ------------- | ------------------------- | -------- | ------------------------------------- |
+| `name` | `string` | Yes | Name of the span |
+| `sessionId` | `string` | No | Session ID to associate with the span |
+| `sessionName` | `string` | No | Human-readable session name |
+| `tags`        | `Record<string, string>`  | No       | Tags to attach to the span            |
+| `attributes`  | `Record<string, unknown>` | No       | Additional attributes                 |
+| `inputData` | `any` | No | Manual input data override |
+| `outputData` | `any` | No | Manual output data override |
#### Example
```typescript
-import * as ze from 'zeroeval';
+import * as ze from "zeroeval";
-const result = await ze.withSpan(
- { name: 'fetch-user-data' },
- async () => {
- const user = await fetchUser(userId);
- return user;
- }
-);
+const result = await ze.withSpan({ name: "fetch-user-data" }, async () => {
+ const user = await fetchUser(userId);
+ return user;
+});
```
### `@span` Decorator
@@ -127,19 +125,17 @@ span(opts: SpanOptions): MethodDecorator
#### Example
```typescript
-import * as ze from 'zeroeval';
+import * as ze from "zeroeval";
class UserService {
- @ze.span({ name: 'get-user' })
+ @ze.span({ name: "get-user" })
async getUser(id: string): Promise {
return await db.users.findById(id);
}
}
```
-
-Requires `experimentalDecorators: true` in your `tsconfig.json`.
-
+Requires `experimentalDecorators: true` in your `tsconfig.json`.
+
---
@@ -150,7 +146,7 @@ Requires `experimentalDecorators: true` in your `tsconfig.json`.
Returns the currently active span, if any.
```typescript
-function getCurrentSpan(): Span | undefined
+function getCurrentSpan(): Span | undefined;
```
### `getCurrentTrace()`
@@ -158,7 +154,7 @@ function getCurrentSpan(): Span | undefined
Returns the current trace ID.
```typescript
-function getCurrentTrace(): string | undefined
+function getCurrentTrace(): string | undefined;
```
### `getCurrentSession()`
@@ -166,7 +162,7 @@ function getCurrentTrace(): string | undefined
Returns the current session ID.
```typescript
-function getCurrentSession(): string | undefined
+function getCurrentSession(): string | undefined;
```
### `setTag()`
@@ -176,17 +172,17 @@ Sets tags on a span, trace, or session.
```typescript
function setTag(
target: Span | string | undefined,
- tags: Record
-): void
+  tags: Record<string, string>,
+): void;
```
#### Parameters
-| Parameter | Description |
-| --- | --- |
-| `Span` | Sets tags on the specific span |
-| `string` | Sets tags on the trace or session by ID |
-| `undefined` | Sets tags on the current span |
+| Parameter | Description |
+| ----------- | --------------------------------------- |
+| `Span` | Sets tags on the specific span |
+| `string` | Sets tags on the trace or session by ID |
+| `undefined` | Sets tags on the current span |
---
@@ -197,17 +193,17 @@ function setTag(
Creates or fetches versioned prompts from the Prompt Library. Returns decorated content for downstream LLM calls.
```typescript
-async function prompt(options: PromptOptions): Promise
+async function prompt(options: PromptOptions): Promise<string>;
```
#### PromptOptions
-| Option | Type | Required | Description |
-| --- | --- | --- | --- |
-| `name` | `string` | Yes | Task name associated with the prompt |
-| `content` | `string` | No | Raw prompt content (used as fallback or for explicit mode) |
-| `variables` | `Record` | No | Template variables to interpolate `{{variable}}` tokens |
-| `from` | `string` | No | Version control: `"latest"`, `"explicit"`, or a 64-char SHA-256 hash |
+| Option | Type | Required | Description |
+| ----------- | ------------------------ | -------- | -------------------------------------------------------------------- |
+| `name` | `string` | Yes | Task name associated with the prompt |
+| `content` | `string` | No | Raw prompt content (used as fallback or for explicit mode) |
+| `variables` | `Record<string, string>` | No       | Template variables to interpolate `{{variable}}` tokens               |
+| `from` | `string` | No | Version control: `"latest"`, `"explicit"`, or a 64-char SHA-256 hash |
#### Behavior
@@ -219,32 +215,32 @@ async function prompt(options: PromptOptions): Promise
#### Examples
```typescript
-import * as ze from 'zeroeval';
+import * as ze from "zeroeval";
// Auto-optimization mode (recommended)
const prompt = await ze.prompt({
- name: 'customer-support',
- content: 'You are a helpful {{role}} assistant.',
- variables: { role: 'customer service' }
+ name: "customer-support",
+ content: "You are a helpful {{role}} assistant.",
+ variables: { role: "customer service" },
});
// Explicit mode - bypass auto-optimization
const prompt = await ze.prompt({
- name: 'customer-support',
- content: 'You are a helpful assistant.',
- from: 'explicit'
+ name: "customer-support",
+ content: "You are a helpful assistant.",
+ from: "explicit",
});
// Latest mode - require optimized version
const prompt = await ze.prompt({
- name: 'customer-support',
- from: 'latest'
+ name: "customer-support",
+ from: "latest",
});
// Hash mode - specific version
const prompt = await ze.prompt({
- name: 'customer-support',
- from: 'a1b2c3d4e5f6...' // 64-char SHA-256 hash
+ name: "customer-support",
+ from: "a1b2c3d4e5f6...", // 64-char SHA-256 hash
});
```
@@ -258,11 +254,11 @@ Returns a decorated prompt string with metadata header used by integrations:
#### Errors
-| Error | When |
-| --- | --- |
-| `Error` | Both `content` and `from` provided (except `from: "explicit"`), or neither |
-| `PromptRequestError` | `from: "latest"` but no versions exist |
-| `PromptNotFoundError` | `from` is a hash that does not exist |
+| Error | When |
+| --------------------- | -------------------------------------------------------------------------- |
+| `Error` | Both `content` and `from` provided (except `from: "explicit"`), or neither |
+| `PromptRequestError` | `from: "latest"` but no versions exist |
+| `PromptNotFoundError` | `from` is a hash that does not exist |
---
@@ -271,114 +267,36 @@ Returns a decorated prompt string with metadata header used by integrations:
Sends feedback for a completion to enable prompt optimization.
```typescript
-async function sendFeedback(options: SendFeedbackOptions): Promise
+async function sendFeedback(
+ options: SendFeedbackOptions,
+): Promise<void>;
```
#### SendFeedbackOptions
-| Option | Type | Required | Description |
-| --- | --- | --- | --- |
-| `promptSlug` | `string` | Yes | The slug of the prompt (task name) |
-| `completionId` | `string` | Yes | UUID of the span/completion |
-| `thumbsUp` | `boolean` | Yes | `true` for positive, `false` for negative |
-| `reason` | `string` | No | Explanation of the feedback |
-| `expectedOutput` | `string` | No | What the expected output should be |
-| `metadata` | `Record` | No | Additional metadata |
-| `judgeId` | `string` | No | Judge automation ID for judge feedback |
-| `expectedScore` | `number` | No | Expected score for scored judges |
-| `scoreDirection` | `'too_high' \| 'too_low'` | No | Score direction for scored judges |
+| Option | Type | Required | Description |
+| ---------------- | ------------------------- | -------- | ----------------------------------------- |
+| `promptSlug` | `string` | Yes | The slug of the prompt (task name) |
+| `completionId` | `string` | Yes | UUID of the span/completion |
+| `thumbsUp` | `boolean` | Yes | `true` for positive, `false` for negative |
+| `reason` | `string` | No | Explanation of the feedback |
+| `expectedOutput` | `string` | No | What the expected output should be |
+| `metadata`       | `Record<string, unknown>` | No       | Additional metadata                       |
+| `judgeId` | `string` | No | Judge automation ID for judge feedback |
+| `expectedScore` | `number` | No | Expected score for scored judges |
+| `scoreDirection` | `'too_high' \| 'too_low'` | No | Score direction for scored judges |
#### Example
```typescript
-import * as ze from 'zeroeval';
+import * as ze from "zeroeval";
await ze.sendFeedback({
- promptSlug: 'support-bot',
- completionId: '550e8400-e29b-41d4-a716-446655440000',
+ promptSlug: "support-bot",
+ completionId: "550e8400-e29b-41d4-a716-446655440000",
thumbsUp: false,
- reason: 'Response was too verbose',
- expectedOutput: 'A concise 2-3 sentence response'
-});
-```
-
----
-
-## Signals API
-
-### `sendSignal()`
-
-Send a signal to a specific entity.
-
-```typescript
-async function sendSignal(
- entityType: 'session' | 'trace' | 'span' | 'completion',
- entityId: string,
- name: string,
- value: string | boolean | number,
- signalType?: 'boolean' | 'numerical'
-): Promise
-```
-
-### `sendTraceSignal()`
-
-Send a signal to the current trace.
-
-```typescript
-function sendTraceSignal(
- name: string,
- value: string | boolean | number,
- signalType?: 'boolean' | 'numerical'
-): void
-```
-
-### `sendSessionSignal()`
-
-Send a signal to the current session.
-
-```typescript
-function sendSessionSignal(
- name: string,
- value: string | boolean | number,
- signalType?: 'boolean' | 'numerical'
-): void
-```
-
-### `sendSpanSignal()`
-
-Send a signal to the current span.
-
-```typescript
-function sendSpanSignal(
- name: string,
- value: string | boolean | number,
- signalType?: 'boolean' | 'numerical'
-): void
-```
-
-### `getEntitySignals()`
-
-Retrieve signals for a specific entity.
-
-```typescript
-async function getEntitySignals(
- entityType: 'session' | 'trace' | 'span' | 'completion',
- entityId: string
-): Promise
-```
-
-#### Example
-
-```typescript
-import * as ze from 'zeroeval';
-
-await ze.withSpan({ name: 'process-request' }, async () => {
- // Process something...
-
- // Send signals
- ze.sendSpanSignal('success', true);
- ze.sendSpanSignal('latency_ms', 150);
- ze.sendTraceSignal('user_satisfied', true);
+ reason: "Response was too verbose",
+ expectedOutput: "A concise 2-3 sentence response",
});
```
@@ -394,8 +312,8 @@ Render a template string with variable substitution.
function renderTemplate(
template: string,
  variables: Record<string, string>,
- options?: { ignoreMissing?: boolean }
-): string
+ options?: { ignoreMissing?: boolean },
+): string;
```
### `extractVariables()`
@@ -403,7 +321,7 @@ function renderTemplate(
Extract variable names from a template string.
```typescript
-function extractVariables(template: string): Set
+function extractVariables(template: string): Set<string>;
```
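
Taken together, `renderTemplate` and `extractVariables` implement simple `{{variable}}` templating. As a self-contained sketch (the regex and missing-variable behavior here are assumptions, not the SDK's actual implementation):

```typescript
// Standalone sketch of {{variable}} interpolation and extraction.
// The exact token syntax and error handling in the SDK may differ.
function renderTemplateSketch(
  template: string,
  variables: Record<string, string>,
  options: { ignoreMissing?: boolean } = {},
): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (token, name) => {
    if (name in variables) return variables[name];
    if (options.ignoreMissing) return token; // leave {{name}} untouched
    throw new Error(`Missing template variable: ${name}`);
  });
}

function extractVariablesSketch(template: string): Set<string> {
  const names = new Set<string>();
  for (const match of template.matchAll(/\{\{\s*([\w.]+)\s*\}\}/g)) {
    names.add(match[1]);
  }
  return names;
}
```

Extracting first and rendering second is a useful pattern for validating that callers supplied every variable a prompt expects.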
### `sha256Hex()`
@@ -411,7 +329,7 @@ function extractVariables(template: string): Set
Compute SHA-256 hash of text.
```typescript
-async function sha256Hex(text: string): Promise
+async function sha256Hex(text: string): Promise<string>;
```
### `normalizePromptText()`
@@ -419,7 +337,7 @@ async function sha256Hex(text: string): Promise
Normalize prompt text for consistent hashing.
```typescript
-function normalizePromptText(text: string): string
+function normalizePromptText(text: string): string;
```
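
Together, `normalizePromptText` and `sha256Hex` produce the 64-char version hashes accepted by `prompt()`'s `from` option. A self-contained sketch using Node's `crypto` (the normalization rules and the synchronous shape are assumptions -- the real `sha256Hex` is async):

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: unify line endings and trim outer whitespace.
// The SDK's actual rules may differ.
function normalizePromptTextSketch(text: string): string {
  return text.replace(/\r\n/g, "\n").trim();
}

// Synchronous stand-in for the SDK's async sha256Hex().
function sha256HexSketch(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Prompts that differ only in line endings hash identically once
// normalized, which is what keeps version hashes stable.
const a = sha256HexSketch(normalizePromptTextSketch("You are helpful.\r\n"));
const b = sha256HexSketch(normalizePromptTextSketch("You are helpful.\n"));
```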
---
@@ -432,7 +350,7 @@ Thrown when a specific prompt version (by hash) is not found.
```typescript
class PromptNotFoundError extends Error {
- constructor(message: string)
+ constructor(message: string);
}
```
@@ -442,7 +360,7 @@ Thrown when a prompt request fails (e.g., no versions exist for `from: "latest"`
```typescript
class PromptRequestError extends Error {
- constructor(message: string, statusCode?: number)
+ constructor(message: string, statusCode?: number);
}
```
@@ -476,15 +394,56 @@ interface PromptMetadata {
}
```
-### `Signal`
+## Environment Variables
+
+Set before importing ZeroEval to configure default behavior.
+
+| Variable | Type | Default | Description |
+| -------------------------------- | ------- | ---------------------------- | --------------------------------------- |
+| `ZEROEVAL_API_KEY` | string | `""` | API key for authentication |
+| `ZEROEVAL_API_URL` | string | `"https://api.zeroeval.com"` | API endpoint URL |
+| `ZEROEVAL_WORKSPACE_NAME` | string | `"Personal Workspace"` | Workspace name |
+| `ZEROEVAL_SESSION_ID` | string | auto-generated | Session ID for grouping traces |
+| `ZEROEVAL_SESSION_NAME` | string | `""` | Human-readable session name |
+| `ZEROEVAL_SAMPLING_RATE` | float | `"1.0"` | Sampling rate (0.0-1.0) |
+| `ZEROEVAL_DISABLED_INTEGRATIONS` | string | `""` | Comma-separated integrations to disable |
+| `ZEROEVAL_DEBUG` | boolean | `"false"` | Enable debug logging |
+
+```bash
+export ZEROEVAL_API_KEY="ze_1234567890abcdef"
+export ZEROEVAL_SAMPLING_RATE="0.1"
+export ZEROEVAL_DEBUG="true"
+```
+
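+
`ZEROEVAL_SAMPLING_RATE` controls what fraction of traces are kept. A minimal sketch of how a 0.0-1.0 rate can gate an export decision (illustrative only -- the SDK's actual sampling logic is not shown in these docs):

```typescript
// Illustrative head-sampling decision for a 0.0-1.0 rate.
// Out-of-range values are clamped rather than rejected.
function shouldSample(rate: number, rand: () => number = Math.random): boolean {
  const clamped = Math.min(1, Math.max(0, rate));
  return rand() < clamped;
}
```

With `ZEROEVAL_SAMPLING_RATE="0.1"`, roughly one in ten traces would be kept.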
+## Configuration Examples
+
+### Production
```typescript
-interface Signal {
- value: string | boolean | number;
- type: 'boolean' | 'numerical';
-}
+ze.init({
+ apiKey: "your_key",
+ flushInterval: 1,
+ maxSpans: 200,
+ debug: false,
+ integrations: {
+ openai: true,
+ vercelAI: true,
+ },
+});
+```
+
+### Development
+
+```typescript
+ze.init({
+ apiKey: "your_key",
+ debug: true,
+ collectCodeDetails: true,
+});
```
- Need help? Check out our [GitHub examples](https://github.com/zeroeval/zeroeval-ts-sdk/tree/main/examples) or reach out on [Discord](https://discord.gg/MuExkGMNVz).
+ Need help? Check out our [GitHub
+ examples](https://github.com/zeroeval/zeroeval-ts-sdk/tree/main/examples) or
+ reach out on [Discord](https://discord.gg/MuExkGMNVz).
diff --git a/tracing/sdks/typescript/setup.mdx b/tracing/sdks/typescript/setup.mdx
index 69dc9b1..8725ef3 100644
--- a/tracing/sdks/typescript/setup.mdx
+++ b/tracing/sdks/typescript/setup.mdx
@@ -138,7 +138,9 @@ When using runtime tools like `tsx` or `ts-node`, pass the `--experimental-decor
## Sessions
-Group related spans into sessions:
+Sessions group related spans together for tracking workflows, user interactions, or multi-step processes.
+
+### Basic Session
```typescript
import { v4 as uuidv4 } from 'uuid';
@@ -163,6 +165,34 @@ async function userJourney(userId: string) {
}
```
+### Multi-Step Pipeline
+
+```typescript
+async function ragPipeline(query: string) {
+ const sessionId = uuidv4();
+
+ return ze.withSpan(
+ { name: 'rag_pipeline', sessionId, sessionName: 'RAG Query' },
+ async () => {
+ const docs = await ze.withSpan(
+ { name: 'retrieve' },
+ () => vectorSearch(query)
+ );
+
+ const ranked = await ze.withSpan(
+ { name: 'rerank' },
+ () => rerankDocs(query, docs)
+ );
+
+ return ze.withSpan(
+ { name: 'generate' },
+ () => generateAnswer(query, ranked)
+ );
+ }
+ );
+}
+```
+
## Context
Access current context information:
@@ -203,6 +233,10 @@ if (span) {
}
```
+## Feedback
+
+To attach human or programmatic feedback to completions, see [Human Feedback](/feedback/human-feedback) and the [Feedback SDK docs](/feedback/typescript). For automated quality evaluations, see [Judges](/judges/introduction).
+
## Advanced Configuration
Fine-tune the SDK behavior:
diff --git a/tracing/sessions.mdx b/tracing/sessions.mdx
deleted file mode 100644
index 73f5ce7..0000000
--- a/tracing/sessions.mdx
+++ /dev/null
@@ -1,202 +0,0 @@
----
-title: Sessions
-description: Group related spans into sessions for better organization and analysis
----
-
-Sessions provide a powerful way to group related spans together, making it easier to track and analyze complex workflows, user interactions, or multi-step processes. This guide covers everything you need to know about working with sessions.
-
-For complete API documentation, see the [Python SDK Reference](/tracing/sdks/python/reference).
-
-## Creating Sessions
-
-### Basic Session with ID
-
-The simplest way to create a session is by providing a session ID:
-
-```python
-import uuid
-import zeroeval as ze
-
-# Generate a unique session ID
-session_id = str(uuid.uuid4())
-
-@ze.span(name="process_request", session=session_id)
-def process_request(data):
- # This span belongs to the session
- return transform_data(data)
-```
-
-### Named Sessions
-
-For better organization in the ZeroEval dashboard, you can provide both an ID and a descriptive name:
-
-```python
-@ze.span(
- name="user_interaction",
- session={
- "id": session_id,
- "name": "Customer Support Chat - User #12345"
- }
-)
-def handle_support_chat(user_id, message):
- # Process the support request
- return generate_response(message)
-```
-
-## Session Inheritance
-
-Child spans automatically inherit the session from their parent span:
-
-```python
-session_info = {
- "id": str(uuid.uuid4()),
- "name": "Order Processing Pipeline"
-}
-
-@ze.span(name="process_order", session=session_info)
-def process_order(order_id):
- # These nested calls automatically belong to the same session
- validate_order(order_id)
- charge_payment(order_id)
- fulfill_order(order_id)
-
-@ze.span(name="validate_order")
-def validate_order(order_id):
- # Automatically part of the parent's session
- return check_inventory(order_id)
-
-@ze.span(name="charge_payment")
-def charge_payment(order_id):
- # Also inherits the session
- return process_payment(order_id)
-```
-
-## Advanced Session Patterns
-
-### Multi-Agent RAG System
-
-Track complex retrieval-augmented generation workflows with multiple specialized agents:
-
-```python
-session = {
- "id": str(uuid.uuid4()),
- "name": "Multi-Agent RAG Pipeline"
-}
-
-@ze.span(name="rag_coordinator", session=session)
-async def process_query(query):
- # Retrieval
- docs = await retrieval_agent(query)
-
- # Reranking
- ranked = await reranking_agent(query, docs)
-
- # Generation
- response = await generation_agent(query, ranked)
-
- return response
-
-@ze.span(name="retrieval_agent")
-async def retrieval_agent(query):
- # Inherits session from parent
- embeddings = await embed(query)
- return await vector_search(embeddings)
-
-@ze.span(name="generation_agent")
-async def generation_agent(query, context):
- return await llm.generate(query, context)
-```
-
-### Conversational AI Session
-
-Track a complete conversation with an AI assistant:
-
-```python
-class ChatSession:
- def __init__(self, user_id):
- self.session = {
- "id": f"chat-{user_id}-{uuid.uuid4()}",
- "name": f"AI Chat - User {user_id}"
- }
- self.history = []
-
- @ze.span(name="process_message", session=lambda self: self.session)
- async def process_message(self, message):
- # Add to history
- self.history.append({"role": "user", "content": message})
-
- # Generate response
- response = await self.generate_response()
- self.history.append({"role": "assistant", "content": response})
-
- return response
-
- @ze.span(name="generate_response", session=lambda self: self.session)
- async def generate_response(self):
- return await llm.chat(self.history)
-```
-
-### Batch LLM Processing
-
-Process multiple documents with LLMs in a single session:
-
-```python
-async def batch_summarize(documents):
- session = {
- "id": f"batch-{uuid.uuid4()}",
- "name": f"Batch Summarization - {len(documents)} docs"
- }
-
- @ze.span(name="batch_processor", session=session)
- async def process():
- summaries = []
-
- for i, doc in enumerate(documents):
- with ze.span(name=f"summarize_doc_{i}", session=session) as span:
- try:
- summary = await llm.summarize(doc)
- span.set_io(
- input_data=f"Doc: {doc['title']}",
- output_data=summary[:100]
- )
- summaries.append(summary)
- except Exception as e:
- span.set_error(
- code=type(e).__name__,
- message=str(e)
- )
-
- return summaries
-
- return await process()
-```
-
-## Context Manager Sessions
-
-You can also use sessions with the context manager pattern:
-
-```python
-session_info = {
- "id": str(uuid.uuid4()),
- "name": "Data Pipeline Run"
-}
-
-with ze.span(name="etl_pipeline", session=session_info) as pipeline_span:
- # Extract phase
- with ze.span(name="extract_data") as extract_span:
- raw_data = fetch_from_source()
- extract_span.set_io(output_data=f"Extracted {len(raw_data)} records")
-
- # Transform phase
- with ze.span(name="transform_data") as transform_span:
- clean_data = transform_records(raw_data)
- transform_span.set_io(
- input_data=f"{len(raw_data)} raw records",
- output_data=f"{len(clean_data)} clean records"
- )
-
- # Load phase
- with ze.span(name="load_data") as load_span:
- result = save_to_destination(clean_data)
- load_span.set_io(output_data=f"Loaded to {result['location']}")
-```
\ No newline at end of file
diff --git a/tracing/signals.mdx b/tracing/signals.mdx
deleted file mode 100644
index b8b4aef..0000000
--- a/tracing/signals.mdx
+++ /dev/null
@@ -1,199 +0,0 @@
----
-title: "Signals"
-description: "Capture real-world feedback and metrics to enrich your traces, spans, and sessions."
----
-
-Signals are any piece of user feedback, behavior, or metric you care about – thumbs-up, a 5-star rating, dwell time, task completion, error rates … you name it. Signals help you understand how your AI system performs in the real world by connecting user outcomes to your traces.
-
-You can attach signals to:
-
-- **Completions** (LLM responses)
-- **Spans** (individual operations)
-- **Sessions** (user interactions)
-- **Traces** (entire request flows)
-
-For complete signals API documentation, see the [Python SDK Reference](/tracing/sdks/python/reference#signals).
-
-## Using signals in code
-
-### With the Python SDK
-
-```python
-import zeroeval as ze
-
-# Initialize the tracer
-ze.init(api_key="your-api-key")
-
-# Start a span and add a signal
-with ze.trace("user_query") as span:
- # Your AI logic here
- response = process_user_query(query)
-
- # Add a signal to the current span
- ze.set_signal("user_satisfaction", True)
- ze.set_signal("response_quality", 4.5)
- ze.set_signal("task_completed", "success")
-```
-
-### Setting signals on different targets
-
-```python
-# On the current span
-ze.set_signal("helpful", True)
-
-# On a specific span
-span = ze.current_span()
-ze.set_signal(span, {"rating": 5, "category": "excellent"})
-
-# On the current trace
-ze.set_trace_signal("conversion", True)
-
-# On the current session
-ze.set_session_signal("user_engaged", True)
-```
-
-## API endpoint
-
-For direct API calls, send signals to:
-
-```
-POST https://api.zeroeval.com/workspaces//signals
-```
-
-Auth is the same bearer API key you use for tracing.
-
-### Payload schema
-
-| field | type | required | notes |
-| ------------- | ------------------------------ | -------- | ---------------------------------------------- |
-| completion_id | string | ❌ | **OpenAI completion ID** (for LLM completions) |
-| span_id | string | ❌ | **Span ID** (for specific spans) |
-| trace_id | string | ❌ | **Trace ID** (for entire traces) |
-| session_id | string | ❌ | **Session ID** (for user sessions) |
-| name | string | ✅ | e.g. `user_satisfaction` |
-| value | string \| bool \| int \| float | ✅ | your data – see examples below |
-
-
- You must provide at least one of: `completion_id`, `span_id`, `trace_id`, or
- `session_id`.
-
-
-## Common signal patterns
-
-Below are some quick copy-pasta snippets for the most common cases.
-
-### 1. Binary feedback (👍 / 👎)
-
-
-
-```python Python SDK
-import zeroeval as ze
-
-# On current span
-ze.set_signal("thumbs_up", True)
-
-# On specific span
-ze.set_signal(span, {"helpful": False})
-```
-
-```python API
-import requests
-
-payload = {
- "span_id": span.id,
- "name": "thumbs_up",
- "value": True // or False
-}
-requests.post(
- f"https://api.zeroeval.com/workspaces/{WORKSPACE_ID}/signals",
- json=payload,
- headers={"Authorization": f"Bearer {ZE_API_KEY}"}
-)
-```
-
-
-
-### 2. Star rating (1–5)
-
-```python
-ze.set_signal("star_rating", 4)
-```
-
-### 3. Continuous metrics
-
-```python
-# Response time
-ze.set_signal("response_time_ms", 1250.5)
-
-# Task completion time
-ze.set_signal("time_on_task_sec", 12.85)
-
-# Accuracy score
-ze.set_signal("accuracy", 0.94)
-```
-
-### 4. Categorical outcomes
-
-```python
-ze.set_signal("task_status", "success")
-ze.set_signal("error_type", "timeout")
-ze.set_signal("user_intent", "purchase")
-```
-
-### 5. Session-level signals
-
-```python
-# Track user engagement across an entire session
-ze.set_session_signal("pages_visited", 5)
-ze.set_session_signal("converted", True)
-ze.set_session_signal("user_tier", "premium")
-```
-
-### 6. Trace-level signals
-
-```python
-# Track outcomes for an entire request flow
-ze.set_trace_signal("request_successful", True)
-ze.set_trace_signal("total_cost", 0.045)
-ze.set_trace_signal("model_used", "gpt-4o")
-```
-
-## Signal types
-
-Signals are automatically categorized based on their values:
-
-- **Boolean**: `true`/`false` values → useful for success/failure, yes/no feedback
-- **Numerical**: integers and floats → useful for ratings, scores, durations, costs
-- **Categorical**: strings → useful for status, categories, error types
-
-## Putting it all together
-
-```python
-import zeroeval as ze
-
-# Initialize tracing
-ze.init(api_key="your-api-key")
-
-# Start a session for user interaction
-with ze.trace("user_chat_session", session_name="Customer Support") as session:
-
- # Process user query
- with ze.trace("process_query") as span:
- response = llm_client.chat.completions.create(...)
-
- # Signal on the LLM completion
- ze.set_signal("response_generated", True)
- ze.set_signal("response_length", len(response.choices[0].message.content))
-
- # Capture user feedback
- user_rating = get_user_feedback() # Your feedback collection logic
-
- # Signal on the session
- ze.set_session_signal("user_rating", user_rating)
- ze.set_session_signal("issue_resolved", user_rating >= 4)
-
- # Signal on the entire trace
- ze.set_trace_signal("interaction_complete", True)
-```
-
-That's it! Your signals will appear in the ZeroEval dashboard, helping you understand how your AI system performs in real-world scenarios.
diff --git a/tracing/tagging.mdx b/tracing/tagging.mdx
deleted file mode 100644
index 4989225..0000000
--- a/tracing/tagging.mdx
+++ /dev/null
@@ -1,93 +0,0 @@
----
-title: Tags
-description: Simple ways to attach rich, query-able tags to your traces.
----
-
-Tags are key–value pairs that can be attached to any **span**, **trace**, or **session**. They power the facet filters in the console so you can slice-and-dice your telemetry by *user*, *plan*, *model*, *tenant*, or anything else that matters to your business.
-
-For complete tagging API documentation, see the [Python SDK Reference](/tracing/sdks/python/reference#tags).
-
-## 1. Tag once, inherit everywhere
-
-When you add a `tags` dictionary to the **first** span you create, every child span automatically gets the same tags. That means you set them once and they flow down the entire call-stack.
-
-```python
-import zeroeval as ze
-
-@ze.span(
- name="handle_request",
- tags={
- "user_id": "42", # who triggered the request
- "tenant": "acme-corp", # multi-tenant identifier
- "plan": "enterprise" # commercial plan
- }
-)
-def handle_request():
- authenticate()
- fetch_data()
- process()
-
- # Two nested child spans – they automatically inherit *all* the tags
- with ze.span(name="fetch_data"):
- ...
-
- with ze.span(name="process", tags={"stage": "post"}):
- ...
-```
-
-
-
-## 2. Tag a single span
-
-If you want to tag only a **single** span (or override a tag inherited from a parent) simply provide the `tags` argument on that specific decorator or context manager.
-
-```python
-import zeroeval as ze
-
-@ze.span(name="top_level")
-def top_level():
- # Child span with its own tags – *not* inherited by siblings
- with ze.span(name="db_call", tags={"table": "customers", "operation": "SELECT"}):
- query_database()
-
- # Another child span without tags – it has no knowledge of the db_call tags
- with ze.span(name="render"):
- render_template()
-```
-
-Under the hood these tags live only on that single span, they are **not** copied to siblings or parents.
-
-## 3. Granular tagging (session, trace, or span)
-
-You can add granular tags at the session, trace, or span level after they've been created:
-
-```python
-import uuid
-from langchain_core.messages import HumanMessage
-import zeroeval as ze
-
-DEMO_TAGS = {"example": "langgraph_tags_demo", "project": "zeroeval"}
-
-SESSION_ID = str(uuid.uuid4())
-SESSION_INFO = {"id": SESSION_ID, "name": "Tags Demo Session"}
-
-with ze.span(
- name="demo.root_invoke",
- session=SESSION_INFO,
- tags={**DEMO_TAGS, "run": "invoke"},
-):
- # 1️⃣ Tag the *current* span only
- current_span = ze.get_current_span()
- ze.set_tag(current_span, {"phase": "pre-run"})
-
- # 2️⃣ Tag the whole trace – root + all children (past *and* future)
- current_trace = ze.get_current_trace()
- ze.set_tag(current_trace, {"run_mode": "invoke"})
-
- # 3️⃣ Tag the entire session
- current_session = ze.get_current_session()
- ze.set_tag(current_session, {"env": "local"})
-
- result = app.invoke({"messages": [HumanMessage(content="hello")], "count": 0})
-```
-