diff --git a/CLAUDE.md b/CLAUDE.md
index cd489e0..64a2931 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -32,7 +32,7 @@ mintlify install
 | Directory | Purpose |
 |-----------|---------|
-| `tracing/` | Monitoring & tracing guides, SDK docs (Python + TypeScript), advanced topics (sessions, tagging, signals, OTel) |
+| `tracing/` | Monitoring & tracing guides, SDK docs (Python + TypeScript), advanced topics (sessions, tagging, OTel) |
 | `autotune/` | Prompt optimization ("Prompts" in nav), setup, model configs |
 | `judges/` | AI evaluation judges, setup, multimodal eval, feedback submission |
 | `evaluations/` | Evaluations section (currently placeholder) |
diff --git a/autotune/introduction.mdx b/autotune/introduction.mdx
index c10a81e..3bc41d9 100644
--- a/autotune/introduction.mdx
+++ b/autotune/introduction.mdx
@@ -1,36 +1,47 @@
 ---
 title: "Introduction"
-description: "Run evaluations on models and prompts to find the best variants for your agents"
+description: "Version, track, and optimize every prompt your agent uses"
 ---
-Prompt optimization is a different approach to the traditional evals experience. Instead of setting up complex eval pipelines, we simply ingest your production traces and let you optimize your prompts based on your feedback.
+Prompts are the instructions that drive your agent's behavior. Small changes in wording can dramatically affect output quality, but without tracking, you have no way to know which version works best -- or even which version is running in production.
+
+ZeroEval Prompts gives you version control for prompts with a single function call. Every change is tracked, every completion is linked to the exact prompt version that produced it, and you can deploy optimized versions without touching code.
+
+## Why track prompts
+
+- **Version history** -- every prompt change creates a new version you can compare and roll back to
+- **Production visibility** -- see exactly which prompt version is running, how often it's called, and what it produces
+- **Feedback loop** -- attach thumbs-up/down feedback to completions, then use it to [optimize prompts](/autotune/prompts/prompts) and [evaluate models](/judges/introduction)
+- **One-click deployments** -- push a winning prompt or model to production without redeploying your app
 ## How it works
-
- Replace hardcoded prompts with `ze.prompt()` calls in Python or `ze.prompt({...})` in TypeScript
+
+ Swap string literals for `ze.prompt()` calls. Your existing prompt text
+ becomes the fallback content.
-
- Each time you modify your prompt content, a new version is automatically created and tracked
+
+ Each unique prompt string creates a tracked version. Changes in your code
+ produce new versions without any extra work.
-
- ZeroEval automatically tracks all LLM interactions and their outcomes
+
+ When your LLM integration fires, ZeroEval links each completion to the exact
+ prompt version and model that produced it.
-
- Use the UI to run experiments, vote on outputs, and identify the best prompt/model combinations
-
-
- Winning configurations are automatically deployed to your application without code changes
+
+ Review completions, submit feedback, and generate improved prompt variants
+ -- all from real traffic.
+## Get started
+
-
- Learn how to integrate ze.prompt() into your Python or TypeScript codebase
+
+ `ze.prompt()` and `ze.get_prompt()` for Python applications
-
- Run experiments and deploy winning combinations
+
+ `ze.prompt()` for TypeScript and JavaScript applications
-
diff --git a/autotune/prompts/models.mdx b/autotune/prompts/models.mdx
deleted file mode 100644
index 4fa58a3..0000000
--- a/autotune/prompts/models.mdx
+++ /dev/null
@@ -1,10 +0,0 @@
----
-title: "Models"
-description: "Evaluate your agent's performance across multiple models"
----
-
-