---
title: "MCP"
description: "Connect AI agents to ZeroEval via the Model Context Protocol"
---

The ZeroEval MCP server lets AI agents inspect traces, manage judges and prompts, submit feedback, run optimizations, and deploy to production, all without leaving the agent context. It speaks the [Model Context Protocol](https://modelcontextprotocol.io), so any MCP-compatible client (Cursor, Claude Code, Windsurf, etc.) can connect directly.

## Setup

The fastest way to get started is to point your MCP client at the hosted server. No installation required.

### Cursor

Add this to your Cursor MCP settings (`.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "zeroeval": {
      "url": "https://mcp.zeroeval.com/mcp",
      "headers": {
        "Authorization": "Bearer <your-project-api-key>"
      }
    }
  }
}
```

### Claude Code

```bash
claude mcp add zeroeval --transport http https://mcp.zeroeval.com/mcp \
  --header "Authorization: Bearer <your-project-api-key>"
```

### Other MCP clients

Any client that supports HTTP transport works. Set the server URL to `https://mcp.zeroeval.com/mcp` and pass your project API key in the `Authorization: Bearer <key>` header.

<Tip>
Get your project API key from the [ZeroEval dashboard](https://app.zeroeval.com) under **Settings → API Keys**.
</Tip>

## Resources

The server exposes two MCP resources for introspection:

| URI | Description |
|-----|-------------|
| `config://server-context` | Redacted server config: auth mode, base URL, project scope, and feature flags |
| `docs://capabilities` | Canonical tool and resource inventory with annotations and output contract summary |

## Tools

### Read tools

Read tools are safe to call at any time. They do not modify state.

| Tool | Description |
|------|-------------|
| `list-traces` | List recent traces |
| `get-trace` | Get a trace with its spans |
| `list-judges` | List all judges |
| `get-judge` | Get judge details and linkage state |
| `list-judge-evaluations` | List evaluations from a judge |
| `get-judge-criteria` | Get scoring criteria for a judge |
| `list-prompts` | List all prompts |
| `get-prompt` | Get a prompt at a specific version or tag |
| `list-prompt-versions` | List all versions of a prompt |
| `list-optimization-runs` | List optimization runs for a task |
| `get-optimization-run` | Get run details with candidate prompt and metrics |
| `get-project-summary` | High-level project monitoring summary |

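Under the hood, each tool invocation is a standard MCP `tools/call` JSON-RPC request. A sketch of what a client sends for `list-traces` (the `limit` argument is an assumption; the authoritative input schemas live in the `docs://capabilities` resource):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "list-traces",
    "arguments": { "limit": 10 }
  }
}
```
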
### Write tools

All write tools require `confirm: true` in the request and are annotated with `destructiveHint: true` so MCP clients can prompt for user approval before calling.

| Tool | Description |
|------|-------------|
| `create-judge` | Create a new judge |
| `link-judge-to-prompt` | Link a judge to a prompt |
| `unlink-judge-from-prompt` | Remove a judge's prompt link |
| `create-judge-feedback` | Submit feedback on a judge evaluation |
| `create-prompt-feedback` | Submit feedback on a prompt completion |
| `start-prompt-optimization` | Start a prompt optimization run |
| `start-judge-optimization` | Start a judge optimization run |
| `cancel-optimization-run` | Cancel a running optimization |

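The `confirm` flag travels in the tool arguments alongside the payload. A hypothetical `create-judge` request might look like this (every field other than `confirm` is illustrative; check the tool's input schema for the real fields):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "create-judge",
    "arguments": {
      "name": "helpfulness-judge",
      "confirm": true
    }
  }
}
```

Without `confirm: true`, the server rejects the call, which gives the client a chance to surface the pending mutation to the user first.
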
### Deploy

Production deploys always require two steps:

1. **Preview:** Call `preview-optimization-deploy` with the run ID. This verifies the run succeeded, summarizes the candidate vs current production, and returns a time-limited confirmation receipt.
2. **Deploy:** Call `deploy-optimization-run` with `confirm: true` and the receipt from preview. The server re-reads current state and rejects the deploy if anything drifted since the preview.

| Tool | Description |
|------|-------------|
| `preview-optimization-deploy` | Preview what deploying a run would do (read-only) |
| `deploy-optimization-run` | Deploy a succeeded run to production (requires receipt + confirm) |

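The drift check in step 2 amounts to hashing production state at preview time and re-verifying that hash at deploy time. A minimal sketch of the receipt mechanics in Python, purely illustrative and not ZeroEval's actual implementation:

```python
import hashlib
import json
import time

def make_receipt(run_id: str, prod_state: dict, ttl: int = 300) -> dict:
    """Preview: bind a receipt to the run and a hash of current production state."""
    digest = hashlib.sha256(json.dumps(prod_state, sort_keys=True).encode()).hexdigest()
    return {"run_id": run_id, "state_hash": digest, "expires_at": time.time() + ttl}

def deploy(run_id: str, receipt: dict, prod_state: dict, confirm: bool = False) -> str:
    """Deploy: re-read production state and refuse if anything changed since preview."""
    if not confirm:
        return "rejected: confirm must be true"
    if receipt["run_id"] != run_id or time.time() > receipt["expires_at"]:
        return "rejected: receipt invalid or expired"
    digest = hashlib.sha256(json.dumps(prod_state, sort_keys=True).encode()).hexdigest()
    if digest != receipt["state_hash"]:
        return "rejected: production drifted since preview"
    return "deployed"

prod = {"prompt_version": 7}
receipt = make_receipt("run-123", prod)
print(deploy("run-123", receipt, prod, confirm=True))
# → deployed
print(deploy("run-123", receipt, {"prompt_version": 8}, confirm=True))
# → rejected: production drifted since preview
```

Binding the receipt to a state hash rather than a timestamp alone means a concurrent deploy by someone else invalidates the receipt automatically.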
### Proposal tools

Proposal tools are read-only helpers that gather evidence or prepare the exact next mutating call without executing it.

| Tool | Description |
|------|-------------|
| `investigate-prompt-issues` | Gather evidence about prompt state and recommend next steps |
| `investigate-judge-issues` | Gather evidence about judge state and recommend next steps |
| `prepare-prompt-optimization` | Propose the exact `start-prompt-optimization` call to make |
| `prepare-judge-optimization` | Propose the exact `start-judge-optimization` call to make |

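The handoff works like a two-phase pattern at the agent level: the prepare tool returns the exact call, and the agent executes it only after review. A toy sketch of that shape (tool names follow the tables above; the payload fields are assumptions):

```python
def prepare_prompt_optimization(prompt_name: str) -> dict:
    """Read-only: propose the exact mutating call without running it."""
    return {
        "tool": "start-prompt-optimization",
        "arguments": {"prompt": prompt_name, "confirm": True},
        "rationale": "Recent evaluations score below the judge's pass threshold.",
    }

proposal = prepare_prompt_optimization("support-triage")
# The agent surfaces the proposal to the user, then issues the proposed
# tools/call itself; the proposal tool never mutates anything.
print(proposal["tool"])
# → start-prompt-optimization
```
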
## Good to know

- **Single-project scope.** Each MCP connection is tied to one ZeroEval project. To work with a different project, use a different API key.
- **Optimization prerequisites.** Prompts must have been used with `ze.prompt()` before optimization is available. Judges need a linked tuning task.
- **Proposal tools are read-only.** The `investigate-*` and `prepare-*` tools never mutate state. They recommend the next tool call for the agent to confirm and execute.

<CardGroup cols={2}>
  <Card title="Tracing quickstart" icon="rocket" href="/tracing/quickstart">
    Get your first trace in under 5 minutes
  </Card>
  <Card title="Prompt setup" icon="wrench" href="/autotune/setup">
    Add ze.prompt() to your codebase
  </Card>
  <Card title="Judges" icon="gavel" href="/judges/introduction">
    How calibrated judges evaluate your production traffic
  </Card>
  <Card title="CLI" icon="terminal" href="/integrations/cli">
    Manage traces, prompts, and judges from your terminal
  </Card>
</CardGroup>