---
title: "CLI"
description: "Manage traces, prompts, judges, and optimization from your terminal"
---

The ZeroEval CLI ships with the Python SDK and gives you terminal access to monitoring, prompts, judges, and optimization. It is designed to be agent-friendly and works well in CI pipelines, scripted workflows, and agent toolchains.

```bash
pip install zeroeval
```

## Setup

Run the interactive setup to save your API key and resolve your project:

```bash
zeroeval setup
```

This opens the ZeroEval dashboard, prompts for your API key, saves it to your shell config (e.g. `~/.zshrc`, `~/.bashrc`), and links your project automatically.

<Tip>
For non-interactive environments like CI or coding agents, use `auth set` instead:

```bash
zeroeval auth set --api-key-env ZEROEVAL_API_KEY
```
</Tip>

### Auth commands

```bash
zeroeval auth set --api-key <key>            # Set API key directly
zeroeval auth set --api-key-env MY_KEY_VAR   # Read API key from an env var
zeroeval auth set --api-base-url <url>       # Override API base URL
zeroeval auth show --redact                  # Show current config (key masked)
zeroeval auth clear                          # Wipe all stored config
zeroeval auth clear --api-key-only           # Clear only the API key and project
```

### Auth resolution

The CLI resolves credentials in this order:

1. Explicit CLI flags (`--project-id`, `--api-base-url`)
2. Environment variables (`ZEROEVAL_API_KEY`, `ZEROEVAL_PROJECT_ID`, `ZEROEVAL_BASE_URL`)
3. Global CLI config file

Config file location:
- **macOS / Linux**: `~/.config/zeroeval/config.json` (or `$XDG_CONFIG_HOME/zeroeval/config.json`)
- **Windows**: `%APPDATA%/zeroeval/config.json`

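The macOS / Linux lookup can be reproduced in a couple of lines of shell, which is handy when checking which file the CLI will read. This sketch mirrors the documented `$XDG_CONFIG_HOME` fallback; it is an illustration, not the CLI's own code:

```shell
# Compute the config path the CLI is documented to read on macOS/Linux.
# Falls back to ~/.config when $XDG_CONFIG_HOME is unset or empty.
cfg_dir="${XDG_CONFIG_HOME:-$HOME/.config}"
echo "$cfg_dir/zeroeval/config.json"
```
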
## Global flags

Global flags must appear **before** the subcommand:

```bash
zeroeval --output json --project-id <id> --timeout 30.0 traces list
```

| Flag | Default | Description |
|------|---------|-------------|
| `--output text\|json` | `text` | Output format. `json` emits stable JSON to stdout, errors to stderr. |
| `--project-id` | env / config | Project context. Required for monitoring, prompts, judges, and optimization commands. |
| `--api-base-url` | `https://api.zeroeval.com` | Override the API URL. |
| `--quiet` | off | Suppress non-essential output. |
| `--timeout` | `20.0` | HTTP request timeout in seconds. |

### Output modes

- **`text`** (default) — human-readable; dict/list payloads are pretty-printed as JSON.
- **`json`** — stable, machine-readable JSON to stdout. Errors go to stderr as structured JSON. Confirmation prompts (e.g. `optimize promote`) are auto-skipped in JSON mode.

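In `json` mode the stdout payload is safe to pipe into tools like `jq`. A sketch using a canned payload — the trace fields shown are illustrative, not a documented schema, and in a real script the payload would come from the CLI itself:

```shell
# Simulated payload; in practice it would come from e.g.:
#   payload=$(zeroeval --output json traces list --limit 2)
payload='[{"id":"t1","status":"completed"},{"id":"t2","status":"error"}]'

# Pull out the ids of completed traces.
echo "$payload" | jq -r '.[] | select(.status == "completed") | .id'
```
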
### Exit codes

| Code | Meaning |
|------|---------|
| `0` | Success |
| `2` | User / validation error |
| `3` | Auth or permission error |
| `4` | Remote API or network error |

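In scripts, these codes let you decide whether a retry is worthwhile. A minimal sketch; the `describe_exit` helper is hypothetical, not part of the CLI:

```shell
# Map a ZeroEval CLI exit code to a suggested action.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    2) echo "user/validation error: check flags and arguments" ;;
    3) echo "auth error: re-run 'zeroeval setup' or check the API key" ;;
    4) echo "remote API/network error: safe to retry" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# After a real invocation you would write:
#   zeroeval traces list; describe_exit "$?"
describe_exit 4
```
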
## Querying and filtering

Most list and get commands support `--where`, `--select`, and `--order` for client-side filtering, projection, and sorting.

### `--where`

Filter rows. Repeatable — multiple clauses are AND-ed.

```bash
# Exact match
zeroeval judges list --where "name=Quality Check"

# Substring match (case-insensitive)
zeroeval traces list --where "status~completed"

# Set membership
zeroeval spans list --where 'kind in ["llm","tool"]'
```

Supported operators: `=` (exact), `~` (substring), `in` (JSON array).

### `--select`

Project (select) specific fields. Comma-separated; dotted paths are supported.

```bash
zeroeval judges list --select "id,name,evaluation_type"
```

### `--order`

Sort results by a field. Defaults to ascending.

```bash
zeroeval traces list --order "created_at:desc"
```

## Monitoring

Monitoring commands require a `--project-id` (resolved automatically after `zeroeval setup`).

### Sessions

```bash
zeroeval sessions list --start-date 2025-01-01 --end-date 2025-02-01 --limit 50
zeroeval sessions get <session_id>
```

| Flag | Description |
|------|-------------|
| `--start-date` | ISO date string lower bound |
| `--end-date` | ISO date string upper bound |
| `--limit` | Max results (default 50) |
| `--offset` | Pagination offset (default 0) |

### Traces

```bash
zeroeval traces list --start-date 2025-01-01 --limit 50
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id> --limit 100
```

`traces spans` returns the spans belonging to a specific trace, useful for debugging individual requests.

### Spans

```bash
zeroeval spans list --start-date 2025-01-01 --limit 50
zeroeval spans get <span_id>
```

## Prompts

### List and inspect

```bash
zeroeval prompts list
zeroeval prompts get <prompt_slug>
zeroeval prompts get <prompt_slug> --version 3
zeroeval prompts get <prompt_slug> --tag production
zeroeval prompts versions <prompt_slug>
zeroeval prompts tags <prompt_slug>
```

### Submit feedback

Provide feedback on a prompt completion for DSPy optimization and prompt tuning:

```bash
zeroeval prompts feedback create \
  --prompt-slug customer-support \
  --completion-id <uuid> \
  --thumbs-up \
  --reason "Clear and helpful response"
```

For scored judges, add judge-specific fields:

```bash
zeroeval prompts feedback create \
  --prompt-slug customer-support \
  --completion-id <uuid> \
  --thumbs-down \
  --judge-id <judge_uuid> \
  --expected-score 3.5 \
  --score-direction too_high \
  --reason "Score should be lower"
```

<Note>
`--thumbs-up` and `--thumbs-down` are mutually exclusive and one is required.
</Note>

## Judges

### List and inspect

```bash
zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges criteria <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>
```

### Filter evaluations

```bash
zeroeval judges evaluations <judge_id> \
  --start-date 2025-01-01 \
  --end-date 2025-02-01 \
  --evaluation-result true \
  --feedback-state pending \
  --limit 200
```

| Flag | Description |
|------|-------------|
| `--evaluation-result` | `true` or `false` |
| `--feedback-state` | Filter by feedback state |
| `--start-date` / `--end-date` | Date range |

### Create a judge

```bash
zeroeval judges create \
  --name "Tone Check" \
  --prompt "Evaluate whether the response maintains a professional tone." \
  --evaluation-type binary \
  --sample-rate 1.0 \
  --temperature 0.0
```

Or load the prompt from a file:

```bash
zeroeval judges create \
  --name "Quality Scorer" \
  --prompt-file judge_prompt.txt \
  --evaluation-type scored \
  --score-min 0 \
  --score-max 10 \
  --pass-threshold 7
```

| Flag | Default | Description |
|------|---------|-------------|
| `--name` | required | Judge name |
| `--prompt` | — | Inline prompt text (mutually exclusive with `--prompt-file`) |
| `--prompt-file` | — | Path to file containing the prompt |
| `--evaluation-type` | `binary` | `binary` or `scored` |
| `--score-min` | `0.0` | Minimum score (scored only) |
| `--score-max` | `10.0` | Maximum score (scored only) |
| `--pass-threshold` | — | Pass threshold (scored only) |
| `--sample-rate` | `1.0` | Fraction of spans to evaluate |
| `--backfill` | `100` | Number of existing spans to backfill |
| `--tag` | — | Tag filter in `key=value1,value2` format. Repeatable. |
| `--tag-match` | `all` | `all` or `any` |
| `--target-prompt-id` | — | Scope judge to a specific prompt |
| `--temperature` | `0.0` | LLM temperature for judge |

### Submit judge feedback

```bash
zeroeval judges feedback create \
  --span-id <span_uuid> \
  --thumbs-down \
  --reason "Missed safety issue" \
  --expected-output "Should flag harmful content"
```

For scored judges with per-criterion feedback:

```bash
zeroeval judges feedback create \
  --span-id <span_uuid> \
  --thumbs-down \
  --expected-score 2.0 \
  --score-direction too_high \
  --criteria-feedback '{"clarity": {"expected_score": 1.0, "reason": "Confusing response"}}'
```

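For anything beyond a trivial criteria object, building the JSON with `jq` avoids shell-quoting mistakes. A sketch; the `clarity` criterion name is illustrative, and `jq` is assumed to be available:

```shell
# Build the --criteria-feedback payload programmatically instead of
# hand-escaping a JSON string literal.
criteria=$(jq -n --arg reason "Confusing response" \
  '{clarity: {expected_score: 1.0, reason: $reason}}')

# Then pass it along: ... --criteria-feedback "$criteria"
echo "$criteria"
```
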
## Optimization

Start, inspect, and promote prompt or judge optimization runs. All optimization commands require `--project-id`.

### Prompt optimization

```bash
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt get <task_id> <run_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes
```

### Judge optimization

```bash
zeroeval optimize judge list <judge_id>
zeroeval optimize judge get <judge_id> <run_id>
zeroeval optimize judge start <judge_id> --optimizer-type dspy_bootstrap
zeroeval optimize judge promote <judge_id> <run_id> --yes
```

| Flag | Default | Description |
|------|---------|-------------|
| `--optimizer-type` | `quick_refine` | `quick_refine`, `dspy_bootstrap`, or `dspy_gepa` |
| `--config` | — | JSON string of extra optimizer configuration |
| `--yes` | off | Skip the confirmation prompt (also skipped in `--output json` mode) |

## Spec (machine-readable manual)

The `spec` commands dump the CLI's command and parameter contract as JSON or Markdown, useful for agents and toolchains that need to discover available commands programmatically.

```bash
zeroeval spec cli --format json
zeroeval spec command "judges create" --format markdown
```

## CI / automation recipes

### Get the latest traces as JSON

```bash
zeroeval --output json traces list --limit 10 --order "created_at:desc"
```

### Check judge pass rate

```bash
zeroeval --output json judges evaluations <judge_id> \
  --evaluation-result true --limit 1000 \
  --select "id" | jq length
```

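That pipeline counts only passing evaluations; pairing it with an unfiltered count yields an actual rate. A sketch with hard-coded counts — in practice each number would come from its own `jq length` pipeline:

```shell
passed=87    # e.g. count with --evaluation-result true
total=100    # e.g. count with no result filter
rate=$((passed * 100 / total))
echo "pass rate: ${rate}%"

# Fail the CI job when the rate drops below a threshold.
if [ "$rate" -lt 80 ]; then
  echo "pass rate below threshold" >&2
  exit 1
fi
```
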
### Promote an optimization run non-interactively

```bash
zeroeval --output json optimize prompt promote <task_id> <run_id> --yes
```

## Related docs

<CardGroup cols={2}>
  <Card title="Tracing quickstart" icon="rocket" href="/tracing/quickstart">
    Get your first trace in under 5 minutes
  </Card>
  <Card title="Judges" icon="gavel" href="/judges/introduction">
    How calibrated judges evaluate your production traffic
  </Card>
  <Card title="Prompt setup" icon="wrench" href="/autotune/setup">
    Add ze.prompt() to your Python or TypeScript codebase
  </Card>
  <Card title="Skills" icon="wand-magic-sparkles" href="/integrations/skills">
    Let your coding agent handle SDK install and judge setup
  </Card>
</CardGroup>