Commit 226ac97

references for zeroeval cli

1 parent 98c7c66 commit 226ac97

2 files changed: +356 −5 lines changed

integrations/cli.mdx (355 additions & 4 deletions)
---
title: "CLI"
description: "Manage traces, prompts, judges, and optimization from your terminal"
---

The ZeroEval CLI ships with the Python SDK and gives you terminal access to monitoring, prompts, judges, and optimization. It is designed to be agent-friendly and works well in CI pipelines, scripted workflows, and agent toolchains.

```bash
pip install zeroeval
```
## Setup

Run the interactive setup to save your API key and resolve your project:

```bash
zeroeval setup
```

This opens the ZeroEval dashboard, prompts for your API key, saves it to your shell config (e.g. `~/.zshrc`, `~/.bashrc`), and links your project automatically.

<Tip>
For non-interactive environments such as CI or coding agents, use `auth set` instead:

```bash
zeroeval auth set --api-key-env ZEROEVAL_API_KEY
```
</Tip>

### Auth commands

```bash
zeroeval auth set --api-key <key>           # Set API key directly
zeroeval auth set --api-key-env MY_KEY_VAR  # Read API key from an env var
zeroeval auth set --api-base-url <url>      # Override API base URL
zeroeval auth show --redact                 # Show current config (key masked)
zeroeval auth clear                         # Wipe all stored config
zeroeval auth clear --api-key-only          # Clear only the API key and project
```

### Auth resolution

The CLI resolves credentials in this order:

1. Explicit CLI flags (`--project-id`, `--api-base-url`)
2. Environment variables (`ZEROEVAL_API_KEY`, `ZEROEVAL_PROJECT_ID`, `ZEROEVAL_BASE_URL`)
3. Global CLI config file

Config file location:

- **macOS / Linux**: `~/.config/zeroeval/config.json` (or `$XDG_CONFIG_HOME/zeroeval/config.json`)
- **Windows**: `%APPDATA%/zeroeval/config.json`
## Global flags

Global flags must appear **before** the subcommand:

```bash
zeroeval --output json --project-id <id> --timeout 30.0 traces list
```

| Flag | Default | Description |
|------|---------|-------------|
| `--output text\|json` | `text` | Output format. `json` emits stable JSON to stdout, errors to stderr. |
| `--project-id` | env / config | Project context. Required for monitoring, prompts, judges, and optimization commands. |
| `--api-base-url` | `https://api.zeroeval.com` | Override the API URL. |
| `--quiet` | off | Suppress non-essential output. |
| `--timeout` | `20.0` | HTTP request timeout in seconds. |

### Output modes

- **`text`** (default) — human-readable; dict/list payloads are pretty-printed as JSON.
- **`json`** — stable, machine-readable JSON to stdout. Errors go to stderr as structured JSON. Confirmation prompts (e.g. `optimize promote`) are auto-skipped in JSON mode.

### Exit codes

| Code | Meaning |
|------|---------|
| `0` | Success |
| `2` | User / validation error |
| `3` | Auth or permission error |
| `4` | Remote API or network error |
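A CI wrapper can branch on these codes. The mapping below mirrors the table; `run_cli` is a hypothetical helper name (and requires `zeroeval` on `PATH`), not part of the CLI itself.

```python
import subprocess

# Exit-code meanings, mirroring the table above.
EXIT_MEANINGS = {
    0: "success",
    2: "user / validation error",
    3: "auth or permission error",
    4: "remote API or network error",
}


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_cli(*args: str) -> int:
    """Run the CLI and return its exit code (requires zeroeval installed)."""
    return subprocess.run(["zeroeval", *args]).returncode
```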
## Querying and filtering

Most list and get commands support `--where`, `--select`, and `--order` for client-side filtering, projection, and sorting.

### `--where`

Filter rows. Repeatable — multiple clauses are AND-ed.

```bash
# Exact match
zeroeval judges list --where "name=Quality Check"

# Substring match (case-insensitive)
zeroeval traces list --where "status~completed"

# Set membership
zeroeval spans list --where 'kind in ["llm","tool"]'
```

Supported operators: `=` (exact), `~` (substring), `in` (JSON array).
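The three operators behave roughly like the client-side filter sketched below. This is an illustration of the semantics, not the CLI's code; function names are ours.

```python
import json


def matches(row: dict, clause: str) -> bool:
    # `field in ["a","b"]` — set membership over a JSON array.
    if " in " in clause:
        field, _, arr = clause.partition(" in ")
        return str(row.get(field.strip())) in [str(v) for v in json.loads(arr)]
    # `field~value` — case-insensitive substring match.
    if "~" in clause:
        field, _, needle = clause.partition("~")
        return needle.lower() in str(row.get(field.strip(), "")).lower()
    # `field=value` — exact match.
    field, _, value = clause.partition("=")
    return str(row.get(field.strip())) == value


def where(rows: list[dict], clauses: list[str]) -> list[dict]:
    # Multiple clauses are AND-ed, matching the CLI's behavior.
    return [r for r in rows if all(matches(r, c) for c in clauses)]
```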
### `--select`

Project specific fields (field projection). Comma-separated; supports dotted paths into nested objects.

```bash
zeroeval judges list --select "id,name,evaluation_type"
```

### `--order`

Sort results by a field. Defaults to ascending; append `:desc` to reverse.

```bash
zeroeval traces list --order "created_at:desc"
```
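Together, `--select` and `--order` act as a projection plus a sort. A rough Python model of the semantics (helper names are ours, not the CLI's):

```python
def select(rows: list[dict], fields: str) -> list[dict]:
    """Project comma-separated fields; dotted paths walk nested dicts."""
    def dig(row, path):
        for part in path.split("."):
            row = row.get(part) if isinstance(row, dict) else None
        return row

    names = [f.strip() for f in fields.split(",")]
    return [{n: dig(r, n) for n in names} for r in rows]


def order(rows: list[dict], spec: str) -> list[dict]:
    """Sort by `field` (ascending) or `field:desc` (descending)."""
    field, _, direction = spec.partition(":")
    return sorted(rows, key=lambda r: r.get(field), reverse=direction == "desc")
```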
## Monitoring

Monitoring commands require a `--project-id` (resolved automatically after `zeroeval setup`).

### Sessions

```bash
zeroeval sessions list --start-date 2025-01-01 --end-date 2025-02-01 --limit 50
zeroeval sessions get <session_id>
```

| Flag | Description |
|------|-------------|
| `--start-date` | ISO date string lower bound |
| `--end-date` | ISO date string upper bound |
| `--limit` | Max results (default 50) |
| `--offset` | Pagination offset (default 0) |

### Traces

```bash
zeroeval traces list --start-date 2025-01-01 --limit 50
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id> --limit 100
```

`traces spans` returns the spans belonging to a specific trace, which is useful for debugging individual requests.

### Spans

```bash
zeroeval spans list --start-date 2025-01-01 --limit 50
zeroeval spans get <span_id>
```

## Prompts

### List and inspect

```bash
zeroeval prompts list
zeroeval prompts get <prompt_slug>
zeroeval prompts get <prompt_slug> --version 3
zeroeval prompts get <prompt_slug> --tag production
zeroeval prompts versions <prompt_slug>
zeroeval prompts tags <prompt_slug>
```

### Submit feedback

Provide feedback on a prompt completion for DSPy optimization and prompt tuning:

```bash
zeroeval prompts feedback create \
  --prompt-slug customer-support \
  --completion-id <uuid> \
  --thumbs-up \
  --reason "Clear and helpful response"
```

For scored judges, add judge-specific fields:

```bash
zeroeval prompts feedback create \
  --prompt-slug customer-support \
  --completion-id <uuid> \
  --thumbs-down \
  --judge-id <judge_uuid> \
  --expected-score 3.5 \
  --score-direction too_high \
  --reason "Score should be lower"
```

<Note>
`--thumbs-up` and `--thumbs-down` are mutually exclusive, and exactly one of them is required.
</Note>
## Judges

### List and inspect

```bash
zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges criteria <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>
```

### Filter evaluations

```bash
zeroeval judges evaluations <judge_id> \
  --start-date 2025-01-01 \
  --end-date 2025-02-01 \
  --evaluation-result true \
  --feedback-state pending \
  --limit 200
```

| Flag | Description |
|------|-------------|
| `--evaluation-result` | `true` or `false` |
| `--feedback-state` | Filter by feedback state |
| `--start-date` / `--end-date` | Date range |

### Create a judge

```bash
zeroeval judges create \
  --name "Tone Check" \
  --prompt "Evaluate whether the response maintains a professional tone." \
  --evaluation-type binary \
  --sample-rate 1.0 \
  --temperature 0.0
```

Or load the prompt from a file:

```bash
zeroeval judges create \
  --name "Quality Scorer" \
  --prompt-file judge_prompt.txt \
  --evaluation-type scored \
  --score-min 0 \
  --score-max 10 \
  --pass-threshold 7
```

| Flag | Default | Description |
|------|---------|-------------|
| `--name` | required | Judge name |
| `--prompt` | | Inline prompt text (mutually exclusive with `--prompt-file`) |
| `--prompt-file` | | Path to a file containing the prompt |
| `--evaluation-type` | `binary` | `binary` or `scored` |
| `--score-min` | `0.0` | Minimum score (scored only) |
| `--score-max` | `10.0` | Maximum score (scored only) |
| `--pass-threshold` | | Pass threshold (scored only) |
| `--sample-rate` | `1.0` | Fraction of spans to evaluate |
| `--backfill` | `100` | Number of existing spans to backfill |
| `--tag` | | Tag filter in `key=value1,value2` format. Repeatable. |
| `--tag-match` | `all` | `all` or `any` |
| `--target-prompt-id` | | Scope the judge to a specific prompt |
| `--temperature` | `0.0` | LLM temperature for the judge |

### Submit judge feedback

```bash
zeroeval judges feedback create \
  --span-id <span_uuid> \
  --thumbs-down \
  --reason "Missed safety issue" \
  --expected-output "Should flag harmful content"
```

For scored judges with per-criterion feedback:

```bash
zeroeval judges feedback create \
  --span-id <span_uuid> \
  --thumbs-down \
  --expected-score 2.0 \
  --score-direction too_high \
  --criteria-feedback '{"clarity": {"expected_score": 1.0, "reason": "Confusing response"}}'
```
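When scripting, building the `--criteria-feedback` payload with `json.dumps` avoids shell-quoting mistakes. The criterion name below is an example, and the commented `subprocess` call is an illustrative sketch rather than a required invocation pattern.

```python
import json

# Per-criterion feedback, serialized for the --criteria-feedback flag.
criteria = {
    "clarity": {"expected_score": 1.0, "reason": "Confusing response"},
}
arg = json.dumps(criteria)

# Pass `arg` as the flag value, e.g.:
# subprocess.run(["zeroeval", "judges", "feedback", "create",
#                 "--span-id", span_id, "--thumbs-down",
#                 "--criteria-feedback", arg])
```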
## Optimization

Start, inspect, and promote prompt or judge optimization runs. All optimization commands require `--project-id`.

### Prompt optimization

```bash
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt get <task_id> <run_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes
```

### Judge optimization

```bash
zeroeval optimize judge list <judge_id>
zeroeval optimize judge get <judge_id> <run_id>
zeroeval optimize judge start <judge_id> --optimizer-type dspy_bootstrap
zeroeval optimize judge promote <judge_id> <run_id> --yes
```

| Flag | Default | Description |
|------|---------|-------------|
| `--optimizer-type` | `quick_refine` | `quick_refine`, `dspy_bootstrap`, or `dspy_gepa` |
| `--config` | | JSON string of extra optimizer configuration |
| `--yes` | off | Skip the confirmation prompt (also skipped in `--output json` mode) |

## Spec (machine-readable manual)

The `spec` commands dump the CLI's command and parameter contract as JSON or Markdown, which is useful for agents and toolchains that need to discover available commands programmatically.

```bash
zeroeval spec cli --format json
zeroeval spec command "judges create" --format markdown
```

## CI / automation recipes

### Get the latest traces as JSON

```bash
zeroeval --output json traces list --limit 10 --order "created_at:desc"
```

### Check judge pass rate

```bash
zeroeval --output json judges evaluations <judge_id> \
  --evaluation-result true --limit 1000 \
  --select "id" | jq length
```
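The `jq` pipeline counts passing evaluations. If you want a rate rather than a count, a small Python helper over the parsed JSON works too; this assumes each evaluation object carries an `evaluation_result` field, as the `--evaluation-result` filter suggests, which is an assumption about the payload schema.

```python
def pass_rate(evaluations: list[dict]) -> float:
    """Fraction of evaluations whose evaluation_result is True."""
    if not evaluations:
        return 0.0
    passed = sum(1 for e in evaluations if e.get("evaluation_result") is True)
    return passed / len(evaluations)
```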
### Promote an optimization run non-interactively

```bash
zeroeval --output json optimize prompt promote <task_id> <run_id> --yes
```

## Related docs

<CardGroup cols={2}>
  <Card title="Tracing quickstart" icon="rocket" href="/tracing/quickstart">
    Get your first trace in under 5 minutes
  </Card>
  <Card title="Judges" icon="gavel" href="/judges/introduction">
    How calibrated judges evaluate your production traffic
  </Card>
  <Card title="Prompt setup" icon="wrench" href="/autotune/setup">
    Add ze.prompt() to your Python or TypeScript codebase
  </Card>
  <Card title="Skills" icon="wand-magic-sparkles" href="/integrations/skills">
    Let your coding agent handle SDK install and judge setup
  </Card>
</CardGroup>

integrations/introduction.mdx (1 addition & 1 deletion)

The CLI card description in the integrations overview changes from "Manage traces, prompts, and judges from your terminal (coming soon)" to "Manage traces, prompts, and judges from your terminal".
3131

0 commit comments

Comments
 (0)