
Commit fac5c29 ("updates")

Parent: 2325fb5

File tree: 13 files changed (+126, -716 lines)

CLAUDE.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -32,7 +32,7 @@ mintlify install
 
 | Directory | Purpose |
 |-----------|---------|
-| `tracing/` | Monitoring & tracing guides, SDK docs (Python + TypeScript), advanced topics (sessions, tagging, signals, OTel) |
+| `tracing/` | Monitoring & tracing guides, SDK docs (Python + TypeScript), advanced topics (sessions, tagging, OTel) |
 | `autotune/` | Prompt optimization ("Prompts" in nav), setup, model configs |
 | `judges/` | AI evaluation judges, setup, multimodal eval, feedback submission |
 | `evaluations/` | Evaluations section (currently placeholder) |
```

autotune/prompts/prompts.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ description: "Use feedback on production traces to generate and validate better
 
 <video src="/videos/prompt-optimization.mp4" alt="Prompt optimizations" controls muted playsInline loop preload="metadata" />
 
-ZeroEval derives prompt optimization suggestions directly from feedback on your production traces. By capturing preferences and correctness signals, we provide concrete prompt edits you can test and use for your agents.
+ZeroEval derives prompt optimization suggestions directly from feedback on your production traces. By capturing preferences and corrections, we provide concrete prompt edits you can test and use for your agents.
 
 ## Submitting Feedback
 
```
feedback/api-reference.mdx

Lines changed: 107 additions & 159 deletions
```diff
@@ -1,6 +1,6 @@
 ---
 title: "API Reference"
-description: "REST API for creating, retrieving, updating, and deleting signals"
+description: "REST API for submitting and retrieving feedback"
 ---
 
 Base URL: `https://api.zeroeval.com`
```
````diff
@@ -13,164 +13,6 @@ Authorization: Bearer YOUR_ZEROEVAL_API_KEY
 
 ---
 
-## Create Signal
-
-```
-POST /signals/
-```
-
-Attach a signal to a span, trace, session, or completion.
-
-**Request body:**
-
-| Field | Type | Required | Default | Description |
-| ------------- | -------------------------------- | -------- | ----------- | ------------------------------------------- |
-| `entity_type` | `string` | Yes || `session`, `trace`, `span`, or `completion` |
-| `entity_id` | `string` | Yes || UUID of the target entity |
-| `name` | `string` | Yes || Signal name (e.g. `user_satisfaction`) |
-| `value` | `string \| bool \| int \| float` | Yes || Signal value |
-| `signal_type` | `string` | No | `"boolean"` | `boolean` or `numerical` |
-| `project_id` | `string` | No | `null` | Project ID (inferred from auth if omitted) |
-
-```bash
-curl -X POST https://api.zeroeval.com/signals/ \
-  -H "Authorization: Bearer $ZEROEVAL_API_KEY" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "entity_type": "span",
-    "entity_id": "550e8400-e29b-41d4-a716-446655440000",
-    "name": "thumbs_up",
-    "value": true,
-    "signal_type": "boolean"
-  }'
-```
-
-**Response:** 201
-
-```json
-{
-  "status": "success",
-  "message": "Signal processed",
-  "processed_count": 1,
-  "failed_count": 0,
-  "errors": null
-}
-```
-
----
-
-## Bulk Create Signals
-
-```
-POST /signals/bulk
-```
-
-Send multiple signals in a single request.
-
-**Request body:**
-
-```json
-{
-  "signals": [
-    {
-      "entity_type": "span",
-      "entity_id": "...",
-      "name": "accuracy",
-      "value": 0.95,
-      "signal_type": "numerical"
-    },
-    {
-      "entity_type": "trace",
-      "entity_id": "...",
-      "name": "successful",
-      "value": true
-    }
-  ]
-}
-```
-
-**Response:** 201 (same schema as single create)
-
----
-
-## Get Entity Signals
-
-```
-GET /signals/entity/{entity_type}/{entity_id}
-```
-
-Retrieve all signals attached to an entity.
-
-| Path Parameter | Description |
-| -------------- | ------------------------------------------- |
-| `entity_type` | `session`, `trace`, `span`, or `completion` |
-| `entity_id` | UUID of the entity |
-
-**Response:** 200
-
-```json
-{
-  "entity_type": "span",
-  "entity_id": "550e8400-...",
-  "signals": [
-    { "name": "thumbs_up", "value": true, "type": "boolean" },
-    { "name": "accuracy", "value": 0.95, "type": "numerical" }
-  ]
-}
-```
-
----
-
-## Update Signal
-
-```
-PUT /signals/entity/{entity_type}/{entity_id}/{signal_name}
-```
-
-Update the value of an existing signal.
-
-**Request body:**
-
-| Field | Type | Required | Description |
-| ------------- | -------------------------------- | -------- | ------------------------ |
-| `value` | `string \| bool \| int \| float` | Yes | New signal value |
-| `signal_type` | `string` | No | `boolean` or `numerical` |
-
-**Response:** 200
-
----
-
-## Delete Signal
-
-```
-DELETE /signals/entity/{entity_type}/{entity_id}/{signal_name}
-```
-
-Remove a signal from an entity.
-
-**Response:** 204 No Content
-
----
-
-## Get Unique Signal Names
-
-```
-GET /signals/unique-names?project_id={project_id}
-```
-
-List all signal names that have been used in a project.
-
-**Response:** 200
-
-```json
-{
-  "project_id": "...",
-  "signal_names": ["thumbs_up", "accuracy", "latency_ms", "user_satisfied"]
-}
-```
-
----
-
 ## Completion Feedback
 
 ```
````
````diff
@@ -221,3 +63,109 @@ curl -X POST https://api.zeroeval.com/v1/prompts/support-bot/completions/550e840
 If feedback already exists for the same completion from the same user, it will
 be updated with the new values.
 </Note>
+
+---
+
+## Unified Entity Feedback
+
+```
+GET /projects/{project_id}/feedback/{entity_type}/{entity_id}
+```
+
+Retrieve all feedback -- human reviews and judge evaluations -- for a span, trace, or session in a single response.
+
+| Path Parameter | Description |
+| -------------- | ---------------------------------- |
+| `project_id` | UUID of the project |
+| `entity_type` | `span`, `trace`, or `session` |
+| `entity_id` | UUID of the entity |
+
+**Response:** 200
+
+```json
+{
+  "entity_type": "span",
+  "entity_id": "550e8400-...",
+  "summary": {
+    "total": 3,
+    "human_feedback_count": 1,
+    "judge_evaluation_count": 2
+  },
+  "items": [
+    {
+      "kind": "human_feedback",
+      "id": "fb123e45-...",
+      "span_id": "550e8400-...",
+      "thumbs_up": true,
+      "reason": "Clear and helpful",
+      "created_at": "2025-01-15T10:30:00Z",
+      "created_by": {
+        "id": "user-123",
+        "email": "reviewer@example.com",
+        "name": "Alice"
+      },
+      "source_type": "human"
+    },
+    {
+      "kind": "judge_evaluation",
+      "id": "je456f78-...",
+      "span_id": "550e8400-...",
+      "automation_id": "judge-abc-...",
+      "judge_name": "Helpfulness",
+      "evaluation_result": true,
+      "evaluation_reason": "Response directly answers the question with clear steps.",
+      "confidence_score": 0.92,
+      "model_used": "gemini-3-flash-preview",
+      "evaluation_duration_ms": 1200,
+      "score": 8.5,
+      "evaluation_type": "scored",
+      "score_min": 0,
+      "score_max": 10,
+      "pass_threshold": 7.0,
+      "criteria_scores": {
+        "clarity": { "score": 9, "reason": "Well-structured response" },
+        "accuracy": { "score": 8, "reason": "Correct information provided" }
+      },
+      "created_at": "2025-01-15T10:31:00Z"
+    }
+  ]
+}
+```
+
+### Response fields
+
+**`summary`** -- aggregate counts for fast display:
+
+| Field | Type | Description |
+| ------------------------ | ----- | -------------------------------- |
+| `total` | `int` | Total feedback items |
+| `human_feedback_count` | `int` | Number of human review items |
+| `judge_evaluation_count` | `int` | Number of judge evaluation items |
+
+**`items[]`** -- each item has a `kind` field (`human_feedback` or `judge_evaluation`) that determines which fields are present:
+
+| Field (human_feedback) | Type | Description |
+| ---------------------- | -------- | ------------------------------- |
+| `thumbs_up` | `bool` | Positive or negative |
+| `reason` | `string` | Reviewer's explanation |
+| `expected_output` | `string` | Corrected output (if provided) |
+| `created_by` | `object` | User who submitted the feedback |
+| `source_type` | `string` | `"human"` or `"judge"` |
+
+| Field (judge_evaluation) | Type | Description |
+| ------------------------- | -------- | ------------------------------------- |
+| `automation_id` | `string` | Judge automation UUID |
+| `judge_name` | `string` | Display name of the judge |
+| `evaluation_result` | `bool` | Whether the output passed |
+| `evaluation_reason` | `string` | Judge's reasoning |
+| `confidence_score` | `float` | Judge confidence (0-1) |
+| `model_used` | `string` | Model used for the evaluation |
+| `score` | `float` | Score value (scored evaluations only) |
+| `evaluation_type` | `string` | `"binary"` or `"scored"` |
+| `score_min` / `score_max` | `float` | Score range (scored evaluations only) |
+| `pass_threshold` | `float` | Threshold for pass/fail |
+| `criteria_scores` | `object` | Per-criterion scores and reasons |
+
+<Note>
+For traces and sessions, feedback is aggregated from all descendant spans.
+</Note>
````
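The unified response this commit documents is discriminated by `kind`. A minimal Python sketch of consuming it client-side, using an abridged payload modeled on the documented schema (field values are illustrative, not from a live API):

```python
from collections import defaultdict

# Abridged unified-feedback payload modeled on the documented response schema;
# values are illustrative, not from a live API.
response = {
    "summary": {"total": 3, "human_feedback_count": 1, "judge_evaluation_count": 2},
    "items": [
        {"kind": "human_feedback", "thumbs_up": True, "reason": "Clear and helpful"},
        {"kind": "judge_evaluation", "judge_name": "Helpfulness",
         "evaluation_type": "scored", "score": 8.5, "pass_threshold": 7.0},
        {"kind": "judge_evaluation", "judge_name": "Accuracy",
         "evaluation_type": "scored", "score": 5.0, "pass_threshold": 7.0},
    ],
}

# Group on the `kind` discriminator, which decides the fields each item carries.
by_kind = defaultdict(list)
for item in response["items"]:
    by_kind[item["kind"]].append(item)

# The summary counts should agree with the items themselves.
assert len(by_kind["human_feedback"]) == response["summary"]["human_feedback_count"]
assert len(by_kind["judge_evaluation"]) == response["summary"]["judge_evaluation_count"]

# For scored evaluations, compare score against pass_threshold.
passing = [j["judge_name"] for j in by_kind["judge_evaluation"]
           if j["score"] >= j["pass_threshold"]]
print(passing)  # ['Helpfulness']
```

Grouping once on `kind` keeps downstream code from branching on field presence per item.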

feedback/human-feedback.mdx

Lines changed: 0 additions & 23 deletions
````diff
@@ -86,29 +86,6 @@ await ze.sendFeedback({
 
 Expected outputs are used during [prompt optimization](/autotune/prompts/prompts) to generate better prompt variants.
 
-### Using signals for custom metrics
-
-For feedback that doesn't fit the thumbs-up/down model -- star ratings, NPS scores, task completion -- use signals:
-
-<CodeGroup>
-
-```python Python
-span = ze.get_current_span()
-ze.set_signal(span, {
-    "star_rating": 4,
-    "task_completed": True,
-    "time_on_task_sec": 12.5
-})
-```
-
-```typescript TypeScript
-ze.sendSpanSignal("star_rating", 4);
-ze.sendSpanSignal("task_completed", true);
-ze.sendSpanSignal("time_on_task_sec", 12.5);
-```
-
-</CodeGroup>
-
 ## Feedback Links
 
 For collecting feedback from users who don't have a ZeroEval account (e.g. customers, external reviewers), create a feedback link that anyone can use:
````

feedback/introduction.mdx

Lines changed: 3 additions & 3 deletions
```diff
@@ -10,7 +10,7 @@ ZeroEval supports two kinds of feedback:
 - **Human feedback** -- thumbs-up/down, star ratings, corrections, and expected outputs submitted by users or reviewers
 - **AI feedback** -- automated evaluations from calibrated judges that score outputs against criteria you define
 
-Both feed into the same system. Feedback attached to completions powers [prompt optimization](/autotune/introduction). Signals attached to spans, traces, and sessions let you filter and monitor quality across your entire system.
+Both feed into the same system. Feedback attached to completions powers [prompt optimization](/autotune/introduction). You can also retrieve unified feedback -- combining human reviews and judge evaluations -- for any span, trace, or session via the [Feedback API](/feedback/api-reference#unified-entity-feedback).
 
 ## How feedback flows
 
@@ -25,8 +25,8 @@ Both feed into the same system. Feedback attached to completions powers [prompt
     criteria.
   </Step>
   <Step title="Quality becomes measurable">
-    Feedback appears on traces and completions in the console. Filter by
-    thumbs-up rate, judge scores, or custom signals to find patterns.
+    Feedback appears on spans, traces, and completions in the console. Filter by
+    thumbs-up rate, judge scores, or tags to find patterns.
   </Step>
   <Step title="Improvements are driven by data">
     Use feedback to optimize prompts, compare models, calibrate judges, and
```
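The introduction now points readers at the unified feedback endpoint, whose path and bearer-auth header are documented in feedback/api-reference.mdx. A small sketch that only builds the documented URL and header; the project and entity ids are hypothetical placeholders and no request is sent:

```python
BASE_URL = "https://api.zeroeval.com"
VALID_ENTITY_TYPES = {"span", "trace", "session"}  # per the endpoint's path-parameter table

def unified_feedback_url(project_id: str, entity_type: str, entity_id: str) -> str:
    """Build GET /projects/{project_id}/feedback/{entity_type}/{entity_id}."""
    if entity_type not in VALID_ENTITY_TYPES:
        raise ValueError(f"entity_type must be one of {sorted(VALID_ENTITY_TYPES)}")
    return f"{BASE_URL}/projects/{project_id}/feedback/{entity_type}/{entity_id}"

# Hypothetical ids, for illustration only.
url = unified_feedback_url("11111111-...", "span", "550e8400-...")
headers = {"Authorization": "Bearer YOUR_ZEROEVAL_API_KEY"}
print(url)
```

A real call would send `headers` with the HTTP client of your choice; validating `entity_type` up front surfaces typos before they become 404s.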
