Commit 13b3e82

updates to ab test
1 parent cf2d2e0 commit 13b3e82

1 file changed, +45 -6 lines changed

evaluations/ab-tests.mdx

Lines changed: 45 additions & 6 deletions
@@ -5,13 +5,15 @@ description: "Run weighted A/B tests on models, prompts, or any variants in your
 
 ## Overview
 
-`ze.choose()` enables A/B testing by making weighted random selections between different variants (models, prompts, parameters, etc.) and automatically tracking which variant was chosen for each execution.
+`ze.choose()` enables A/B testing by making weighted random selections between variants (models, prompts, parameters, etc.), timeboxing each experiment, and automatically tracking the chosen variant on the active span/trace/session for downstream analytics.
 
 **Key features:**
 - Weighted random selection between variants
+- Experiment timeboxing via `duration_days`
 - Automatic tracking of choices within spans, traces, or sessions
 - Consistency caching — same entity always gets the same variant
-- Built-in validation of weights and variant keys
+- Built-in validation of weights, variant keys, and defaults
+- Automatic fallback to a default variant once an experiment completes
 
 ## Basic Usage
 
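Reviewer note: the "weighted random selection" and "consistency caching" features listed in this hunk can be illustrated with a small standalone sketch. It is not the SDK's implementation; `pick_variant`, `_choice_cache`, and `entity_id` are hypothetical names, and an in-memory dictionary stands in for whatever storage the library actually uses.

```python
import random

# Hypothetical in-memory cache: (test name, entity id) -> chosen variant key.
_choice_cache = {}


def pick_variant(name, entity_id, weights, rng=random):
    """Return a variant key, reusing any earlier choice for this entity."""
    cache_key = (name, entity_id)
    if cache_key not in _choice_cache:
        keys = list(weights)
        # Weighted draw: each key is picked in proportion to its weight.
        chosen = rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]
        _choice_cache[cache_key] = chosen
    return _choice_cache[cache_key]


weights = {"fast": 0.7, "powerful": 0.3}
first = pick_variant("model_selection", "span-1", weights)
# Repeated calls for the same entity return the cached choice.
assert pick_variant("model_selection", "span-1", weights) == first
```

Keying the cache on both the test name and the entity keeps two different experiments on the same span from sharing a choice.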
@@ -22,11 +24,13 @@ ze.init()
 
 # Must be called within a span, trace, or session context
 with ze.span("my_operation"):
-    # Choose between two models with 70/30 split
+    # Choose between two models with 70/30 split for 14 days
     model = ze.choose(
         "model_selection",
         variants={"fast": "gpt-4o-mini", "powerful": "gpt-4o"},
-        weights={"fast": 0.7, "powerful": 0.3}
+        weights={"fast": 0.7, "powerful": 0.3},
+        duration_days=14,
+        default_variant="fast"  # optional fallback after day 14
     )
 
     # Use the selected model
@@ -40,11 +44,37 @@ with ze.span("my_operation"):
 | `name` | `str` | Yes | Name of the A/B test (e.g., "model_selection", "prompt_variant") |
 | `variants` | `Dict[str, Any]` | Yes | Dictionary mapping variant keys to their values |
 | `weights` | `Dict[str, float]` | Yes | Dictionary mapping variant keys to selection probabilities (must sum to ~1.0) |
+| `duration_days` | `int` | Yes | Number of days the experiment should run; must be > 0 |
+| `default_variant` | `str` | No | Variant key to use automatically once the experiment ends (defaults to the first key if omitted) |
 
 ## Returns
 
 Returns the **value** from the selected variant (not the key).
 
+## Experiment Lifecycle & Defaults
+
+- `duration_days` timeboxes the experiment. Once the backend marks it completed, `ze.choose()` automatically serves the `default_variant`.
+- If `default_variant` is omitted, the first key in `variants` becomes the fallback.
+- When an experiment is still active, the same entity (span/trace/session) receives a cached, consistent variant choice.
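Reviewer note: the lifecycle rules added above can be sketched as a pure function. The docs say the backend decides when a test is completed; this sketch substitutes a local clock comparison for that signal, which is an assumption. `resolve_variant_key`, `started_at`, `now`, and `active_choice` are hypothetical names that do not appear in the SDK.

```python
from datetime import datetime, timedelta, timezone


def resolve_variant_key(variants, started_at, duration_days, now,
                        active_choice, default_variant=None):
    """Return the key to serve: the active choice, or the fallback once expired."""
    if duration_days <= 0:
        raise ValueError("duration_days must be > 0")
    # If default_variant is omitted, fall back to the first key in variants.
    fallback = default_variant if default_variant is not None else next(iter(variants))
    if fallback not in variants:
        raise KeyError(f"unknown default_variant: {fallback!r}")
    # Assumption: the experiment counts as completed once the window has elapsed.
    if now >= started_at + timedelta(days=duration_days):
        return fallback
    return active_choice


variants = {"fast": "gpt-4o-mini", "powerful": "gpt-4o"}
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Day 15 of a 14-day test: the fallback key is served regardless of the draw.
key = resolve_variant_key(variants, start, 14, start + timedelta(days=15), "powerful", "fast")
model = variants[key]  # the value, not the key, is what callers receive
```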
+
+## Tracking Signals
+
+Attach success metrics to the same span where `ze.choose()` runs so dashboards can correlate outcomes with variant performance:
+
+```python
+with ze.span("recommendation_flow") as span:
+    model = ze.choose(
+        "reco_models_v2",
+        variants={"mini": "gpt-4o-mini", "full": "gpt-4o"},
+        weights={"mini": 0.6, "full": 0.4},
+        duration_days=21,
+        default_variant="mini",
+    )
+
+    score = run_inference(model)
+    ze.set_signal(span, {"conversion_success": score > 0.75})
+```
+
 ## Complete Example
 
 ```python
@@ -65,19 +95,28 @@ with ze.span("model_ab_test", tags={"feature": "model_comparison"}):
         weights={
             "mini": 0.7,  # 70% traffic
             "full": 0.3   # 30% traffic
-        }
+        },
+        duration_days=14,
+        default_variant="mini"
     )
 
     # The selected model is automatically tracked
     response = client.chat.completions.create(
         model=selected_model,
         messages=[{"role": "user", "content": "Hello!"}]
     )
+
+    # Attach a success signal tied to this span/choice
+    rating = evaluate_response(response)
+    ze.set_signal(span, {"response_quality": rating >= 0.7})
 ```
 
 ## Important Notes
 
 - **Context Required**: Must be called within an active `ze.span()`, trace, or session
-- **Consistency**: Same entity (span/trace/session) always receives the same variant
+- **Consistency**: Same entity (span/trace/session) always receives the same variant while the test runs
 - **Weight Validation**: Weights should sum to 1.0 (warns if not within 0.95-1.05)
+- **Duration Required**: `duration_days` must be > 0; experiments stop after this window
+- **Fallback Behavior**: Once the backend reports the test as completed, `default_variant` is used automatically
+- **Signal Analytics**: Use `ze.set_signal()` on the same span to compare variant impact in the dashboard
 - **Key Matching**: Variant keys and weight keys must match exactly
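
Reviewer note: the validation rules in the notes above (matching keys, positive duration, a known default, weights summing to roughly 1.0) could be checked along these lines. This is a sketch under stated assumptions, not the library's code: `validate_choice_args` is a hypothetical helper, and the choice to raise `ValueError` for hard failures is a guess. Only the warning for sums outside 0.95-1.05 is described in the docs.

```python
import warnings


def validate_choice_args(variants, weights, duration_days, default_variant=None):
    """Check A/B test arguments; raise on hard errors, warn on a bad weight sum."""
    # Variant keys and weight keys must match exactly.
    if set(variants) != set(weights):
        raise ValueError("variant keys and weight keys must match exactly")
    if duration_days <= 0:
        raise ValueError("duration_days must be > 0")
    if default_variant is not None and default_variant not in variants:
        raise ValueError(f"default_variant {default_variant!r} is not a variant key")
    total = sum(weights.values())
    # The docs describe a warning, not an error, outside the 0.95-1.05 band.
    if not 0.95 <= total <= 1.05:
        warnings.warn(f"weights sum to {total:.3f}, expected about 1.0")


validate_choice_args(
    variants={"mini": "gpt-4o-mini", "full": "gpt-4o"},
    weights={"mini": 0.7, "full": 0.3},
    duration_days=14,
    default_variant="mini",
)
```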
