ablift is an agent-friendly CLI for A/B/n decisions.
Early release (alpha): this project is work in progress and not production-hardened yet.
It supports:
- Bayesian conversion-rate decisions (Beta-Binomial)
- Bayesian ARPU probability-to-win from aggregate revenue stats
- Frequentist sequential decisions (O'Brien-Fleming alpha spending)
- Multi-variant (A/B/n) comparisons against one control
- Guardrail checks (latency, bounce rate, error rate, etc.)
- SRM detection (sample ratio mismatch) for data quality
- Structured JSON output for agents and automations
- Markdown report generation for human review
- CSV/XLSX ingestion with auto-detected column layout
```bash
uv tool install .
```

This installs ablift as a normal CLI tool and exposes the `ablift` command on your PATH.
To work from the repo during development:

```bash
uv sync --group test
```

Then run commands through the repo-managed environment:

```bash
uv run ablift --help
```

If you want an editable install in an existing environment:

```bash
uv pip install -e .
```

Show available commands:
```bash
ablift --help
ablift analyze --help
```

Generate a JSON input template:

```bash
ablift example-input > input.json
```

Run a Bayesian analysis from JSON:
```bash
ablift analyze \
  --input input.json \
  --output output.json \
  --report report.md
```

Run the same workflow from a CSV file with inferred columns:
```bash
ablift analyze \
  --input examples/conversion_multivariant.csv \
  --output output.json \
  --report report.md
```

Run with an explicit Bayesian decision policy:

```bash
ablift analyze --input bayesian_input.csv
```

If you are developing from the repo, use the same commands prefixed with `uv run`:
```bash
uv run ablift analyze \
  --input input.json \
  --output output.json \
  --report report.md
```

Subcommands:

- `ablift analyze`: analyze `.json`, `.csv`, `.xlsx`, or `.xlsm` input
- `ablift duration`: estimate runtime needed for a test
- `ablift doctor`: verify the environment and required dependencies
- `ablift example-input`: print a starter JSON payload
Use `primary_metric: "conversion_rate"` when each variant row provides:
- a denominator count
- a success count
- successes that cannot exceed the denominator
Examples that fit this model:
- sessions and click-through sessions
- users and purchasers
- emails delivered and openers
- visits and visits with at least one signup
Use `primary_metric: "arpu"` when you have aggregate revenue statistics per variant:
- `visitors`
- `conversions`
- `revenue_sum`
- `revenue_sum_squares`
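From these aggregates, the per-variant ARPU mean and sample variance can be recovered without any row-level data. A minimal sketch (the helper name is hypothetical, not part of ablift); note that non-converting visitors contribute zero revenue, so `visitors` is the sample size:

```python
def arpu_moments(visitors: int, revenue_sum: float, revenue_sum_squares: float):
    """Recover ARPU mean and sample variance from aggregate sums.

    Uses the identity var = (sum(x^2) - sum(x)^2 / n) / (n - 1),
    where n counts all visitors (non-converters contribute 0 revenue).
    """
    mean = revenue_sum / visitors
    variance = (revenue_sum_squares - revenue_sum**2 / visitors) / (visitors - 1)
    return mean, variance
```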
If your metric can happen multiple times per unit and the count can exceed the denominator, it does not fit the current `conversion_rate` model. For example:
- visits and total clicks across all visits
- users and page views
- sessions and multiple add-to-cart events
In those cases, either:
- redefine the outcome as binary per unit, such as "visit with at least one click"
- or use a different metric/model outside the current scope of this tool
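The first option is a small aggregation pass before building the payload. A sketch, assuming you have row-level `(variant, unit_id, event_count)` records (the function name is illustrative):

```python
from collections import defaultdict

def binarize_per_unit(rows):
    """rows: iterable of (variant, unit_id, event_count).

    Returns {variant: (visitors, conversions)} where a unit converts if it
    had at least one event, so conversions can never exceed visitors.
    """
    events = defaultdict(int)            # (variant, unit_id) -> total events
    for variant, unit_id, count in rows:
        events[(variant, unit_id)] += count
    stats = defaultdict(lambda: [0, 0])  # variant -> [visitors, conversions]
    for (variant, _unit), total in events.items():
        stats[variant][0] += 1
        stats[variant][1] += 1 if total > 0 else 0
    return {v: tuple(c) for v, c in stats.items()}
```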
The input contract is strict about semantics, not source column names.
ablift separates:
- inference settings: priors, posterior sample count, random seed
- decision policy: thresholds used to turn estimates into actions
For Bayesian runs, recommendations are optional. If you do not provide a decision policy, ablift reports the posterior estimates but leaves `recommendation` as `null`.
For CSV/XLSX input, the JSON output also includes `analysis_settings.input_interpretation` so agents can see the detected input shape (`row_level` or `aggregated`) and whether ARPU results are approximate.
For analysis inputs:
- at least 2 variants are required
- exactly 1 variant must have `is_control: true`
- `method` must be `bayesian` or `frequentist_sequential`
- `primary_metric` must be `conversion_rate` or `arpu`
- `visitors` must be a positive integer
- `conversions` must be a non-negative integer
- `conversions` cannot exceed `visitors`
- for `arpu`, each variant must also include `revenue_sum` and `revenue_sum_squares`
Bayesian defaults:

- posterior sample count defaults to `50000`
- random seed defaults to `7`
- conversion-rate prior is `Beta(1, 1)`
- ARPU prior is a weak Normal-Inverse-Gamma prior with `mu0=0`, `kappa0=1e-6`, `alpha0=1`, `beta0=1`
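With a `Beta(1, 1)` prior, each variant's posterior is `Beta(1 + conversions, 1 + visitors - conversions)`, and probability-to-beat-control can be estimated by Monte Carlo. A stdlib-only sketch of the idea (an illustration, not ablift's internal code):

```python
import random

def prob_beats_control(control, variant, samples=50_000, seed=7):
    """Estimate P(rate_variant > rate_control) under Beta(1, 1) priors.

    control and variant are (visitors, conversions) pairs; the posterior
    for each is Beta(1 + conversions, 1 + visitors - conversions).
    """
    rng = random.Random(seed)
    (n_c, x_c), (n_v, x_v) = control, variant
    wins = 0
    for _ in range(samples):
        p_c = rng.betavariate(1 + x_c, 1 + n_c - x_c)
        p_v = rng.betavariate(1 + x_v, 1 + n_v - x_v)
        wins += p_v > p_c
    return wins / samples
```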
Bayesian recommendations require an explicit decision policy, either in the input payload, via `[tool.ablift]` in `pyproject.toml`, or via CLI flags such as:

- `--enable-recommendation`
- `--prob-threshold`
- `--max-expected-loss`
Internal field meanings:
- `name`: variant label
- `visitors`: denominator units exposed to the experiment
- `conversions`: success units for the primary outcome
- `is_control`: whether the row is the control variant
Top-level fields:
- `experiment_name` (str)
- `method` (`"bayesian"` or `"frequentist_sequential"`)
- `primary_metric` (`"conversion_rate"` or `"arpu"`)
- `alpha` (float, default `0.05`)
- `look_index` and `max_looks` (ints, sequential mode)
- `information_fraction` (optional float in `(0, 1]`, overrides `look_index`/`max_looks`)
- `variants` (list): exactly one row must include `"is_control": true`
- `guardrails` (optional list)
- `decision_policy` (optional): `enabled` (bool), `bayes_prob_beats_control`, `max_expected_loss`
- `decision_thresholds` (legacy alias, still supported): `bayes_prob_beats_control` (default `0.95`), `max_expected_loss` (default `0.001`)
- `samples` (default `50000`, Bayesian mode)
- `random_seed` (default `7`)
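In sequential mode, `look_index / max_looks` (or `information_fraction`) feeds an O'Brien-Fleming-style alpha-spending function. One common form, sketched with the stdlib normal distribution (this illustrates the approach, not necessarily ablift's exact implementation):

```python
from math import sqrt
from statistics import NormalDist

def obf_alpha_spent(information_fraction: float, alpha: float = 0.05) -> float:
    """Cumulative alpha spent at information fraction t in (0, 1] under the
    Lan-DeMets O'Brien-Fleming spending function:
    f(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(information_fraction)))
```

At `t = 1` the full alpha is spent; early looks spend almost nothing, which is what makes peeking safe under this scheme.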
Variant row:
```json
{
  "name": "control",
  "visitors": 100000,
  "conversions": 4000,
  "is_control": true
}
```

For `primary_metric: "arpu"`, each variant also needs:

- `revenue_sum`
- `revenue_sum_squares`
Minimal Bayesian conversion-rate example:
```json
{
  "experiment_name": "homepage_cta",
  "method": "bayesian",
  "primary_metric": "conversion_rate",
  "variants": [
    {"name": "control", "visitors": 50000, "conversions": 2000, "is_control": true},
    {"name": "v1", "visitors": 50000, "conversions": 2080, "is_control": false}
  ]
}
```

Bayesian ARPU probability-to-win example:
```json
{
  "experiment_name": "pricing_page",
  "method": "bayesian",
  "primary_metric": "arpu",
  "variants": [
    {"name": "control", "visitors": 10000, "conversions": 550, "revenue_sum": 22000, "revenue_sum_squares": 150000, "is_control": true},
    {"name": "v1", "visitors": 10000, "conversions": 570, "revenue_sum": 23500, "revenue_sum_squares": 170000, "is_control": false}
  ]
}
```

`ablift analyze` reads CSV and XLSX files using positional columns: column names are ignored, and column order determines the role.
| Col 1 | Col 2 | Col 3 | Col 4 (optional) |
|---|---|---|---|
| variant name | visitors | conversions or revenue_sum | is_control (boolean) |
If col 3 values are integers ≤ col 2 → conversion rate analysis. If col 3 values are floats or exceed col 2 → ARPU analysis (approximate, flagged in output).
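The detection rule above can be mirrored when preparing data. A hypothetical sketch of the same heuristic (ablift's real detector may differ in edge cases):

```python
def detect_metric(col2, col3):
    """col2: visitors per row; col3: third-column values.

    Integer values that never exceed visitors -> conversion rate;
    floats or values exceeding visitors -> ARPU.
    """
    is_conversion = all(
        float(v).is_integer() and v <= n for n, v in zip(col2, col3)
    )
    return "conversion_rate" if is_conversion else "arpu"
```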
| Col 1 | Col 2 | Col 3 (optional) |
|---|---|---|
| variant name | outcome | is_control (boolean) |
If col 2 values are 0/1 → conversion rate analysis (tool counts visitors and conversions per variant). If col 2 values are continuous → ARPU analysis (tool computes revenue_sum and revenue_sum_squares per variant).
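The row-level path reduces to the same aggregates as the first layout. A sketch of that reduction (helper name is hypothetical):

```python
from collections import defaultdict

def aggregate_row_level(rows):
    """rows: iterable of (variant, outcome). Binary outcomes yield
    visitors/conversions; continuous outcomes yield the ARPU sums."""
    groups = defaultdict(list)
    for variant, outcome in rows:
        groups[variant].append(float(outcome))
    binary = all(x in (0.0, 1.0) for xs in groups.values() for x in xs)
    out = {}
    for variant, xs in groups.items():
        if binary:
            out[variant] = {"visitors": len(xs), "conversions": int(sum(xs))}
        else:
            out[variant] = {
                "visitors": len(xs),
                "revenue_sum": sum(xs),
                "revenue_sum_squares": sum(x * x for x in xs),
            }
    return out
```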
The first variant (first row or first group) is treated as control by default. To override, include a boolean column (values like true/false, 1/0, yes/no) — the tool detects it automatically.
Use `--sheet` to select a worksheet in `.xlsx`/`.xlsm` files.
Use [tool.ablift] in pyproject.toml when you want one decision policy and one set of Bayesian inference settings reused across analyses in the project.
Example:
```toml
[tool.ablift]
method = "bayesian"
primary_metric = "conversion_rate"
samples = 50000
random_seed = 7

[tool.ablift.decision_policy]
enabled = true
bayes_prob_beats_control = 0.95
max_expected_loss = 0.001
```

Precedence is:
- CLI flags
- input file
- `pyproject.toml` `[tool.ablift]`
- built-in defaults
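This precedence chain behaves like a first-defined lookup across layers. A minimal sketch (layer names are illustrative, not ablift's API):

```python
def resolve(key, cli, input_file, pyproject, defaults):
    """Return the value for `key` from the highest-precedence layer that
    defines it: CLI flags > input file > pyproject.toml > built-in defaults."""
    for layer in (cli, input_file, pyproject, defaults):
        if key in layer and layer[key] is not None:
            return layer[key]
    raise KeyError(key)
```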
Bundled examples:
```json
{
  "experiment_name": "homepage_cta",
  "method": "bayesian",
  "primary_metric": "conversion_rate",
  "variants": [
    {"name": "control", "visitors": 50000, "conversions": 2000, "is_control": true},
    {"name": "v1", "visitors": 50000, "conversions": 2080, "is_control": false},
    {"name": "v2", "visitors": 50000, "conversions": 2140, "is_control": false}
  ]
}
```

Sequential ARPU at an early look:
```json
{
  "experiment_name": "checkout_flow",
  "method": "frequentist_sequential",
  "primary_metric": "arpu",
  "alpha": 0.05,
  "look_index": 3,
  "max_looks": 10,
  "variants": [
    {"name": "control", "visitors": 12000, "conversions": 610, "revenue_sum": 21000, "revenue_sum_squares": 150000, "is_control": true},
    {"name": "v1", "visitors": 12000, "conversions": 640, "revenue_sum": 22400, "revenue_sum_squares": 167000, "is_control": false}
  ]
}
```

Run the CLI from the repo:
```bash
uv run ablift --help
```

Run the test suite:

```bash
uv run pytest
```

Run all bundled demos:

```bash
make demo
```

Check environment readiness:

```bash
uv run ablift doctor
uv run ablift doctor --json
uv run ablift doctor --strict
```

Estimate duration from assumptions:
```bash
uv run ablift duration \
  --method frequentist \
  --baseline-rate 0.04 \
  --relative-mde 0.05 \
  --daily-traffic 50000 \
  --n-variants 3 \
  --max-looks 10
```

Estimate Bayesian duration:
```bash
uv run ablift duration \
  --method bayesian \
  --baseline-rate 0.04 \
  --relative-mde 0.05 \
  --daily-traffic 50000 \
  --n-variants 3 \
  --max-days 60
```

Estimate duration from CSV/XLSX (column names must match field names: `baseline_rate`, `relative_mde`, `daily_traffic`, etc.):
```bash
uv run ablift duration \
  --input examples/duration_inputs.xlsx \
  --sheet Sheet1
```

Agent workflow:

- Run `ablift analyze --input ...` with a JSON, CSV, or XLSX file.
- For CSV/XLSX, format data with positional columns (see the CSV/XLSX input section).
- Read `recommendation.action`, `decision_confidence`, and `risk_flags`.
- If `action=continue_collecting_data`, schedule the next look.
- If `action=investigate_data_quality`, resolve SRM or tracking issues before deciding to ship.
- For planning questions, run `ablift duration`.
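An agent loop can branch on the output JSON directly. A sketch of the routing step, assuming the output shape described in this document (the step names returned here are illustrative):

```python
def next_step(output: dict) -> str:
    """Map a parsed ablift output payload to the agent's next step."""
    rec = output.get("recommendation")
    if rec is None:                       # no decision policy was supplied
        return "report_estimates_only"
    action = rec["action"]
    if action == "continue_collecting_data":
        return "schedule_next_look"
    if action == "investigate_data_quality":
        return "fix_srm_or_tracking"
    return action                         # ship_*, do_not_ship, stop_and_rollback
```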
Common errors and fixes:
- `Exactly one variant must have is_control=true`: mark one and only one control row.
- `conversions cannot exceed visitors`: the metric is a repeated-event count rather than a binary outcome; redefine it as binary per unit.
- `primary_metric must be 'conversion_rate' or 'arpu'`: fix the input payload value.
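These checks are cheap to run before calling the CLI. A pre-flight sketch mirroring the messages above (not ablift's own validator):

```python
def preflight(variants):
    """Raise ValueError with an ablift-style message on bad variant rows."""
    controls = [v for v in variants if v.get("is_control")]
    if len(controls) != 1:
        raise ValueError("Exactly one variant must have is_control=true")
    for v in variants:
        if v["conversions"] > v["visitors"]:
            raise ValueError("conversions cannot exceed visitors")
```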
`recommendation` contains:

- `action` (`ship_*`, `continue_collecting_data`, `do_not_ship`, `investigate_data_quality`, `stop_and_rollback`)
- `rationale`
- `decision_confidence` (0 to 1)
- `next_best_action`
- `risk_flags` (e.g. `srm_detected`, `guardrail_failure`)
If no explicit Bayesian decision policy is provided, `recommendation` is `null` and `analysis_settings.decision_policy` shows that no automated decision policy was applied.
- This tool currently uses aggregate statistics for ARPU.
- For production, add metric-specific robust models, stronger QA checks, and regression tests.