Home
Welcome to the AgentAssay wiki — the complete guide to token-efficient stochastic testing for AI agents.
AgentAssay is a formal regression testing framework that delivers statistical guarantees without burning your token budget. It combines behavioral fingerprinting, adaptive budget optimization, and trace-first offline analysis to achieve 5-20x cost reduction at equivalent statistical power.
- Token-Efficient Testing — Pay only for the trials you actually need (40-83% savings)
- Behavioral Fingerprinting — Detect regressions by comparing agent behavior, not raw outputs
- Statistical Rigor — Three-valued verdicts (PASS/FAIL/INCONCLUSIVE) with confidence intervals
- 5D Coverage Model — Tool, path, state, boundary, and model coverage
- 10 Framework Adapters — LangGraph, CrewAI, AutoGen, OpenAI, smolagents, and more
- Mutation Testing — 12 operators across 4 categories to evaluate test suite sensitivity
- Trace-First Analysis — Coverage and contract checking at zero token cost
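To make the three-valued verdict idea concrete, here is a minimal sketch of how a PASS/FAIL/INCONCLUSIVE decision can be derived from a confidence interval on an agent's pass rate. It uses a standard Wilson score interval. The function names `wilson_interval` and `verdict`, and the `threshold` parameter, are illustrative assumptions and are not AgentAssay's API; the framework's actual statistical methods may differ.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def verdict(successes: int, trials: int, threshold: float, z: float = 1.96) -> str:
    """Hypothetical helper: compare the whole interval against a required pass rate."""
    low, high = wilson_interval(successes, trials, z)
    if low >= threshold:
        return "PASS"          # even the pessimistic bound meets the threshold
    if high < threshold:
        return "FAIL"          # even the optimistic bound misses the threshold
    return "INCONCLUSIVE"      # the interval straddles the threshold


print(verdict(98, 100, threshold=0.90))  # PASS
print(verdict(50, 100, threshold=0.90))  # FAIL
print(verdict(9, 10, threshold=0.90))    # INCONCLUSIVE
```

The third case shows why a third outcome is useful: with only 10 trials the interval is too wide to decide either way, so the honest answer is to collect more evidence rather than force a binary result.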
Testing AI agents is expensive. Every test requires LLM API calls and tool executions. A fixed-100-trial strategy costs $20-$200 per run. Multiply by CI frequency, and you're looking at thousands of dollars per month.
Most teams respond by either over-testing (wasting budget), under-testing (missing regressions), or skipping testing entirely.
AgentAssay eliminates this waste through adaptive budgeting, behavioral fingerprinting, and offline analysis — delivering the same statistical confidence at 5-20x lower cost.
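The arithmetic behind these figures can be sketched as follows. The per-run cost range ($20-$200) and the 5-20x reduction come from the text above; the figure of 60 CI runs per month is purely an assumption for illustration, and `monthly_cost` is a hypothetical helper, not part of AgentAssay.

```python
def monthly_cost(cost_per_run_usd: float, runs_per_month: int, reduction_factor: float = 1.0) -> float:
    """Monthly spend for a test suite, optionally divided by a cost-reduction factor."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return cost_per_run_usd * runs_per_month / reduction_factor


RUNS_PER_MONTH = 60  # assumption: about two CI runs per day

for cost_per_run in (20.0, 200.0):       # fixed-100-trial cost range from the text
    baseline = monthly_cost(cost_per_run, RUNS_PER_MONTH)
    at_5x = monthly_cost(cost_per_run, RUNS_PER_MONTH, reduction_factor=5)
    at_20x = monthly_cost(cost_per_run, RUNS_PER_MONTH, reduction_factor=20)
    print(f"${cost_per_run:.0f}/run: baseline ${baseline:,.0f}/month, "
          f"5x ${at_5x:,.0f}/month, 20x ${at_20x:,.0f}/month")
```

Under that assumed CI frequency, the fixed-100 baseline lands between $1,200 and $12,000 per month, and a 5-20x reduction brings the same suite to between $60 and $2,400 per month.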
AgentAssay is built on peer-reviewed research:
- Paper: arXiv:2603.02601 (cs.AI + cs.SE)
- Dataset: Zenodo DOI: 10.5281/zenodo.18842011
- Author: Varun Pratap Bhardwaj (Independent Researcher)
pip install agentassay

See the Installation page for framework-specific extras and development setup.
from agentassay.efficiency import AdaptiveBudgetOptimizer
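# 1. Calibrate from a small pilot run (10 trials)
# Note: `calibration_traces`, `runner`, and `scenario` are assumed to be
# defined by your own test setup; they are not created in this snippet.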
optimizer = AdaptiveBudgetOptimizer(alpha=0.05, beta=0.10)
estimate = optimizer.calibrate(calibration_traces)
# 2. See the savings
print(f"Recommended trials: {estimate.recommended_n}")
print(f"Estimated cost: ${estimate.estimated_cost_usd:.2f}")
print(f"Savings vs fixed-100: {estimate.savings_vs_fixed_100:.0%}")
# 3. Run only what you need
results = runner.run_trials(scenario, n=estimate.recommended_n)

- Token-Efficient Testing ⭐ The differentiator
- Behavioral Fingerprinting
- Statistical Methods
- Coverage Model
- Mutation Testing
- GitHub: github.com/qualixar/agentassay
- PyPI: pypi.org/project/agentassay
- Issues: GitHub Issues
Apache-2.0 — forever free, never paid.
Part of Qualixar | Author: Varun Pratap Bhardwaj