Varun Pratap Bhardwaj edited this page Mar 6, 2026 · 2 revisions

AgentAssay Wiki

Welcome to the AgentAssay wiki — the complete guide to token-efficient stochastic testing for AI agents.

Quick Links

What is AgentAssay?

AgentAssay is a formal regression testing framework that delivers statistical guarantees without burning your token budget. It combines behavioral fingerprinting, adaptive budget optimization, and trace-first offline analysis to achieve 5-20x cost reduction at equivalent statistical power.
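To make the behavioral-fingerprinting idea concrete, here is a minimal sketch: summarize each agent run as its tool-call sequence, then compare the distributions of sequences between a baseline and a candidate build. The function names, trace format, and distance metric here are illustrative assumptions, not AgentAssay's actual API.

```python
import hashlib
from collections import Counter

def fingerprint(traces):
    """Summarize agent behavior as a distribution over tool-call sequences.

    Each trace is a list of tool names in call order; raw LLM text is ignored,
    so harmless wording changes do not register as regressions.
    """
    return Counter(hashlib.sha256("->".join(t).encode()).hexdigest()[:12]
                   for t in traces)

def behavioral_drift(baseline, candidate):
    """Total variation distance between fingerprint distributions (0 = identical)."""
    keys = set(baseline) | set(candidate)
    n_b, n_c = sum(baseline.values()), sum(candidate.values())
    return 0.5 * sum(abs(baseline[k] / n_b - candidate[k] / n_c) for k in keys)

base = fingerprint([["search", "summarize"],
                    ["search", "summarize"],
                    ["search", "cite", "summarize"]])
cand = fingerprint([["search", "summarize"],
                    ["search", "cite", "summarize"],
                    ["search", "cite", "summarize"]])
print(f"drift = {behavioral_drift(base, cand):.2f}")  # → drift = 0.33
```

Because the comparison is over behavior distributions rather than raw outputs, stochastic rewording of an answer does not trip the detector, while a changed tool path does.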

Core Features

  • Token-Efficient Testing — Pay only for the trials you actually need (40-83% savings)
  • Behavioral Fingerprinting — Detect regressions by comparing agent behavior, not raw outputs
  • Statistical Rigor — Three-valued verdicts (PASS/FAIL/INCONCLUSIVE) with confidence intervals
  • 5D Coverage Model — Tool, path, state, boundary, and model coverage
  • 10 Framework Adapters — LangGraph, CrewAI, AutoGen, OpenAI, smolagents, and more
  • Mutation Testing — 12 operators across 4 categories to evaluate test suite sensitivity
  • Trace-First Analysis — Coverage and contract checking at zero token cost
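The three-valued verdict in the list above can be sketched with a standard Wilson confidence interval on the observed pass rate: PASS when even the pessimistic bound clears the required threshold, FAIL when even the optimistic bound misses it, INCONCLUSIVE otherwise. The threshold value and function shape are hypothetical, not AgentAssay's actual interface.

```python
import math

def verdict(successes, trials, threshold=0.90, z=1.96):
    """PASS/FAIL/INCONCLUSIVE from a Wilson 95% CI on the pass rate."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    lo, hi = center - half, center + half
    if lo >= threshold:
        return "PASS"          # even the lower bound clears the bar
    if hi < threshold:
        return "FAIL"          # even the upper bound misses the bar
    return "INCONCLUSIVE"      # interval straddles the threshold: run more trials

print(verdict(98, 100))  # → PASS
print(verdict(80, 100))  # → FAIL
print(verdict(91, 100))  # → INCONCLUSIVE
```

The INCONCLUSIVE branch is what makes adaptive budgeting possible: rather than silently passing or failing on a borderline sample, the framework can ask for exactly as many additional trials as the interval width requires.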

The Problem AgentAssay Solves

Testing AI agents is expensive. Every test requires LLM API calls and tool executions. A fixed-100-trial strategy costs $20-$200 per run. Multiply by CI frequency, and you're looking at thousands of dollars per month.

Most teams respond by either over-testing (wasting budget), under-testing (missing regressions), or skipping testing entirely.

AgentAssay eliminates this waste through adaptive budgeting, behavioral fingerprinting, and offline analysis — delivering the same statistical confidence at 5-20x lower cost.

Research Foundation

AgentAssay is built on peer-reviewed research in sequential statistical testing and mutation analysis.

Installation

pip install agentassay

See the Installation page for framework-specific extras and development setup.

Quick Example

from agentassay.efficiency import AdaptiveBudgetOptimizer

# 1. Run a small calibration (10 trials).
#    `calibration_traces` holds the traces collected from that calibration run.
optimizer = AdaptiveBudgetOptimizer(alpha=0.05, beta=0.10)
estimate = optimizer.calibrate(calibration_traces)

# 2. See the savings
print(f"Recommended trials: {estimate.recommended_n}")
print(f"Estimated cost: ${estimate.estimated_cost_usd:.2f}")
print(f"Savings vs fixed-100: {estimate.savings_vs_fixed_100:.0%}")

# 3. Run only what you need.
#    `runner` and `scenario` come from your existing test setup.
results = runner.run_trials(scenario, n=estimate.recommended_n)
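The "pay only for the trials you need" claim rests on standard power analysis: given a significance level α, power 1 − β, and the effect size observed during calibration, the required trial count is often far below a fixed 100. The sketch below shows the textbook two-proportion sample-size formula; it is an illustration of the principle, not the optimizer's actual internals.

```python
import math
from statistics import NormalDist

def required_trials(p0, p1, alpha=0.05, beta=0.10):
    """Trials needed to detect a drop in pass rate from p0 to p1
    at one-sided significance alpha with power 1 - beta."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(1 - beta)
    num = (z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))) ** 2
    return math.ceil(num / (p0 - p1) ** 2)

# A large regression (pass rate 0.95 -> 0.70) takes few trials to detect:
n = required_trials(0.95, 0.70)
print(n, f"savings vs fixed-100: {1 - n / 100:.0%}")  # → 15 savings vs fixed-100: 85%
```

Small regressions (say 0.95 → 0.90) still require large samples, which is why calibration matters: the optimizer spends trials in proportion to how subtle the effect it must detect actually is.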

Documentation Structure

Getting Started

Core Concepts

Guides

Reference

Community & Support

License

Apache-2.0 — forever free, never paid.


Part of Qualixar | Author: Varun Pratap Bhardwaj
