Name	Name	Last commit message	Last commit date
parent directory ..
csb-v1-mixed371--haiku45--030326	csb-v1-mixed371--haiku45--030326
README.md	README.md

Name

Last commit message

Last commit date

Results Snapshots

Frozen, auditable benchmark results. Each snapshot is tied to a specific benchmark suite version, model, and configuration(s).

Snapshots

Snapshot	Tasks	Model	Description
`csb-v1-mixed371--haiku45--030326`	371	Haiku 4.5	Initial report: baseline vs Sourcegraph MCP
`csb-v1-mixed371--sonnet46--expanded`	464	Sonnet 4.6	Sonnet expansion
`csb-v1-mixed371--haiku45--mcp-comparison`	747	Haiku 4.5	3-way MCP: Augment + GitHub + Sourcegraph
`csb-v1-mixed371--opus46--mcp-comparison`	23	Opus 4.6	Opus MCP comparison on complex codebases

Structure

Each snapshot contains:

{snapshot_id}/
  SNAPSHOT.json          # Manifest: suite, model, configs, aggregate scores
  browse.html            # Interactive results browser (open in browser)
  summary/
    rewards.json         # Per-task reward scores by config
    aggregate.json       # Mean scores, costs, timing
    timing.json          # Per-task wall clock, agent time, setup time
    costs.json           # Per-task cost and token breakdowns
  traces/                # Symlinks to source run directories (internal)
    {config}/{task_id}/
  export/                # Sanitized copies for public publishing
    traces/{config}/{task_id}/
      result.json        # Trial metadata
      task_metrics.json  # Reward, timing, cost, tokens
      verifier/reward.txt
      agent/instruction.txt

Browsing

Open browse.html in any web browser for an interactive, sortable, filterable view of all task results with baseline vs MCP comparison.

Trajectories

Full agent trajectories (tool calls, LLM responses, per-step timing) are available as compressed archives in the GitHub Release for each snapshot.

Creating Snapshots

python3 scripts/publishing/create_snapshot.py \
  --suite benchmarks/suites/csb-v2-dual264.json \
  --model haiku45 --tag 040101 \
  --scan-dirs runs/staging runs/official/_raw

Verifying & Exporting

python3 scripts/publishing/verify_snapshot.py --all
python3 scripts/publishing/export_snapshot.py runs/snapshots/{snapshot_id}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Results Snapshots

Snapshots

Structure

Browsing

Trajectories

Creating Snapshots

Verifying & Exporting

FilesExpand file tree

snapshots

Directory actions

More options

Directory actions

More options

Latest commit

History

snapshots

Folders and files

parent directory

README.md

Results Snapshots

Snapshots

Structure

Browsing

Trajectories

Creating Snapshots

Verifying & Exporting