# Code Graph Navigation Benchmark
#
# Evaluates whether your MCP server helps agents navigate code knowledge graphs.
# 300 tasks across 10 well-known open-source repos (flask, fastapi, express, etc.)
# 3 difficulty tiers: easy, medium, hard
#
# The MCP agent gets graph exploration tools. The baseline agent gets no such tools.
# The delta between the two runs shows whether your tools actually help agents understand code.
#
# Graph data is pre-loaded into the container in Supermodel MCP cache format.
# The MCP server reads from SUPERMODEL_CACHE_DIR at startup.
#
# Usage:
#   mcpbr run -c examples/codegraph-benchmark.yaml
#   mcpbr run -c examples/codegraph-benchmark.yaml --filter-difficulty easy
#   mcpbr run -c examples/codegraph-benchmark.yaml --filter-category pallets/flask

mcp_server:
  name: "supermodel"
  command: "npx"
  args:
    - "-y"
    - "@supermodeltools/mcp-server"
    - "--no-api-fallback"
    - "{workdir}"
  env:
    SUPERMODEL_CACHE_DIR: "{workdir}/supermodel-cache"

benchmark: codegraph
provider: anthropic
model: sonnet
sample_size: 10
timeout_seconds: 120
max_concurrent: 4
max_iterations: 20