-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Problem
Testing local/prototype MCP server builds against published versions is one of the core use cases for mcpbr, but the current workflow has several pain points:
1. Manual volume mount configuration
To test a local MCP server, you have to manually configure Docker volume mounts:
mcp_server:
command: node
args:
- /mnt/my-server/dist/index.js # container path
volumes:
/Users/me/Projects/my-server: /mnt/my-server # host:container mappingThis is error-prone — you have to keep the args path and volumes mount in sync, and it's easy to forget.
2. Silent failure when MCP server can't start
If the MCP server command points to a path that doesn't exist inside the container (e.g., a host-only path without a volume mount), the server silently fails. Claude Code marks it as "status":"failed" in its init, but mcpbr doesn't detect or report this. The agent just runs without MCP tools, completing the benchmark as a vanilla agent — producing misleading comparison results.
This is particularly dangerous for A/B comparisons: one arm may appear to "work" while actually not using MCP at all.
3. No preflight validation
There's no way to verify the MCP server will actually start inside the container before kicking off a full benchmark run (which can take 10+ minutes per instance).
Proposed Solution
local_path shorthand
Add a local_path field to the MCP server config that automatically handles volume mounting:
mcp_server:
command: node
args:
- "{local}/dist/index.js" # {local} expands to the container mount point
local_path: /Users/me/Projects/my-server
env:
MY_API_KEY: ${MY_API_KEY}mcpbr would:
- Auto-mount
local_pathto a deterministic container path (e.g.,/mnt/mcpbr-local) - Replace
{local}inargswith the container mount path - Validate the local path exists on the host before starting
Fail-fast on MCP server failure
After launching Claude Code inside the container, mcpbr should parse the init JSON and check mcp_servers[].status. If any server has "status":"failed", mcpbr should:
- Log a clear error:
ERROR: MCP server 'mcpbr' failed to start inside container - Fail the instance immediately (or at least flag it clearly in results)
- Not count it as a valid baseline/MCP comparison
Preflight validation
Add a --validate flag (or integrate into existing --skip-preflight logic) that:
- Spins up the Docker container
- Attempts to start the MCP server
- Verifies it responds to
tools/list - Reports success/failure before running any benchmark instances
Documentation
Document the volumes config field and the local testing workflow in the README, including the common pitfall of host-vs-container paths.
Context
This came up while trying to A/B test a local prototype MCP server against the published npm version. The prototype arm's MCP server silently failed (host path not available in container), and the agent completed the benchmark without any MCP tools — producing a false comparison that looked valid until we dug into the logs.