Feature: First-class support for testing local MCP server builds #1

@greynewell

Problem

Testing local/prototype MCP server builds against published versions is one of the core use cases for mcpbr, but the current workflow has several pain points:

1. Manual volume mount configuration

To test a local MCP server, you have to manually configure Docker volume mounts:

mcp_server:
  command: node
  args:
    - /mnt/my-server/dist/index.js  # container path
volumes:
  /Users/me/Projects/my-server: /mnt/my-server  # host:container mapping

This is error-prone: the container path in args and the mount in volumes have to be kept in sync by hand, and it's easy to forget one of them.

2. Silent failure when MCP server can't start

If the MCP server command points to a path that doesn't exist inside the container (e.g., a host-only path without a volume mount), the server silently fails. Claude Code marks it as "status":"failed" in its init JSON, but mcpbr doesn't detect or report this. The agent just runs without MCP tools and completes the benchmark as a vanilla agent, which produces misleading comparison results.

This is particularly dangerous for A/B comparisons: one arm may appear to "work" while actually not using MCP at all.

3. No preflight validation

There's no way to verify the MCP server will actually start inside the container before kicking off a full benchmark run (which can take 10+ minutes per instance).

Proposed Solution

local_path shorthand

Add a local_path field to the MCP server config that automatically handles volume mounting:

mcp_server:
  command: node
  args:
    - "{local}/dist/index.js"  # {local} expands to the container mount point
  local_path: /Users/me/Projects/my-server
  env:
    MY_API_KEY: ${MY_API_KEY}

mcpbr would:

  • Auto-mount local_path to a deterministic container path (e.g., /mnt/mcpbr-local)
  • Replace {local} in args with the container mount path
  • Validate the local path exists on the host before starting
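A rough sketch of what the expansion could look like, in Python. Everything here is illustrative: expand_local_path, CONTAINER_MOUNT, and the must_exist parameter are hypothetical names, and /mnt/mcpbr-local is just the example mount point from above, not an existing mcpbr setting.

```python
from pathlib import Path

# Hypothetical fixed mount point; the proposal only gives this as an example.
CONTAINER_MOUNT = "/mnt/mcpbr-local"
PLACEHOLDER = "{local}"


def expand_local_path(args, local_path, must_exist=True):
    """Return (expanded_args, volumes) for a server config that sets local_path."""
    host = Path(local_path).expanduser()
    # Validate on the host first, so a typo fails before any container starts.
    if must_exist and not host.is_dir():
        raise FileNotFoundError(f"local_path does not exist on the host: {host}")
    # Swap the placeholder for the container-side mount point in every arg.
    expanded = [arg.replace(PLACEHOLDER, CONTAINER_MOUNT) for arg in args]
    # Same host:container shape as the existing volumes mapping.
    volumes = {str(host): CONTAINER_MOUNT}
    return expanded, volumes
```

With the config above, args would become /mnt/mcpbr-local/dist/index.js and the volumes mapping would be generated instead of written by hand, so the two can no longer drift apart.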

Fail-fast on MCP server failure

After launching Claude Code inside the container, mcpbr should parse the init JSON and check mcp_servers[].status. If any server has "status":"failed", mcpbr should:

  • Log a clear error: ERROR: MCP server 'mcpbr' failed to start inside container
  • Fail the instance immediately (or at least flag it clearly in results)
  • Not count it as a valid baseline/MCP comparison
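A minimal sketch of the check, in Python. It assumes the agent output is newline-delimited JSON and that the init message carries an mcp_servers list whose entries have name and status fields; only mcp_servers[].status and the "failed" value come from what we observed, the rest (including the function name failed_mcp_servers) is an assumption.

```python
import json


def failed_mcp_servers(stream_output):
    """Return the names of MCP servers marked as failed in the init message."""
    for line in stream_output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON noise such as plain log lines
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "mcp_servers" not in msg:
            continue  # not the init message
        return [
            server.get("name", "<unnamed>")
            for server in msg["mcp_servers"]
            if server.get("status") == "failed"
        ]
    # No init message found; a stricter check might treat this as a failure too.
    return []
```

If the returned list is non-empty, mcpbr could log the error for each name and mark the instance as invalid rather than letting it finish as a vanilla run.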

Preflight validation

Add a --validate flag (or integrate into existing --skip-preflight logic) that:

  1. Spins up the Docker container
  2. Attempts to start the MCP server
  3. Verifies it responds to tools/list
  4. Reports success/failure before running any benchmark instances
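Steps 2 and 3 could be sketched roughly as below, in Python. MCP speaks JSON-RPC 2.0, so the preflight needs to send an initialize request, the initialized notification, and then tools/list. How the messages reach the server inside the container (e.g., over stdin of a docker exec process) is left out, and the function names, the client name mcpbr-preflight, and the protocol version offered are assumptions.

```python
import json


def handshake_messages():
    """JSON-RPC messages to send, in order, to reach a tools/list reply."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {
            "protocolVersion": "2024-11-05",  # assumed version to offer
            "capabilities": {},
            "clientInfo": {"name": "mcpbr-preflight", "version": "0"},
        }},
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def check_tools_list(raw_reply, request_id=2):
    """Return (ok, detail) for one raw JSON-RPC reply to the tools/list request."""
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return False, f"reply is not JSON: {exc}"
    if not isinstance(reply, dict) or reply.get("id") != request_id:
        return False, "reply does not match the tools/list request id"
    if "error" in reply:
        return False, f"server returned an error: {reply['error']}"
    result = reply.get("result")
    tools = result.get("tools") if isinstance(result, dict) else None
    if not isinstance(tools, list):
        return False, "reply has no result.tools list"
    return True, f"{len(tools)} tool(s) available"
```

The (ok, detail) pair maps directly onto step 4: print the detail and abort before any benchmark instance starts when ok is false.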

Documentation

Document the volumes config field and the local testing workflow in the README, including the common pitfall of host-vs-container paths.

Context

This came up while trying to A/B test a local prototype MCP server against the published npm version. The prototype arm's MCP server silently failed (host path not available in container), and the agent completed the benchmark without any MCP tools. The result was a false comparison that looked valid until we dug into the logs.
