This repository contains the benchmark suite behind Smithery's blog post, *MCP vs CLI Is the Wrong Fight*.
The suite compares the same tasks and backends across four surfaces:
- raw API priors
- raw APIs with machine-readable specs
- thin native MCP tools
- generic CLI surfaces
It is designed to measure agent experience, not to prove that one interface always wins.
Key findings:
- Specs help raw API calling: 51.7% to 73.3% success in `api_priors_vs_specs`.
- Thin native MCP beats direct API use with specs on the same tasks: 55.0% to 91.7% success in `specs_vs_native_mcp`.
- On the same thin surface, MCP beats the described CLI: 91.7% vs 83.3% success in `native_mcp_vs_cli`.
- On the full 826-tool GitHub catalog, explicit CLI search closes part of the gap: 66.7% to 87.5%, while native MCP remains at 100.0%.
Experiment families:
- `api_priors_vs_specs`
- `specs_vs_native_mcp`
- `native_mcp_vs_cli`
- `cli_topology`
- `linear_graphql_precision_ablation`
- `github_large_catalog_native_mcp_vs_cli`
- `github_large_catalog_cli_topology`
- `github_large_catalog_search_affordance`
Services:
- GitHub REST
  - 24-operation curated slice
  - 826-tool frozen catalog for large-catalog discovery tests
- Linear GraphQL
  - frozen slice for interface tests
  - live read-only rerun for the GraphQL appendix
  - the public repo ships a redacted task template for the live workspace-specific prompts
- Singapore Bus REST
  - 8-operation niche API slice
Models:
- Claude Code Haiku 4.5
- Claude Code Sonnet 4.6 for the large-catalog GitHub experiments
- Codex GPT-5.4
The checked-in public artifacts are sanitized:
- `bench/results/results.jsonl` contains only the declared 732-run matrix.
- `bench/config/tasks/linear.yaml` keeps the task structure but redacts workspace-specific strings.
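To sanity-check the sanitized run log, you can aggregate success rates per surface directly from the JSONL file. A minimal sketch, assuming each line is a JSON object with hypothetical `surface` and `success` fields (adjust to the actual schema in `bench/results/results.jsonl`):

```python
import json
from collections import defaultdict

def success_rates(path):
    """Compute per-surface success rates from a JSONL run log.

    Assumes each non-empty line is a JSON object with a "surface"
    string and a boolean "success" field; these field names are
    an assumption, not the confirmed schema.
    """
    totals = defaultdict(lambda: [0, 0])  # surface -> [successes, runs]
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            run = json.loads(line)
            bucket = totals[run["surface"]]
            bucket[0] += bool(run["success"])
            bucket[1] += 1
    return {surface: ok / n for surface, (ok, n) in totals.items()}
```

This is just a quick cross-check that the declared run matrix matches the headline numbers; the repo's own `make report` target is the canonical way to regenerate the report.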
Setup:

```shell
python -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"
```

Common commands:
```shell
make preflight
make matrix
make smoke
make pilot
make full
make report
```

Useful docs:
- Method: `docs/method.md`
- Benchmark guide: `docs/benchmark-guide.md`
- Current report: `docs/report.md`