MCP vs CLI
CLI is 9-35x cheaper than MCP for tools the model already knows, but MCP wins when it doesn't. We compiled benchmarks from five independent teams to find the real decision framework.
Every AI agent needs tools. The common approach today is MCP servers with structured schemas, but those schemas come at a cost. CLI commands the model already knows are often a cheaper alternative.
The short answer from the benchmarks is that CLI is cheaper. Sometimes dramatically cheaper. But the more useful answer is that the choice depends on whether your agent already knows the tool.
How MCP Costs Tokens
When your agent connects to an MCP server, the full JSON schema for every available tool gets injected into the context window. Tool name, description, parameter definitions, enum values, system instructions. All of it. On every single API call.
Each tool definition costs 550 to 1,400 tokens.
| MCP Server | Tools | Tokens Injected |
|---|---|---|
| GitHub | 93 | ~55,000 |
| Jira (developer-reported estimate) | — | ~17,000 |
| GitHub + Slack + Sentry combined | — | ~143,000 |
Connect three services and you've used 143,000 of a 200,000 token context window before the agent does anything.
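To make that overhead concrete, here is a sketch of what a single tool definition looks like when serialized into the context. The tool name, fields, and descriptions are hypothetical (not taken from any real MCP server), and the characters-per-token ratio is a rough heuristic, not a real tokenizer:

```python
import json

# Hypothetical MCP-style tool definition. Field names follow the general
# shape of MCP tool schemas (name, description, inputSchema), but this
# specific tool is invented for illustration.
tool_def = {
    "name": "create_issue",
    "description": "Create a new issue in a repository. Requires write access.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {"type": "string", "description": "Repository owner"},
            "repo": {"type": "string", "description": "Repository name"},
            "title": {"type": "string", "description": "Issue title"},
            "body": {"type": "string", "description": "Issue body in Markdown"},
            "labels": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Labels to apply",
            },
        },
        "required": ["owner", "repo", "title"],
    },
}

# Rough token estimate: ~4 characters per token for English text and JSON.
schema_text = json.dumps(tool_def)
approx_tokens = len(schema_text) // 4
print(approx_tokens)
```

Even this small, tersely documented tool lands well into the hundreds of characters; real server definitions with longer descriptions and enum values reach the 550-1,400 token range cited above, and the full catalog is injected on every call.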
How CLI Costs Tokens
CLI isn't free either. The model needs some context about how to use a tool, and where that context comes from determines the cost.
Well-known CLIs (gh, aws, kubectl, docker, git): The model knows these from training data. No schema needed. The cost is essentially just the command and its output.
Custom or obscure CLIs: You provide a skill file (~300-500 tokens), a --help output (~150-250 tokens), or system prompt instructions. That's real cost, but it's an order of magnitude less than MCP schemas.
Scalekit measured this directly. They ran 75 trials on Claude Sonnet 4 against the GitHub Copilot MCP server (43 tools) and tested three configurations: raw CLI, CLI with skill descriptions, and MCP.
| Task | CLI | CLI+Skills | MCP |
|---|---|---|---|
| Repo language & license | 1,365 | 4,724 | 44,026 |
| PR details & review | 1,648 | 2,816 | 32,279 |
| Repo metadata & install | 9,386 | 12,210 | 82,835 |
| Merged PRs by contributor | 5,010 | 6,107 | 33,712 |
| Latest release & deps | 8,750 | 6,860 | 37,402 |
All differences statistically significant (p < 0.05).
CLI+Skills costs 2-3x more than raw CLI. That's the honest overhead of providing tool context. But it's still 9-19x cheaper than MCP across every task. Even when you pay for CLI skill descriptions, the gap is massive because you're loading context for the tools you actually use instead of the entire catalog.
Reliability told the same story. CLI: 100% success across 25 runs. MCP: 72%, with 7 ConnectTimeout failures.
The Familiar vs. Unfamiliar Split
This is where it gets interesting, and where the "just use CLI" advice breaks down.
Smithery ran 756 isolated trials across Claude Haiku 4.5 and GPT-5.4, testing GitHub, Linear, and the Singapore Bus API.
MCP won on success rate: 91.7% vs 83.3%. CLI used 2.9x more tokens and took 2.4x longer on successful runs.
Why? Because their test included APIs the models had never seen in training. The Singapore Bus API. Linear's less-documented endpoints. When an agent hits an unfamiliar API with no prior knowledge, the MCP schema is the only map it has. Without it, the agent guesses at parameter names, misunderstands response formats, and retries. Those retries burn more tokens than the schema would have cost upfront.
This is the real decision framework:
Model knows the tool (gh, aws, kubectl, curl): CLI wins. The schema adds cost and zero information. The model already has the interface memorized.
Model doesn't know the tool (internal APIs, niche services, custom integrations): MCP wins. The schema overhead is the cost of teaching the model what the tool does, and it's cheaper than letting the model fail and retry.
The mistake most articles make is treating this as CLI vs MCP. It's actually known vs unknown, and the tool delivery method should follow from that.
What This Costs at Scale
Jannik Reinhard benchmarked Microsoft Intune management tasks. MCP loaded three schemas totaling ~145,000 tokens. CLI did the same work in ~4,150 tokens. 35x difference. With CLI, 95% of the context window was available for reasoning. With MCP, only 64%.
A developer documented replacing 33 MCP tools with 7 bash scripts. Only 6 of the 33 tools were ever used. The idle overhead was 10,000-22,000 tokens per session, actively degrading the agent's reasoning in long conversations by crowding out working memory.
Monthly cost at scale, based on Claude Sonnet pricing ($3/M input tokens), schema overhead only:
| Daily Requests | MCP Cost/Month | CLI Cost/Month |
|---|---|---|
| 100 | ~$510 | ~$1.20 |
| 1,000 | ~$5,100 | ~$12 |
| 10,000 | ~$51,000 | ~$120 |
These numbers overstate the gap because they assume zero CLI context cost. With skill files the CLI cost would be roughly 2-3x higher based on Scalekit's data, so closer to ~$4-36/month at the 100-1,000/day range. Still an order of magnitude cheaper.
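The table's arithmetic is simple enough to sketch. The overhead figures below are assumptions drawn from earlier in this article (~55,000 tokens for the GitHub MCP catalog, ~130 tokens for a bare CLI call), so the results land near, not exactly on, the table's rounded values:

```python
PRICE_PER_MTOK = 3.00  # Claude Sonnet input pricing, dollars per million tokens

def monthly_overhead_usd(daily_requests: int, overhead_tokens: int) -> float:
    """Cost of context overhead alone, ignoring tokens spent on actual work."""
    return daily_requests * 30 * overhead_tokens * PRICE_PER_MTOK / 1_000_000

print(monthly_overhead_usd(1_000, 55_000))  # MCP schemas: ~$4,950/month
print(monthly_overhead_usd(1_000, 130))     # bare CLI:    ~$11.70/month
```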
Mitigations That Actually Work
The industry isn't ignoring this. Several approaches have emerged to keep MCP's benefits without the full schema tax.
Anthropic's Tool Search (docs) defers tool loading until the agent searches for it: roughly 500 tokens of upfront overhead instead of the full ~55K+ catalog (Anthropic reports an 85% reduction in tool-definition overhead). Accuracy on Opus 4 improved from 49% to 74% because more context was available for reasoning.
Speakeasy's dynamic toolsets (benchmark) load only matching tools per request. 96-99% reduction on catalogs of 40-400 tools. Trade-off: 2-3x more tool calls and ~50% longer execution time.
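The idea behind dynamic toolsets is simple: match the request against the catalog and inject only the winners. The toy below uses keyword overlap where real implementations use semantic search, and the catalog entries are invented, but it shows why token cost scales with tools *used* rather than tools *available*:

```python
# Invented catalog: tool name -> one-line description.
CATALOG = {
    "create_issue": "create a new issue in a repository",
    "merge_pull_request": "merge an open pull request",
    "list_releases": "list releases for a repository",
    "send_message": "send a message to a channel",
}

def select_tools(request: str, catalog: dict) -> list:
    """Return only the tools whose description shares a word with the request."""
    words = set(request.lower().split())
    return [name for name, desc in catalog.items()
            if words & set(desc.split())]

# Only the matching tool's schema would be injected, not all four.
print(select_tools("merge the pull request", CATALOG))
```

With a 400-tool catalog, loading one or two matches instead of everything is where the 96-99% reductions come from; the trade-off is the extra discovery round-trips noted above.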
mcp2cli (data) wraps MCP servers in CLI shells:
| Tools | Turns | Native MCP | mcp2cli | Savings |
|---|---|---|---|---|
| 30 | 15 | 54,525 | 2,309 | 96% |
| 80 | 20 | 193,240 | 3,871 | 98% |
| 120 | 25 | 362,350 | 5,181 | 99% |
Perplexity moved away from MCP internally after finding tool definitions consumed 40-50% of their context windows. They built their own Agent API instead.
A Queen's University study found 97.1% of MCP tool descriptions across 103 servers contain quality issues. 56% don't even state their purpose clearly. So the tokens you're spending on schemas are often buying bad documentation.
The Decision
Use CLI when:
- The model knows the tool from training (major CLIs, common APIs)
- You control the environment and can install tools
- You're optimizing for cost or need maximum context for reasoning
Use MCP when:
- The model has never seen the API (internal tools, niche services)
- You need OAuth, audit trails, or structured access control
- The tool count is small (under ~10 tools, the overhead is negligible)
Use dynamic loading when:
- You need MCP's structured discovery but can't afford the full catalog
- Tool Search, mcp2cli, or gateway-based filtering can cut 85-99% of the overhead
The protocol isn't the problem. Loading 93 tools when you need 3 is the problem. Whether you solve that with CLI, dynamic toolsets, or smarter MCP gateways matters less than whether you solve it at all.
YoAmigo takes the opposite approach from most vibe coding tools: built-in tools and CLIs instead of MCP servers. The goal is faster app creation at lower cost.
Building a website or webapp? YoAmigo is a local-first web app vibe coding platform. Use Claude Code and your own AI tools directly. No token or hosting markup.
About the Author
Dominic Cicilio
Independent developer with 10+ years of experience building software. Former early engineer at a security-first startup, now creator of YoAmigo.
There's a better way to build for the web.
See if YoAmigo is right for you.