MCP vs CLI
CLI is 9-35x cheaper than MCP for tools the model already knows, but MCP wins when it doesn't. We compiled benchmarks from five independent teams to find the real decision framework.
Every AI agent needs tools. The common approach today is MCP servers with structured schemas, but those schemas come at a cost. CLI commands the model already knows are often a cheaper alternative.
The short answer from the benchmarks is that CLI is cheaper. Sometimes dramatically cheaper. But the more useful answer is that the choice depends on whether your agent already knows the tool.
How MCP Costs Tokens
When your agent connects to an MCP server, the full JSON schema for every available tool gets injected into the context window. Tool name, description, parameter definitions, enum values, system instructions. All of it. On every single API call.
Each tool definition costs 550 to 1,400 tokens.
| MCP Server | Tools | Tokens Injected |
|---|---|---|
| GitHub | 93 | ~55,000 |
| Jira (developer-reported estimate) | — | ~17,000 |
| GitHub + Slack + Sentry combined | — | ~143,000 |
Connect three services and you've used 143,000 of a 200,000 token context window before the agent does anything.
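To make that overhead concrete, here is a sketch of what a single tool definition looks like when serialized into the context. The tool name, fields, and descriptions are hypothetical (not taken from any real MCP server), and the characters-per-token ratio is a rough heuristic, not a real tokenizer:

```python
import json

# Hypothetical MCP-style tool definition. Field names follow the general
# shape of MCP tool schemas (name, description, inputSchema), but this
# specific tool is invented for illustration.
tool_def = {
    "name": "create_issue",
    "description": "Create a new issue in a repository. Requires write access.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {"type": "string", "description": "Repository owner"},
            "repo": {"type": "string", "description": "Repository name"},
            "title": {"type": "string", "description": "Issue title"},
            "body": {"type": "string", "description": "Issue body in Markdown"},
            "labels": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Labels to apply",
            },
        },
        "required": ["owner", "repo", "title"],
    },
}

# Rough token estimate: ~4 characters per token for English text and JSON.
schema_text = json.dumps(tool_def)
approx_tokens = len(schema_text) // 4
print(approx_tokens)
```

Even this small, tersely documented tool lands well into the hundreds of characters; real server definitions with longer descriptions and enum values reach the 550-1,400 token range cited above, and the full catalog is injected on every call.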
How CLI Costs Tokens
CLI isn't free either. The model needs some context about how to use a tool, and where that context comes from determines the cost.
Well-known CLIs (gh, aws, kubectl, docker, git): The model knows these from training data. No schema needed. The cost is essentially just the command and its output.
Custom or obscure CLIs: You provide a skill file (~300-500 tokens), a --help output (~150-250 tokens), or system prompt instructions. That's real cost, but it's an order of magnitude less than MCP schemas.
Scalekit measured this directly. They ran 75 trials on Claude Sonnet 4 against the GitHub Copilot MCP server (43 tools) and tested three configurations: raw CLI, CLI with skill descriptions, and MCP.
| Task | CLI | CLI+Skills | MCP |
|---|---|---|---|
| Repo language & license | 1,365 | 4,724 | 44,026 |
| PR details & review | 1,648 | 2,816 | 32,279 |
| Repo metadata & install | 9,386 | 12,210 | 82,835 |
| Merged PRs by contributor | 5,010 | 6,107 | 33,712 |
| Latest release & deps | 8,750 | 6,860 | 37,402 |
All differences statistically significant (p < 0.05).
CLI+Skills costs 2-3x more than raw CLI. That's the honest overhead of providing tool context. But it's still 9-19x cheaper than MCP across every task. Even when you pay for CLI skill descriptions, the gap is massive because you're loading context for the tools you actually use instead of the entire catalog.
Reliability told the same story. CLI: 100% success across 25 runs. MCP: 72%, with 7 ConnectTimeout failures.
The Familiar vs. Unfamiliar Split
This is where it gets interesting, and where the "just use CLI" advice breaks down.
Smithery ran 756 isolated trials across Claude Haiku 4.5 and GPT-5.4, testing GitHub, Linear, and the Singapore Bus API.
MCP won on success rate: 91.7% vs 83.3%. CLI used 2.9x more tokens and took 2.4x longer on successful runs.
Why? Because their test included APIs the models had never seen in training. The Singapore Bus API. Linear's less-documented endpoints. When an agent hits an unfamiliar API with no prior knowledge, the MCP schema is the only map it has. Without it, the agent guesses at parameter names, misunderstands response formats, and retries. Those retries burn more tokens than the schema would have cost upfront.
This is the real decision framework:
Model knows the tool (gh, aws, kubectl, curl): CLI wins. The schema adds cost and zero information. The model already has the interface memorized.
Model doesn't know the tool (internal APIs, niche services, custom integrations): MCP wins. The schema overhead is the cost of teaching the model what the tool does, and it's cheaper than letting the model fail and retry.
The mistake most articles make is treating this as CLI vs MCP. It's actually known vs unknown, and the tool delivery method should follow from that.
What This Costs at Scale
Jannik Reinhard benchmarked Microsoft Intune management tasks. MCP loaded three schemas totaling ~145,000 tokens. CLI did the same work in ~4,150 tokens. 35x difference. With CLI, 95% of the context window was available for reasoning. With MCP, only 64%.
A developer documented replacing 33 MCP tools with 7 bash scripts. Only 6 of the 33 tools were ever used. The idle overhead was 10,000-22,000 tokens per session, actively degrading the agent's reasoning in long conversations by crowding out working memory.
Monthly cost at scale, based on Claude Sonnet pricing ($3/M input tokens), schema overhead only:
| Daily Requests | MCP Cost/Month | CLI Cost/Month |
|---|---|---|
| 100 | ~$510 | ~$1.20 |
| 1,000 | ~$5,100 | ~$12 |
| 10,000 | ~$51,000 | ~$120 |
These numbers overstate the gap because they assume zero CLI context cost. With skill files the CLI cost would be roughly 2-3x higher based on Scalekit's data, so closer to ~$4-36/month at the 100-1,000/day range. Still an order of magnitude cheaper.
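The table's arithmetic is simple enough to sketch. The overhead figures below are assumptions drawn from earlier in this article (~55,000 tokens for the GitHub MCP catalog, ~130 tokens for a bare CLI call), so the results land near, not exactly on, the table's rounded values:

```python
PRICE_PER_MTOK = 3.00  # Claude Sonnet input pricing, dollars per million tokens

def monthly_overhead_usd(daily_requests: int, overhead_tokens: int) -> float:
    """Cost of context overhead alone, ignoring tokens spent on actual work."""
    return daily_requests * 30 * overhead_tokens * PRICE_PER_MTOK / 1_000_000

print(monthly_overhead_usd(1_000, 55_000))  # MCP schemas: ~$4,950/month
print(monthly_overhead_usd(1_000, 130))     # bare CLI:    ~$11.70/month
```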
Mitigations That Actually Work
The industry isn't ignoring this. Several approaches have emerged to keep MCP's benefits without the full schema tax.
Anthropic's Tool Search (docs) defers tool loading until the agent searches for it: roughly 500 tokens of upfront overhead instead of the full ~55K+ catalog (Anthropic reports an 85% reduction in tool-definition overhead). Accuracy on Opus 4 improved from 49% to 74% because more context was available for reasoning.
Speakeasy's dynamic toolsets (benchmark) load only matching tools per request. 96-99% reduction on catalogs of 40-400 tools. Trade-off: 2-3x more tool calls and ~50% longer execution time.
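The idea behind dynamic toolsets is simple: match the request against the catalog and inject only the winners. The toy below uses keyword overlap where real implementations use semantic search, and the catalog entries are invented, but it shows why token cost scales with tools *used* rather than tools *available*:

```python
# Invented catalog: tool name -> one-line description.
CATALOG = {
    "create_issue": "create a new issue in a repository",
    "merge_pull_request": "merge an open pull request",
    "list_releases": "list releases for a repository",
    "send_message": "send a message to a channel",
}

def select_tools(request: str, catalog: dict) -> list:
    """Return only the tools whose description shares a word with the request."""
    words = set(request.lower().split())
    return [name for name, desc in catalog.items()
            if words & set(desc.split())]

# Only the matching tool's schema would be injected, not all four.
print(select_tools("merge the pull request", CATALOG))
```

With a 400-tool catalog, loading one or two matches instead of everything is where the 96-99% reductions come from; the trade-off is the extra discovery round-trips noted above.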
mcp2cli (data) wraps MCP servers in CLI shells:
| Tools | Turns | Native MCP | mcp2cli | Savings |
|---|---|---|---|---|
| 30 | 15 | 54,525 | 2,309 | 96% |
| 80 | 20 | 193,240 | 3,871 | 98% |
| 120 | 25 | 362,350 | 5,181 | 99% |
Perplexity moved away from MCP internally after finding tool definitions consumed 40-50% of their context windows. They built their own Agent API instead.
A Queen's University study found 97.1% of MCP tool descriptions across 103 servers contain quality issues. 56% don't even state their purpose clearly. So the tokens you're spending on schemas are often buying bad documentation.
The Decision
Use CLI when:
- The model knows the tool from training (major CLIs, common APIs)
- You control the environment and can install tools
- You're optimizing for cost or need maximum context for reasoning
Use MCP when:
- The model has never seen the API (internal tools, niche services)
- You need OAuth, audit trails, or structured access control
- The tool count is small (under ~10 tools, the overhead is negligible)
Use dynamic loading when:
- You need MCP's structured discovery but can't afford the full catalog
- Tool Search, mcp2cli, or gateway-based filtering can cut 85-99% of the overhead
The protocol isn't the problem. Loading 93 tools when you need 3 is the problem. Whether you solve that with CLI, dynamic toolsets, or smarter MCP gateways matters less than whether you solve it at all.
YoAmigo takes the opposite approach from most vibe coding tools: built-in tools and CLIs instead of MCP servers. The goal is faster app creation at lower cost.
Building a website or webapp? YoAmigo is a local-first web app vibe coding platform. Use Claude Code and your own AI tools directly. No token or hosting markup.
About the Author
Dominic Cicilio
Independent developer with 10+ years of experience building software. Former early engineer at a security-first startup, now creator of YoAmigo.
There's a better way to build for the web.
See if YoAmigo is right for you.