mcpproxy benchmark
Token cost of loading tools into an agent's context, by routing mode.
Corpus corpus_v1 · 45 tools · encoding cl100k_base
| Mode | Tools in context | Context tokens | Savings vs. baseline |
baseline |
45 |
1730 |
— |
retrieve_tools |
10 |
1431 |
17.3% |
code_execution |
6 |
986 |
43.0% |
Methodology notes
- Token counts use the tiktoken cl100k_base encoding as a reproducible, model-agnostic estimator; exact counts for a pinned model may differ.
- Proxy-mode tools are the full per-mode catalog derived from the live server builders (internal/server.ProxyModeToolDefs), including the shared management tool set (upstream_servers, quarantine_security, search_servers, list_registries).
- Counts tool name + description only; JSON input schemas are excluded uniformly from both the baseline and the proxy modes, so this is a name+description-only metric (not unambiguously conservative). See bench/README.md for the live run with full schemas.
- Corpus is the frozen Spec 065 snapshot (specs/065-evaluation-foundation/datasets/corpus_v1.tools.json); see bench/README.md for the live run with full schemas.