mcp-curate
Converts OpenAPI specs into curated MCP servers with drastically fewer, clearer tools and an eval harness to measure tool selection accuracy.
Turn an OpenAPI spec into a curated MCP server an LLM can actually use — and prove it with an eval.
A naive OpenAPI→MCP generator dumps one tool per endpoint. Point it at GitHub's
API and the model drowns in 1190 tools and picks the wrong one. mcp-curate
consolidates those endpoints into a small set of clear, well-described
meta-tools — and ships an eval harness that measures whether the model picks the
right tool, raw vs curated, on your own spec.
Before / after
| Spec | Raw tools | Curated tools | Reduction |
|---|---|---|---|
| Swagger Petstore | 19 | 3 | 84% |
| Stripe API | 587 | 40 | 93% |
| GitHub REST API | 1190 | 40 | 97% |
$ mcp-curate curate examples/github.json
raw tools: 1190
curated tools: 40 (budget 40)
reduction: 97%
Curated tools (actions consolidated):
- repos: 202 actions [repos]
- actions: 187 actions [actions]
- orgs: 108 actions [orgs]
- issues: 55 actions [issues]
...
Each curated tool exposes an action argument that selects the underlying
operation, so 1190 flat choices become 40 namespaced ones.
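As a rough illustration of the action-selector idea, the sketch below maps one curated tool plus an action argument onto a concrete HTTP operation. The `Operation` class, the `ISSUES_TOOL` table, the action names, and `resolve` are all hypothetical and are not taken from mcp-curate's code; only the GitHub paths are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One underlying API operation (hypothetical model, not mcp-curate's own)."""
    method: str
    path: str


# Hypothetical curated tool: a single tool name that fans out to several actions.
ISSUES_TOOL = {
    "list": Operation("GET", "/repos/{owner}/{repo}/issues"),
    "get": Operation("GET", "/repos/{owner}/{repo}/issues/{issue_number}"),
    "create": Operation("POST", "/repos/{owner}/{repo}/issues"),
}


def resolve(tool: dict[str, Operation], action: str, params: dict[str, object]) -> tuple[str, str]:
    """Map an action plus its parameters to a concrete (method, path) pair."""
    if action not in tool:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(tool)}")
    op = tool[action]
    # Fill the path template; a missing path parameter raises KeyError.
    return op.method, op.path.format(**params)
```

The model first chooses among a few tool names, then among that tool's actions, instead of scanning one flat list.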
Oversized tags get split, not stuffed. When the tool budget has headroom,
a giant tag is broken into focused sub-tools by path instead of one bloated
tool. With more budget, GitHub's 202-operation repos tag splits cleanly:
$ mcp-curate curate examples/github.json --max-tools 120 --max-actions 30
- repos: ... repos_branches, repos_commits, repos_collaborators,
- repos_branches: 36 repos_comments, repos_compare, ... (focused sub-tools)
At a tight budget (the default 40), curation keeps tags whole and clean rather
than forcing unrelated tags together; raise --max-tools to trade tool count
for smaller, more focused tools.
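Splitting by path could be sketched roughly as follows. This is an assumption about the approach, not mcp-curate's actual algorithm: the function name `split_by_path`, the `depth` parameter, and the choice of which path segment names the sub-tool are all hypothetical.

```python
from collections import defaultdict


def split_by_path(tool: str, paths: list[str], max_actions: int, depth: int = 3) -> dict[str, list[str]]:
    """Split an oversized group into sub-tools keyed by one path segment.

    Assumption: the segment at index `depth` (for example "branches" in
    /repos/{owner}/{repo}/branches) names the sub-resource.
    """
    # Groups that already fit the per-tool action limit stay whole.
    if len(paths) <= max_actions:
        return {tool: list(paths)}

    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        segments = [s for s in path.split("/") if s]
        # Paths too short to have a sub-resource segment stay in the parent tool.
        key = segments[depth] if len(segments) > depth else None
        name = f"{tool}_{key}" if key else tool
        groups[name].append(path)
    return dict(groups)
```

With a small `max_actions`, paths under `/repos/{owner}/{repo}/branches` land in `repos_branches`, while `/repos/{owner}/{repo}` itself stays in `repos`.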
Does curation actually help? (the eval)
mcp-curate eval runs natural-language requests against both the raw and the
curated tool set using your LLM key, and reports how often the model routes to
the correct tool.
$ export ANTHROPIC_API_KEY=...
$ mcp-curate eval examples/stripe.json --cases examples/eval_cases/stripe.yaml
Eval: raw vs curated tool selection
cases: 11 raw tools: 587 curated tools: 40
raw correct-tool selection: <run it>%
curated correct-tool selection: <run it>%
-> improvement: <run it> points
The harness uses your key on your spec, so the numbers aren't
hard-coded — run the command above to reproduce them. Golden sets ship for
Petstore and Stripe (examples/eval_cases/); add your own as a small YAML file.
The eval is deliberately honest. Beyond correct-tool selection it also reports:
- curated tool + action accuracy: so curation can't "win" just by offering fewer, broader tools (it must still route to the right operation);
- argument construction accuracy (raw vs curated): for cases that declare expected arguments, whether the model filled the right parameters (e.g. petId: 42 from "look up pet 42").
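To make the difference between the first two metrics concrete, here is a minimal scoring sketch. The `Result` shape and the `accuracy` helper are hypothetical and only illustrate the arithmetic; the harness's real data model may differ.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Outcome of one eval case (hypothetical shape)."""
    expected_tool: str
    expected_action: str
    picked_tool: str
    picked_action: str


def accuracy(results: list[Result], with_action: bool = False) -> float:
    """Percentage of cases routed correctly.

    With with_action=True, a case counts only if both the tool and the
    action match, so a broad tool cannot score on the tool name alone.
    """
    if not results:
        return 0.0
    hits = sum(
        r.picked_tool == r.expected_tool
        and (not with_action or r.picked_action == r.expected_action)
        for r in results
    )
    return 100.0 * hits / len(results)
```

A run where the model picks the right curated tool in three of four cases but the right action in only two scores 75% on tool selection and 50% on tool + action.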
Forking this repo? The status badges above point to tarundattagondi/mcp-curate. Replace that with your-username/mcp-curate in the three badge URLs at the top so they track your own fork's CI.
Install
git clone https://github.com/tarundattagondi/mcp-curate && cd mcp-curate
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,llm]"
./examples/fetch_specs.sh # petstore is committed; this also grabs GitHub + Stripe
Usage
# Inspect a spec's raw tool count.
mcp-curate parse examples/petstore.json
# See the before/after curation report.
mcp-curate curate examples/github.json --max-tools 40
# Serve the curated MCP server over stdio (bring-your-own auth header).
mcp-curate serve examples/petstore.json --curated \
--header "Authorization: Bearer $TOKEN"
# A/B the tool selection with your LLM key.
mcp-curate eval examples/petstore.json --cases examples/eval_cases/petstore.yaml
Add --llm-descriptions to curate/serve/eval to let the LLM polish the
curated tool names and descriptions (otherwise they're generated deterministically,
with no API key required).
How it works
- Parse: load OpenAPI 3.x (JSON/YAML), resolve $ref with cycle cutting, flatten each operation into a spec-agnostic model.
- Curate: group operations by tag (path-segment fallback), merge the smallest related groups to fit a tool budget, split any oversized group into focused sub-tools using leftover headroom, and collapse each group into one meta-tool with an action selector.
- Serve: expose either tool set over the MCP stdio transport; tool calls become real HTTP requests against the spec's server URL.
- Eval: force the model to pick a tool for each golden request and score raw vs curated routing.
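The "resolve $ref with cycle cutting" part of the Parse step can be sketched as below. This is one plausible reading, not the project's implementation: `resolve_refs` is a hypothetical name, only local `#/...` pointers are handled, and a cycle is cut by leaving the repeated `$ref` in place.

```python
def resolve_refs(node, root, seen=()):
    """Inline local "#/..." $ref pointers, leaving a $ref in place when it would recurse."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if ref in seen:
                # Cycle: this pointer is already being expanded higher up.
                return {"$ref": ref}
            target = root
            for part in ref[2:].split("/"):
                # Undo JSON Pointer escaping (~1 is "/", ~0 is "~").
                target = target[part.replace("~1", "/").replace("~0", "~")]
            return resolve_refs(target, root, seen + (ref,))
        return {key: resolve_refs(value, root, seen) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root, seen) for item in node]
    return node
```

Tracking `seen` per expansion path (rather than globally) means a schema referenced twice in different places is still inlined both times; only genuine self-reference is cut.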
Security
Runs fully local; nothing leaves your machine except LLM calls (eval, with your
key) and the API calls your served spec makes. An SSRF guard is on by default:
tool calls to loopback, private, and link-local hosts are blocked, and the
cloud-metadata address 169.254.169.254 is always blocked, so a malicious spec
can't exfiltrate your auth headers. Use --allow-local-network to serve a
localhost/private API. See SECURITY.md.
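The address check behind a guard like this can be sketched with the standard library's `ipaddress` module. The function name `is_blocked` and its `allow_local_network` parameter are hypothetical, and the sketch reads "always" as meaning the metadata address stays blocked even when local networks are allowed, which is an interpretation of the text above rather than confirmed behavior.

```python
import ipaddress

# Cloud metadata endpoint named in the Security section.
METADATA_IP = ipaddress.ip_address("169.254.169.254")


def is_blocked(address: str, allow_local_network: bool = False) -> bool:
    """Return True if a tool call to this IP address should be refused."""
    ip = ipaddress.ip_address(address)
    # Assumption: the metadata address is refused regardless of the flag.
    if ip == METADATA_IP:
        return True
    if allow_local_network:
        return False
    return ip.is_loopback or ip.is_private or ip.is_link_local
```

This only covers literal IP addresses. A real guard also has to resolve hostnames and check every resolved address, since a public-looking name can point at a private IP.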
Development
python -m pytest # 35 tests: parser, curation, server roundtrip, eval
Tests are offline: the parser/curation suites need no network, and the eval suite uses a scripted LLM client (no API key).
License
MIT