Delta-MCP
Token-efficient MCP reimplementation with progressive tool discovery, result handling, and compact wire encoding, reducing token usage by up to 89% on tool definitions.
Token-efficient MCP reimplementation. Same JSON-RPC 2.0 wire format. Leaner discovery model. 89% fewer tokens on tool definitions, measured.
Why
Standard MCP has two token bloat problems:
Tool-definition bloat. Every tool's full JSON schema loads into context at startup — even tools the model never uses. With 10 tools you're paying 850+ tokens before any work happens. With 50 tools across enterprise servers, thousands.
Tool-result bloat. Large outputs (file reads, search results, API responses) route through LLM context unfiltered. One 50KB file read can destroy your context budget.
Delta-MCP fixes both.
Numbers
| | Standard MCP | Delta-MCP |
|---|---|---|
| 5-tool server init | 910 tokens | 97 tokens |
| 20-tool server init | ~3600 tokens | 378 tokens |
| Definition overhead (1/5 tools used) | 910 tokens upfront | 58 tokens on-demand |
| Tool-selection accuracy (Opus 4) | 49% | 74% |
| Tool-selection accuracy (Opus 4.5) | 79.5% | 88.1% |
| Compact-json wire reduction | — | −18.1% |
Accuracy numbers from Anthropic lazy tool loading research. Token numbers from conformance/scenarios/07-benchmark.test.ts against a 5-tool server with realistic schemas.
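As a quick sanity check, the headline reduction can be recomputed from the two init rows of the table. This is plain arithmetic on the published figures, not part of the benchmark suite; the helper names are made up for illustration, and the 20-tool baseline is the approximate ~3600 from the table.

```typescript
// Relative reduction in tokens going from `before` to `after`.
const reduction = (before: number, after: number): number => (before - after) / before;

// Format a fraction as a percentage with one decimal place.
const pct = (fraction: number): string => (fraction * 100).toFixed(1) + "%";

console.log(pct(reduction(910, 97)));   // 5-tool server init: 89.3%
console.log(pct(reduction(3600, 378))); // 20-tool server init (baseline approximate): 89.5%
```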
How It Works
Progressive disclosure
Delta-MCP replaces eager schema loading with a two-tier model negotiated at initialize:
tools/list → names + ≤60-char descriptions only (~97 tokens for 5 tools)
tools/describe → full schema, on-demand, cached (~30 tokens per tool)
The 60-char description cap is enforced at registration — longer descriptions throw at startup. This is intentional: the schema is the right place for detail, not the discovery index. Counter-intuitively, shorter descriptions improve tool-selection accuracy. More detail increases execution steps by 67% and regresses 16% of cases.
// Standard MCP: model sees all of this before doing anything
{ name: "search", inputSchema: { type: "object", properties: { query: { type: "string", description: "Full-text search query string. Supports boolean operators AND, OR, NOT..." }, limit: { ... }, filters: { type: "object", properties: { dateRange: { enum: [...] }, language: { ... } } } } } }
// Delta-MCP tools/list: model sees this
{ name: "search", description: "Search docs and return top results" }
// Delta-MCP tools/describe (only when model decides to use it):
{ name: "search", inputSchema: { ... full schema ... } }
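The two-tier model can be sketched as a small registry. This is a minimal illustration, assuming an in-memory map; `SketchToolRegistry` and its method names are hypothetical and are not the actual `@delta-mcp/core` API.

```typescript
// Hypothetical sketch of a two-tier tool registry (not the real Delta-MCP classes).
const MAX_DESCRIPTION_CHARS = 60;

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

interface ToolIndexEntry {
  name: string;
  description: string;
}

class SketchToolRegistry {
  private readonly tools = new Map<string, ToolDefinition>();

  // Reject over-long descriptions at registration, so the failure
  // surfaces at startup rather than in the middle of a request.
  register(tool: ToolDefinition): void {
    if (tool.description.length > MAX_DESCRIPTION_CHARS) {
      throw new Error(
        `Description for "${tool.name}" is ${tool.description.length} chars; max is ${MAX_DESCRIPTION_CHARS}`
      );
    }
    this.tools.set(tool.name, tool);
  }

  // Tier 1: the discovery index, names and short descriptions only.
  list(): ToolIndexEntry[] {
    return Array.from(this.tools.values(), ({ name, description }) => ({ name, description }));
  }

  // Tier 2: the full definition for a single tool, fetched on demand.
  describe(name: string): ToolDefinition {
    const tool = this.tools.get(name);
    if (tool === undefined) throw new Error(`Unknown tool: ${name}`);
    return tool;
  }
}
```

The index returned by `list()` never contains `inputSchema`, which is where the token saving comes from; the schema only enters context when `describe()` is called for a tool the model has chosen.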
Result handler
Every tool result passes through the result handler before hitting LLM context:
| Input type | Output |
|---|---|
| String over budget | { truncated: true, preview, totalChars, estimatedTokens, note } |
| Array over page size | { paginated: true, items, page, totalPages, hasMore, note } |
| Object over budget | { _summarized: true, _totalKeys, key: previewValue, ... } |
| Upstream 429 | { type: "rate_limited", retryAfterSeconds, upstream } |
Rate limits become tool results the model can reason about, not exceptions that terminate the agent loop. Pagination params (page, pageSize) flow automatically from tool call args — the model requests subsequent pages without the server needing explicit pagination logic.
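A rough sketch of the string and array rows of the table follows. It assumes a 4-characters-per-token estimate and reuses the `maxTokens` / `paginateAfter` option names from the Quick Start; the function name, the estimator, and the `note` wording are illustrative assumptions, and object summarization and rate-limit handling are left out.

```typescript
interface HandlerOptions {
  maxTokens: number;     // budget for a single string result
  paginateAfter: number; // page size for array results
}

// Assumption: roughly 4 characters per token. The real estimator may differ.
const CHARS_PER_TOKEN = 4;
const estimateTokens = (text: string): number => Math.ceil(text.length / CHARS_PER_TOKEN);

// Hypothetical handler: shrinks oversized strings and arrays, passes the rest through.
function handleResult(result: unknown, opts: HandlerOptions, page = 1): unknown {
  if (typeof result === "string") {
    const estimatedTokens = estimateTokens(result);
    if (estimatedTokens <= opts.maxTokens) return result;
    return {
      truncated: true,
      preview: result.slice(0, opts.maxTokens * CHARS_PER_TOKEN),
      totalChars: result.length,
      estimatedTokens,
      note: "Output truncated; narrow the request to see more.",
    };
  }
  if (Array.isArray(result)) {
    if (result.length <= opts.paginateAfter) return result;
    const totalPages = Math.ceil(result.length / opts.paginateAfter);
    // Clamp the requested page into the valid range.
    const current = Math.min(Math.max(1, page), totalPages);
    const start = (current - 1) * opts.paginateAfter;
    return {
      paginated: true,
      items: result.slice(start, start + opts.paginateAfter),
      page: current,
      totalPages,
      hasMore: current < totalPages,
      note: `Page ${current} of ${totalPages}; pass "page" to fetch another.`,
    };
  }
  return result;
}
```

In this sketch the `page` argument would come straight from the tool call args, which is how the model can ask for the next page without the tool itself implementing pagination.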
Compact wire encoding
The codec is negotiated at initialize and falls back automatically to standard JSON for clients that are unaware of it. Both sides switch codecs after the handshake; the initialize response itself is always plain JSON so the client can read it before the switch.
Standard: {"jsonrpc":"2.0","method":"tools/list","result":{"tools":[...]}}
Compact: {"j":"2.0","m":"tools/list","r":{"t":[...]}}
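The key shortening in the example above can be sketched as a recursive rename over a lookup table. Only the four pairs visible in that example (`jsonrpc`, `method`, `result`, `tools`) are used here; the real codec's full table is not shown in this README, and the function names are hypothetical.

```typescript
// Long key -> short key. Only the pairs shown in the example above.
const SHORT = new Map<string, string>([
  ["jsonrpc", "j"],
  ["method", "m"],
  ["result", "r"],
  ["tools", "t"],
]);

// Inverse table for decoding.
const LONG = new Map<string, string>();
SHORT.forEach((short, long) => LONG.set(short, long));

// Walk the value and rename every object key found in `table`.
function renameKeys(value: unknown, table: Map<string, string>): unknown {
  if (Array.isArray(value)) return value.map((item) => renameKeys(item, table));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[table.get(key) ?? key] = renameKeys(source[key], table);
    }
    return out;
  }
  return value;
}

const encodeCompact = (message: unknown): string => JSON.stringify(renameKeys(message, SHORT));
const decodeCompact = (wire: string): unknown => renameKeys(JSON.parse(wire), LONG);
```

Note that this naive version renames keys at every depth, so a tool argument that happened to be named `t` would decode as `tools`. A production codec would need to limit renaming to protocol-level fields or escape colliding keys.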
CBOR binary encoding is available over HTTP via the optional cbor-x dependency. Stdio clamps to compact-json because CBOR is binary and cannot be safely newline-delimited.
The HTTP transport decodes requests by Content-Type and encodes responses by the client's Accept header. The MCP-Protocol-Version header is required on all requests except initialize — the client doesn't know the version until the handshake completes.
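The header rules in the paragraph above might look roughly like this on the server side. The sketch assumes header names have already been lower-cased (as Node's HTTP server does), that CBOR bodies use the registered `application/cbor` media type, and that anything else is treated as JSON; the README does not state which media type compact-json uses over HTTP, so it is left out. `planRequest` and `codecFor` are hypothetical names.

```typescript
type Codec = "json" | "cbor";

interface RequestPlan {
  requestCodec: Codec;  // how to decode the request body
  responseCodec: Codec; // how to encode the response body
}

// Assumption: application/cbor selects CBOR, everything else falls back to JSON.
function codecFor(headerValue: string | undefined): Codec {
  if (headerValue !== undefined && headerValue.toLowerCase().indexOf("application/cbor") !== -1) {
    return "cbor";
  }
  return "json";
}

// `headers` is expected to use lower-cased names.
function planRequest(method: string, headers: Record<string, string | undefined>): RequestPlan {
  // initialize is exempt: the client cannot know the version before the handshake.
  if (method !== "initialize" && headers["mcp-protocol-version"] === undefined) {
    throw new Error("Missing MCP-Protocol-Version header");
  }
  return {
    requestCodec: codecFor(headers["content-type"]),
    responseCodec: codecFor(headers["accept"]),
  };
}
```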
OAuth 2.1 (resource-server only)
Delta-MCP validates tokens, never issues them. Stateless by design:
Client → POST /mcp
Server → 401 WWW-Authenticate: Bearer resource_metadata="/.well-known/oauth-protected-resource"
Client → GET /.well-known/oauth-protected-resource (RFC 9728 PRM)
Client → discovers AS, gets token via PKCE (mandatory, no implicit flow)
Client → POST /mcp Authorization: Bearer <token>
Server → validates JWT + RFC 8707 audience binding → processes request
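Two pieces of this flow are easy to sketch: the 401 challenge and the claim checks that follow signature verification. The field names in `ProtectedResourceMetadata` come from RFC 9728; the function names and the example URLs are placeholders, and signature verification against the authorization server's keys is assumed to have happened already and is not shown.

```typescript
// Subset of the RFC 9728 protected resource metadata document.
interface ProtectedResourceMetadata {
  resource: string;
  authorization_servers: string[];
}

const PRM_PATH = "/.well-known/oauth-protected-resource";

// Value for the WWW-Authenticate header on a 401, as shown in the flow above.
function challengeHeader(): string {
  return `Bearer resource_metadata="${PRM_PATH}"`;
}

// Claims from an already signature-verified JWT.
interface JwtClaims {
  aud?: string | string[];
  exp?: number;
}

// Audience binding (RFC 8707) plus expiry check. This does NOT verify the signature.
function claimsAcceptable(claims: JwtClaims, resource: string, nowSeconds: number): boolean {
  const aud = claims.aud;
  const audiences: string[] = aud === undefined ? [] : Array.isArray(aud) ? aud : [aud];
  const notExpired = claims.exp !== undefined && claims.exp > nowSeconds;
  return notExpired && audiences.indexOf(resource) !== -1;
}

// Placeholder metadata document; both URLs are made up for illustration.
const exampleMetadata: ProtectedResourceMetadata = {
  resource: "https://mcp.example.com",
  authorization_servers: ["https://auth.example.com"],
};
```

Rejecting tokens whose `aud` does not name this server is what stops a token issued for one resource from being replayed against another.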
Quick Start
npm install @delta-mcp/server @delta-mcp/client
import { DeltaServer } from "@delta-mcp/server";

class MyServer extends DeltaServer {
  constructor() {
    super({
      name: "my-server",
      version: "1.0.0",
      // Budget for results: truncate above 500 tokens, paginate arrays above 50 items
      resultHandler: { maxTokens: 500, paginateAfter: 50 },
    });
    this.tool({
      name: "search",
      description: "Search docs and return top results", // ≤60 chars, enforced
      inputSchema: {
        type: "object",
        properties: {
          query: { type: "string" },
          page: { type: "number" },
        },
        required: ["query"],
      },
    });
  }

  // Dispatch by tool name. performSearch is your own function, not part of Delta-MCP.
  protected async callTool(name: string, args: Record<string, unknown>): Promise<unknown> {
    if (name === "search") return performSearch(args.query as string);
    throw new Error(`Unknown tool: ${name}`);
  }
}

new MyServer().startStdio();
CLI
npx @delta-mcp/cli list node ./server.js # list tools
npx @delta-mcp/cli describe node ./server.js search # full schema
npx @delta-mcp/cli call node ./server.js search '{"query":"x"}' # call tool
npx @delta-mcp/cli bench node ./server.js # benchmark
Architecture
┌──────────────────────────────────────────────────────┐
│ Delta-MCP Client │
│ negotiate capabilities → get index → fetch schema │
│ on demand → cached → call tool │
└─────────────────────┬────────────────────────────────┘
│ JSON-RPC 2.0 (unchanged wire)
┌─────────────────────▼────────────────────────────────┐
│ Delta-MCP Server │
│ │
│ ProgressiveToolRegistry Result Handler │
│ names + 60-char desc truncate / paginate / │
│ schemas on-demand rate-limit → result │
│ │
│ stdio / HTTP transport OAuth 2.1 resource-server│
└──────────────────────────────────────────────────────┘
Packages
| Package | Purpose |
|---|---|
| @delta-mcp/core | Types, transport, progressive disclosure, encoding, auth, result handler |
| @delta-mcp/server | DeltaServer base class: protocol + result handling wired in |
| @delta-mcp/client | DeltaClient with schema cache and capability negotiation |
| @delta-mcp/cli | delta-mcp CLI for inspect, test, benchmark |
Conformance
61 tests across 8 scenarios. Run with:
npm run conformance
| Scenario | Coverage |
|---|---|
| CS-01 | Initialize handshake, capability negotiation, codec negotiation |
| CS-02 | Progressive disclosure: list, describe, cache, 60-char cap |
| CS-03 | tools/call: results, errors, structured output |
| CS-04 | Result handler: truncation, pagination, summarization, rate limits |
| CS-05 | Wire encoding: CBOR negotiation, compact-json roundtrip |
| CS-06 | OAuth 2.1: PRM document, JWT validation, RFC 8707 audience |
| CS-07 | Benchmark: token reduction, latency, overhead targets |
| CS-08 | HTTP transport: version header exemption, codec round-trip |
Full results: docs/benchmarks/results.md
Compatibility
- Baseline: MCP 2025-11-25 — Streamable HTTP + stdio transports
- Node.js: ≥20.0.0
- Wire format: JSON-RPC 2.0 — unchanged, fully interoperable
- Standard MCP clients connecting to a Delta-MCP server get standard MCP behavior automatically
License
MIT