github-mcp-sketch

github-mcp-sketch

Proxies the GitHub MCP server, returns compact schema instead of raw JSON, and enables agents to extract specific fields locally via query_response, reducing context size and cost.

Category
Visit Server

README

github-mcp-sketch

A local MCP server that proxies the GitHub MCP and returns the JSON's inferred schema instead of the raw response. Adds a query_response tool the agent uses to pull specific fields it actually needs.

The point: most GitHub API responses are 80%+ metadata the agent doesn't read. list_pull_requests for 30 PRs is ~80KB; the agent usually needs 4–5 fields per PR. Returning a schema first lets the model see the shape and decide what to fetch back.

Architecture

flowchart TD
    Agent["Agent<br/>(Claude Code, Codex, custom SDK app)"]
    Proxy["github-mcp-sketch<br/>• caches raw upstream JSON<br/>• returns schema + cache_id<br/>• serves query_response"]
    Upstream["GitHub MCP<br/>api.githubcopilot.com/mcp"]

    Agent -- stdio MCP --> Proxy
    Proxy -- Streamable HTTPS --> Upstream

The proxy sits between any MCP-speaking agent and GitHub's upstream MCP. Every upstream tool gets forwarded transparently; the proxy intercepts the response, caches the raw JSON in memory, infers a compact schema via json-schema-sketch, and returns cache_id + schema to the agent.

The query loop

The agent doesn't re-fetch the upstream response to extract additional fields. After the first tool call returns the schema, the agent runs a query loop — calling query_response(cache_id, path) as many times as it needs against the locally cached JSON. No upstream round-trip per query.

sequenceDiagram
    participant Agent
    participant Proxy as github-mcp-sketch
    participant Upstream as GitHub MCP (upstream)

    Agent->>Proxy: list_issues(...)
    Proxy->>Upstream: forwards
    Upstream-->>Proxy: raw JSON (~80KB)
    Note over Proxy: caches JSON locally
    Proxy-->>Agent: schema + cache_id (~600B)

    rect rgba(120, 200, 255, 0.15)
    Note over Agent,Proxy: query loop — no upstream calls
    Agent->>Proxy: query_response(cache_id, "[*].title")
    Proxy-->>Agent: extracted titles
    Agent->>Proxy: query_response(cache_id, "[*].state")
    Proxy-->>Agent: extracted states
    Agent->>Proxy: query_response(cache_id, [batch paths])
    Proxy-->>Agent: { results: { path: values, ... } }
    end

This is what makes schema-first MCPs efficient. The expensive part — pulling JSON over the network — happens once per upstream tool call. The agent's iterative drill-down ("now I need titles", "now I need the closed ones", "now I need the author logins") all runs locally against the cache, and each drill-down only puts the fields the agent asked for into the context.

What the numbers look like

Benchmarked across 13 agentic tasks on facebook/react and kubernetes/kubernetes, Anthropic SDK with prompt caching enabled (matching how Claude Code uses the API). N=3 runs per case per side. Sum of medians across all 13 cases:

Metric Baseline (github) Sketch (github-sketch) Δ
Final context size (tokens) 844,136 348,292 −58.7%
API cost (Claude Opus 4.7) $5.59 $2.73 −51.2%
Total tokens billed 1.84M 1.27M −30.8%

Where the win lives — the 13 cases grouped by the shape of response they exercise:

Response shape Cases Baseline ctx Δ context Δ cost
Single-object fetches (one issue, one PR) 2 34,740 +3% +52%
List endpoints with rich metadata (30+ items, lots of URL/reaction noise) 3 222,827 −74% −71%
Comment-heavy threads (KEP review, long React discussion) 3 387,913 −69% −57%
Multi-step agentic workflows (triage, investigation, drill-down) 3 158,329 −40% −27%
File contents and commit history 2 40,327 −2% 0%

The first and last rows are where the proxy doesn't help — but notice they're also the rows where baseline context is already small (~35–40K tokens). A small percentage loss on a small payload barely moves anything in practice. The big absolute numbers are in tiers 2–4 (158K–388K baseline), which is exactly where the proxy's percentage savings translate into 100K+ tokens of headroom in real sessions. Both honest-loss rows were included intentionally so the data isn't cherry-picked.

Full report and reproducible bench: json-schema-sketch-bench.

How it works

Every upstream tool call (list_issues, pull_request_read, etc.) goes through this pipeline:

  1. The proxy forwards the call to https://api.githubcopilot.com/mcp with the agent-supplied PAT.
  2. The raw JSON response is cached in memory under a generated cache_id.
  3. The response shape is inferred via json-schema-sketch into a compact text-form schema.
  4. The proxy returns cache_id: gh-1-abc (pass to query_response) followed by the schema. That's the entire tool result the agent sees.

When the agent wants actual values, it calls query_response:

query_response(cache_id="gh-1-abc", path="items[*].title")

Path syntax supports dot/bracket navigation and [*] wildcards. The agent can pass an array of paths to batch multiple field extractions into one tool call:

query_response(cache_id="gh-1-abc", path=["items[*].title", "items[*].state", "items[*].user.login"])

Cache entries live until the proxy process exits. There's no TTL; the design assumes one agent session per proxy process.

Install

Requires Node 20+.

npm install -g github-mcp-sketch

You'll need a GitHub Personal Access Token (fine-grained, read-only is enough). Create one with content and metadata read on whatever repos you'll use it against.

Configuration

All configuration is via environment variables.

Variable Required Default Purpose
GITHUB_PAT yes Bearer token forwarded as the Authorization header on every upstream request.
UPSTREAM_MCP_URL no https://api.githubcopilot.com/mcp Upstream MCP endpoint. Override for testing or self-hosted GitHub MCPs.
CACHE_SIZE no 50 Max number of cached responses kept in memory before LRU eviction. Each cached entry is one upstream tool result.
METRICS_CSV no (off) If set to a file path, the proxy appends per-tool-call metrics (raw vs sketched response size, tokens, timing) to that CSV. Off by default — set it explicitly to opt in.

How the variables get set depends on how you launch the proxy:

  • From an MCP host (Claude Code, Claude Desktop, Cline, etc.) — the host passes env vars to the spawned process. In Claude Code that's the -e KEY=VALUE flag of claude mcp add (shown below). Values are stored in the host's MCP config (~/.claude.json for Claude Code).
  • Direct shell invocationGITHUB_PAT=ghp_... github-mcp-sketch, or export the variable in your shell rc.
  • Running from a clone of the source — copy .env.example.env, fill it in. Dotenv loads it on startup. (.env is gitignored.)

Wire into Claude Code

Minimal — just the required PAT:

claude mcp add github-sketch \
  -e GITHUB_PAT=<your_pat> \
  -- npx github-mcp-sketch

With optional variables — stack additional -e KEY=VALUE flags before the --:

claude mcp add github-sketch \
  -e GITHUB_PAT=<your_pat> \
  -e CACHE_SIZE=200 \
  -e METRICS_CSV=/tmp/github-sketch-metrics.csv \
  -- npx github-mcp-sketch

All -e values are persisted in ~/.claude.json and passed to the proxy on every spawn.

Verify the server is registered:

claude mcp list | grep github-sketch
# github-sketch: npx github-mcp-sketch - ✓ Connected

Restart Claude Code so the new server is registered.

Changing variables after wire-up

Claude Code doesn't expose an in-place edit for MCP env vars. To change one:

claude mcp remove github-sketch
claude mcp add github-sketch -e GITHUB_PAT=<new_pat> -- npx github-mcp-sketch

Or edit ~/.claude.json directly under the mcpServers.github-sketch.env block.

Wire into Codex

Add the server to your Codex config at ~/.codex/config.toml.

Minimal — just the required PAT:

[mcp_servers.github-sketch]
command = "npx"
args = ["github-mcp-sketch"]

[mcp_servers.github-sketch.env]
GITHUB_PAT = "<your_pat>"

With optional variables:

[mcp_servers.github-sketch]
command = "npx"
args = ["github-mcp-sketch"]

[mcp_servers.github-sketch.env]
GITHUB_PAT = "<your_pat>"
CACHE_SIZE = "200"
METRICS_CSV = "/tmp/github-sketch-metrics.csv"

If you installed the package globally, you can launch the binary directly instead:

[mcp_servers.github-sketch]
command = "github-mcp-sketch"
args = []

[mcp_servers.github-sketch.env]
GITHUB_PAT = "<your_pat>"

Restart Codex after editing ~/.codex/config.toml so the new MCP server is registered. The server should then appear as github-sketch and expose the upstream GitHub tools plus query_response.

Wire into other MCP hosts

The proxy is a standard stdio MCP — anything that speaks MCP can host it. The general pattern:

  • Spawn npx github-mcp-sketch as a stdio process
  • Pass env vars (at minimum GITHUB_PAT) on that spawn

For example, in a custom Anthropic SDK app using @modelcontextprotocol/sdk:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "npx",
  args: ["github-mcp-sketch"],
  env: {
    ...process.env,
    GITHUB_PAT: process.env.GITHUB_PAT!,
    CACHE_SIZE: "100",  // optional
  },
});
const client = new Client({ name: "my-app", version: "0.1.0" });
await client.connect(transport);

Verify:

claude mcp list | grep github-sketch
# github-sketch: node /.../dist/server.js - ✓ Connected

Restart Claude Code so the new server is registered. The proxy exposes the same tool names as the upstream GitHub MCP (list_issues, pull_request_read, get_file_contents, etc.) plus the added query_response.

Tools

The proxy passes through every tool from the upstream GitHub MCP with identical names, descriptions, and input schemas. The agent calls them exactly as it would call the upstream MCP.

The one added tool:

query_response(cache_id, path, max_items?)

Extract values from a response cached by a previous tool call.

Path syntax:

Path Returns
"" (empty string) The whole cached value
"user.login" A nested scalar
"[0].title" First array item's field
"[*].title" A field from every array item
"items[*].user.login" Nested field across an array

Batch mode — pass an array of paths to get multiple fields in one call:

path=["items[*].number", "items[*].title", "items[*].state"]

Returns { results: { "items[*].number": [...], "items[*].title": [...], ... } }. Per-path errors are reported in that path's entry; other paths still succeed.

max_items — optional cap on items returned per wildcard expansion. Defaults to unlimited; the agent only sets it when it explicitly wants a sample.

When it doesn't help

Looking at the per-tier table above:

  • Single-object fetches of small objects (one issue, one user) — the agent queries back most of the fields anyway, so the schema + query overhead exceeds the per-call wire savings.
  • File contents — raw file text isn't structured, so there's no schema noise to skip. The proxy passes the file through with minor envelope savings only.
  • Tasks where the agent really does need every field — almost never happens for list and search endpoints, but is the failure mode if it does.

Known variance: on t3-06 (a 67-comment review thread on a Kubernetes KEP), one of three sketch runs occasionally consumes ~60K context (vs ~39K median) because the agent issues a doubly-nested wildcard query (review_threads[*].comments[*].body) that materializes most of the original payload. The median is still ~56% better than baseline, but this is a real pattern the agent can fall into. The proxy doesn't currently guard against it.

Reproduce the bench

The benchmark is a separate repo: json-schema-sketch-bench.

git clone https://github.com/markadelnawar/json-schema-sketch-bench
cd json-schema-sketch-bench
npm install
cp .env.example .env  # add ANTHROPIC_API_KEY and GITHUB_PAT
npm run bench   # 13 cases × 2 sides × 3 runs, ~45 min, ~$5-10
npm run report  # generates summary.md

Cases, raw CSVs, methodology details, and the final summary.md from a recent run are all checked in.

Architecture

The proxy is a thin stdio MCP server:

  • Connects to https://api.githubcopilot.com/mcp over Streamable HTTP using the agent-provided GITHUB_PAT.
  • On startup, calls upstream listTools() and registers each tool with its original name and description. Adds query_response to the manifest.
  • On tools/call: if the name is query_response, resolves the path against the cache; otherwise forwards upstream, caches the parsed JSON, infers the schema, and returns the wrapped response.

There's no rate limiting, no retry, no auth refresh. Upstream errors are returned verbatim.

Caveats

  • Stateful in memory. Cache entries live until the proxy process exits. Restart Claude Code = empty cache.
  • One PAT per process. The proxy reads GITHUB_PAT once at startup. To rotate, restart.
  • Schemas drop list-element variance. If items in an array have different shapes, the schema picks a representative one. Rarely a problem for GitHub responses but can be surprising.
  • No write-tool optimization. Tools that perform writes (create_issue, merge_pull_request) work but the response sketching is wasted on small confirmation payloads.

License

MIT. See LICENSE.

Related

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured