TokenPilot

Automatic token optimization for Claude Code. Extends session duration by reducing wasted tokens across every dimension — effort tuning, redundant file reads, tool cost routing, context health tracking, and smart task classification.

Built as a Claude Code hooks + MCP server system. Works alongside RTK for shell compression and MCP Compressor for schema reduction.

How It Works

TokenPilot runs as four layers:

Hooks — intercept Claude Code lifecycle events (session start, prompt submit, pre/post tool use)
MCP Server — exposes tools for real-time control and monitoring
SQLite Database — persists session state across hook subprocess calls with WAL mode + serializable isolation
Tool Registry — maps known tools to estimated costs and cheaper alternatives

┌── Claude Code Hooks ──────────────────────────────────────┐
│                                                            │
│  SessionStart        → init session, inject hints          │
│  UserPromptSubmit    → classify task → suggest effort      │
│  PreToolUse (Read)   → dedup file reads + suggest cheaper  │
│  PostToolUse (all)   → track real tool output token costs  │
│                                                            │
└────────────────┬───────────────────────────────────────────┘
                 │
    ┌────────────▼──────────────────┐
    │   TokenPilot MCP Server       │
    │                               │
    │   set_level(1-10)             │  Aggressiveness dial
    │   get_stats()                 │  Live session metrics
    │   get_savings()               │  Token savings report
    │   get_context_health()        │  Context window status
    │   get_tool_report()           │  Most expensive tools
    │   get_file_report(path)       │  File read history
    │   explain_classification(p)   │  Debug classifier
    │   reset_file_tracking()       │  Clear dedup cache
    │                               │
    │   SQLite + Tool Registry      │  Persistent state
    └───────────────────────────────┘
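Because each hook runs as a separate subprocess, shared state has to live on disk. The following is a minimal sketch of how read counts could be persisted in SQLite with WAL journaling and serialized writes. The table name `file_reads`, its columns, and both function names are hypothetical; the actual schema in db.py is not shown in this README.

```python
import sqlite3


def open_session_db(path: str) -> sqlite3.Connection:
    """Open (or create) the state database with WAL journaling enabled."""
    # isolation_level=None selects autocommit mode, so transactions are
    # controlled explicitly with BEGIN/COMMIT below.
    conn = sqlite3.connect(path, timeout=5.0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one row per (session, file) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_reads ("
        " session_id TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " reads INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (session_id, path))"
    )
    return conn


def record_file_read(conn: sqlite3.Connection, session_id: str, path: str) -> int:
    """Increment and return the read count for a file within a session."""
    # BEGIN IMMEDIATE takes the write lock up front, so concurrent hook
    # processes apply their updates one at a time.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT INTO file_reads (session_id, path, reads) VALUES (?, ?, 1) "
            "ON CONFLICT(session_id, path) DO UPDATE SET reads = reads + 1",
            (session_id, path),
        )
        (count,) = conn.execute(
            "SELECT reads FROM file_reads WHERE session_id = ? AND path = ?",
            (session_id, path),
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return count
```

A count greater than 1 is the signal a PreToolUse hook would need to warn about or block a redundant read, depending on the aggressiveness level.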

Aggressiveness Scale

Default: 4 (conservative-balanced). Adjustable from 1 to 10 at any time via /tp <1-10> (for example, /tp 6).

Level	Effort Suggestion	File Read Dedup	Thinking Cap	Compact Reminder
1-2	Never	Notify only	No cap	90% context
3-4	Trivial tasks only	Warn on redundant	No cap	75% context
5-6	All tasks	Warn + suggest alternatives	Adaptive (10-30K)	65% context
7-8	Strong recommendation	Block re-reads	Adaptive (6-18K)	55% context
9-10	Enforce	Block + auto-range	Adaptive (4-12K)	45% context

Thinking caps are adaptive — they scale based on task complexity and classifier confidence. A "trivial" task gets a tighter cap than a "complex" task. If the classifier is uncertain (confidence < 0.5), no cap is applied.
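As a rough illustration of the adaptive cap, the sketch below picks a value inside the per-level range from the table and returns no cap when the level is below 5 or the classifier confidence is below 0.5. The ranges and the 0.5 threshold come from the text above; the per-category weights, the linear interpolation, and the function name `thinking_cap` are assumptions, not the logic in config.py.

```python
from typing import Optional

# Cap ranges in tokens per aggressiveness level, taken from the table above.
CAP_RANGES = {
    5: (10_000, 30_000), 6: (10_000, 30_000),
    7: (6_000, 18_000), 8: (6_000, 18_000),
    9: (4_000, 12_000), 10: (4_000, 12_000),
}

# Assumption: where each category sits inside the range (0 = low end, 1 = high end).
CATEGORY_WEIGHT = {"trivial": 0.0, "research": 0.5, "standard": 0.5, "complex": 1.0}


def thinking_cap(level: int, category: str, confidence: float) -> Optional[int]:
    """Return a thinking-token cap, or None when no cap applies."""
    if confidence < 0.5:
        return None  # uncertain classification: do not cap
    bounds = CAP_RANGES.get(level)
    if bounds is None:
        return None  # levels 1-4 have no cap
    low, high = bounds
    weight = CATEGORY_WEIGHT[category]
    return int(low + weight * (high - low))
```

Under these assumptions, a confident "trivial" task at level 6 is capped at the bottom of the 10-30K range, while a confident "complex" task at level 8 gets the top of the 6-18K range.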

Task Classifier (v2)

Lightweight regex + keyword classifier with negation detection, adjacency scoring, and quoted-code filtering. No LLM calls, <10ms execution.

Category	Effort	Model Hint	Example
`trivial`	low	haiku	"fix typo in README"
`research`	medium	sonnet	"explain how the API routes work"
`standard`	medium	sonnet	"add a loading spinner"
`complex`	high	opus	"refactor auth across all microservices"

v2 improvements:

Negation detection: "don't refactor" no longer matches the refactor pattern
Quoted-code filtering: backtick-wrapped code is stripped before classification
Adjacency scoring: "add auth to 12 routes" correctly detects complexity from keyword pairs
Confidence calibration: very short prompts get low confidence (0.3) instead of false high confidence

Debug any classification with /tp explain <prompt>.
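To make the v2 behavior concrete, here is a small sketch of a regex classifier with quoted-code filtering, negation detection, and low confidence for very short prompts. It is not the code in classifier.py: the keyword lists, the "fewer than 3 words" threshold, and the 0.8 and 0.5 confidence values are illustrative assumptions, and adjacency scoring is left out.

```python
import re

# Hypothetical keyword patterns; the real rule set is not shown in this README.
PATTERNS = {
    "complex": re.compile(r"\b(?:refactor|migrate|redesign|across all)\b"),
    "trivial": re.compile(r"\b(?:typo|rename|bump)\b"),
    "research": re.compile(r"\b(?:explain|how does|what is)\b"),
}
# A negation word directly before the keyword, with at most one word between.
NEGATION = re.compile(r"\b(?:don't|do not|never|without)\s+(?:\w+\s+)?$")
BACKTICKS = re.compile(r"`[^`]*`")


def _is_negated(text: str, start: int) -> bool:
    """Check whether the text just before a keyword match negates it."""
    return NEGATION.search(text[:start]) is not None


def classify(prompt: str) -> tuple[str, float]:
    """Return (category, confidence) for a prompt."""
    # Normalize curly apostrophes, then strip backtick-wrapped code.
    text = BACKTICKS.sub(" ", prompt.lower().replace("\u2019", "'"))
    category, confidence = "standard", 0.5
    for name in ("complex", "trivial", "research"):
        hits = [m for m in PATTERNS[name].finditer(text)
                if not _is_negated(text, m.start())]
        if hits:
            category, confidence = name, 0.8
            break
    if len(text.split()) < 3:
        confidence = 0.3  # very short prompts get low confidence
    return category, confidence
```

With these rules, "don't refactor, just explain how the API routes work" skips the negated "refactor" and falls through to the research category, and a prompt that mentions `refactor()` only inside backticks is not treated as a refactoring request.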

Tool Cost Registry

TokenPilot knows the estimated token cost of common tools and suggests cheaper alternatives:

Tool	Avg Tokens	Alternative	Alt Tokens	Savings
Read	~2000	jCodeMunch symbol lookup	~200	90%
WebSearch	~2000	Context7 docs query	~800	60%
WebFetch	~3000	Context7 docs query	~800	73%

At level 5+, TokenPilot suggests alternatives when a cheaper tool could do the job.
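The savings column follows directly from the two token estimates. The sketch below reproduces the table as a lookup and applies the level 5+ rule; the class name `ToolCost`, the `REGISTRY` dict, and the message wording are hypothetical and may differ from tool_registry.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolCost:
    avg_tokens: int
    alternative: str
    alt_tokens: int

    @property
    def savings_pct(self) -> int:
        # Share of tokens saved by switching to the alternative.
        return round(100 * (1 - self.alt_tokens / self.avg_tokens))


# Estimates copied from the table above.
REGISTRY = {
    "Read": ToolCost(2000, "jCodeMunch symbol lookup", 200),
    "WebSearch": ToolCost(2000, "Context7 docs query", 800),
    "WebFetch": ToolCost(3000, "Context7 docs query", 800),
}


def suggest_alternative(tool: str, level: int) -> Optional[str]:
    """Return a suggestion message, or None if no suggestion applies."""
    entry = REGISTRY.get(tool)
    if entry is None or level < 5:
        return None  # unknown tool, or level too low to suggest
    return (
        f"[TokenPilot] {tool} (~{entry.avg_tokens} tokens): consider "
        f"{entry.alternative} (~{entry.alt_tokens} tokens, "
        f"saves ~{entry.savings_pct}%)"
    )
```

For example, WebFetch at ~3000 tokens against a ~800-token alternative gives 1 - 800/3000, which rounds to the 73% shown in the table.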

Installation

Prerequisites

Python 3.10+
Claude Code CLI
FastMCP (pip3 install fastmcp)

Setup

Clone to your MCPs directory:

git clone https://github.com/rish-e/tokenpilot.git ~/MCPs/tokenpilot

Install dependencies:

pip3 install -r ~/MCPs/tokenpilot/requirements.txt

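Add the hooks and MCP server to ~/.claude/settings.json. Hook commands run through a shell, so ~ expands there; the MCP server's args and env values may be passed without shell expansion, so if the server fails to start, replace ~ in those two fields with the absolute path to your home directory: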

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "~/MCPs/tokenpilot/hooks/session_start.sh", "timeout": 5 }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "~/MCPs/tokenpilot/hooks/classify.sh", "timeout": 5 }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Read",
        "hooks": [
          { "type": "command", "command": "~/MCPs/tokenpilot/hooks/check_read.sh", "timeout": 5 }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": ".*",
        "hooks": [
          { "type": "command", "command": "~/MCPs/tokenpilot/hooks/post_tool.sh", "timeout": 3 }
        ]
      }
    ]
  },
  "mcpServers": {
    "tokenpilot": {
      "command": "python3",
      "args": ["~/MCPs/tokenpilot/server.py"],
      "env": { "PYTHONPATH": "~/MCPs/tokenpilot" }
    }
  }
}

Install the /tp slash command:

cp ~/MCPs/tokenpilot/commands/tp.md ~/.claude/commands/tp.md

Restart Claude Code.

Optional: RTK for Shell Compression

brew install rtk-ai/tap/rtk
rtk init -g

Adds 60-90% token savings on shell output (build logs, test output, git).

Usage

TokenPilot runs automatically after installation. You'll see [TokenPilot] messages when it detects optimization opportunities.

Slash Commands

5 commands. That's it.

Command	What it does
`/tp <1-10>`	Set aggressiveness level
`/tp on` / `off`	Enable/disable TokenPilot
`/tp stats`	Full session dashboard
`/tp note <text>`	Add a note to the Project Brain
`/tp explain <prompt>`	Show how and why a prompt was classified

MCP Tools

For power users, all tools are callable directly:

set_level, toggle, get_stats, get_savings, get_context_health, get_tool_report, get_file_report, explain_classification, add_note, reset_file_tracking

CLI (for testing)

cd ~/MCPs/tokenpilot

python3 server.py init 4                    # Initialize session
python3 server.py classify "fix typo"       # Classify prompt
python3 server.py classify_debug "fix typo" # Debug classification
python3 server.py check_file "/src/app.py"  # Check file dedup
python3 server.py context_health            # Context window status

Project Brain

TokenPilot auto-maintains a tpcontext.md file in each project root. This is persistent memory across Claude Code sessions — when you start a new chat, Claude immediately knows where you left off.

Fully automatic:

First install — bootstraps from git history (commits, active files, branch)
Every session start — auto-saves previous session, loads brain into context
No manual save needed — it just works

What it captures:

Files modified (from git diff)
Recent commits
User notes (via /tp note "...")
Session stats (duration, prompt count)
Most active files

Add context for future sessions:

/tp note "switched to GraphQL — don't touch REST endpoints"

The file stays under 2K tokens. It keeps the last 5 sessions; older ones rotate out.
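One way to enforce both limits is to keep the newest five sessions and then drop the oldest remaining one until the rendered file fits the budget. The sketch below assumes roughly 4 characters per token and a made-up file layout; `estimate_tokens`, `render_brain`, and the headings are hypothetical, not the format brain.py writes.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token, a common rule of thumb.
    return len(text) // 4


def render_brain(sessions: list[str], notes: list[str],
                 max_sessions: int = 5, token_budget: int = 2000) -> str:
    """Render notes plus the newest sessions, dropping the oldest to fit."""
    kept = sessions[-max_sessions:]  # sessions are ordered oldest to newest
    while True:
        parts = ["# Project Brain", "## Notes", *notes, "## Sessions", *kept]
        text = "\n".join(parts) + "\n"
        if estimate_tokens(text) <= token_budget or not kept:
            return text
        kept = kept[1:]  # rotate out the oldest remaining session
```

Notes are never trimmed in this sketch, so a very long set of notes could still exceed the budget once all sessions have been dropped.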

Smart Warnings

TokenPilot automatically detects and warns about token-wasting patterns:

Rapid-fire prompts: 3 or more short messages in a row trigger a "batch your questions" suggestion
Session age: every 15 prompts, TokenPilot suggests running /compact or starting a fresh session
Peak hours: warns once per session on weekdays between 5 and 11 a.m. PT (usage limits are reportedly consumed faster during peak hours)

All warnings appear as [TokenPilot] messages and respect the on/off toggle.
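The three checks are simple enough to sketch directly. In the example below, the 40-character definition of a "short" message, the function names, and the warning text are assumptions; the 3-message, 15-prompt, and 5-11am PT weekday rules come from the list above. The caller is expected to pass a datetime already expressed in Pacific Time.

```python
from datetime import datetime

SHORT_PROMPT_CHARS = 40  # assumption: what counts as a "short" message


def is_rapid_fire(recent_prompts: list[str]) -> bool:
    """True when the last three prompts are all short."""
    last_three = recent_prompts[-3:]
    return len(last_three) == 3 and all(
        len(p) < SHORT_PROMPT_CHARS for p in last_three
    )


def is_session_age_checkpoint(prompt_count: int) -> bool:
    """True on every 15th prompt of the session."""
    return prompt_count > 0 and prompt_count % 15 == 0


def is_peak_hour(now_pacific: datetime) -> bool:
    """True on weekdays from 05:00 up to, but not including, 11:00 PT."""
    return now_pacific.weekday() < 5 and 5 <= now_pacific.hour < 11


def collect_warnings(recent_prompts: list[str], prompt_count: int,
                     now_pacific: datetime, peak_warned: bool = False,
                     enabled: bool = True) -> list[str]:
    if not enabled:  # warnings respect the on/off toggle
        return []
    found = []
    if is_rapid_fire(recent_prompts):
        found.append("[TokenPilot] Several short prompts in a row: "
                     "consider batching your questions.")
    if is_session_age_checkpoint(prompt_count):
        found.append("[TokenPilot] Long session: consider /compact "
                     "or starting fresh.")
    if is_peak_hour(now_pacific) and not peak_warned:
        found.append("[TokenPilot] Peak hours (5-11am PT weekdays).")
    return found
```

The `peak_warned` flag stands in for whatever per-session state records that the peak-hours warning has already been shown once.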

File Structure

tokenpilot/
├── server.py            # FastMCP server + CLI entry point
├── classifier.py        # Task classifier (v2: negation, adjacency, debug)
├── config.py            # Aggressiveness scale + adaptive thinking caps
├── db.py                # SQLite persistence (WAL, indexed, serializable)
├── brain.py             # Project Brain — auto-generated tpcontext.md
├── tool_registry.py     # Tool cost estimates + cheaper alternatives
├── tracker.py           # In-memory tracker (used by MCP server process)
├── requirements.txt
├── commands/
│   └── tp.md            # /tp slash command (copy to ~/.claude/commands/)
├── hooks/
│   ├── session_start.sh # SessionStart — init + load brain
│   ├── classify.sh      # UserPromptSubmit — classify + rapid-fire + peak hours + session age
│   ├── check_read.sh    # PreToolUse (Read) — dedup + tool routing
│   └── post_tool.sh     # PostToolUse — real token tracking
└── templates/
    └── claudeignore-default

How Token Savings Stack

Layer	What	Savings
TokenPilot classifier	Right effort level per task	Thinking token reduction
TokenPilot file dedup	Skip redundant file reads	~2K tokens per blocked read
TokenPilot tool routing	Suggest cheaper tool alternatives	60-90% per substitution
TokenPilot PostToolUse	Track actual token costs (visibility)	Measurement enables optimization
TokenPilot smart warnings	Batch prompts, session age, peak hours	Prevents context blowup
TokenPilot Project Brain	Resume sessions without re-explaining context	3-5 messages saved per session start
RTK	Compress shell output	60-90% on Bash results
MCP Compressor	Compress MCP tool schemas	70-97% per wrapped server
.claudeignore	Exclude build artifacts from search	30-40% on exploration

License

MIT
