TokenPilot
Automatic token optimization for Claude Code. Extends session duration by reducing wasted tokens across every dimension — effort tuning, redundant file reads, tool cost routing, context health tracking, and smart task classification.
Built as a Claude Code hooks + MCP server system. Works alongside RTK for shell compression and MCP Compressor for schema reduction.
How It Works
TokenPilot runs as four layers:
- Hooks — intercept Claude Code lifecycle events (session start, prompt submit, pre/post tool use)
- MCP Server — exposes tools for real-time control and monitoring
- SQLite Database — persists session state across hook subprocess calls with WAL mode + serializable isolation
- Tool Registry — maps known tools to estimated costs and cheaper alternatives
┌── Claude Code Hooks ───────────────────────────────────────┐
│                                                            │
│  SessionStart      → init session, inject hints            │
│  UserPromptSubmit  → classify task → suggest effort        │
│  PreToolUse (Read) → dedup file reads + suggest cheaper    │
│  PostToolUse (all) → track real tool output token costs    │
│                                                            │
└────────────────┬───────────────────────────────────────────┘
                 │
    ┌────────────▼──────────────────┐
    │  TokenPilot MCP Server        │
    │                               │
    │  set_level(1-10)              │  Aggressiveness dial
    │  get_stats()                  │  Live session metrics
    │  get_savings()                │  Token savings report
    │  get_context_health()         │  Context window status
    │  get_tool_report()            │  Most expensive tools
    │  get_file_report(path)        │  File read history
    │  explain_classification(p)    │  Debug classifier
    │  reset_file_tracking()        │  Clear dedup cache
    │                               │
    │  SQLite + Tool Registry       │  Persistent state
    └───────────────────────────────┘
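Because each hook runs as a separate short-lived subprocess, state such as per-file read counts has to live on disk rather than in memory. The sketch below shows one way the SQLite layer described above could do that with Python's built-in sqlite3 module in WAL mode. The `file_reads` table, its schema, and the `record_read` function are hypothetical names chosen for illustration; they are not taken from TokenPilot's db.py.

```python
import sqlite3


def record_read(db_path: str, session_id: str, file_path: str) -> int:
    """Record one read of file_path and return the total reads this session.

    Hypothetical schema for illustration; not TokenPilot's actual db.py.
    """
    # isolation_level=None gives autocommit mode, so the transaction below
    # is opened and committed explicitly.
    conn = sqlite3.connect(db_path, timeout=5, isolation_level=None)
    try:
        # WAL lets readers proceed while another hook process is writing.
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS file_reads ("
            " session_id TEXT NOT NULL,"
            " path TEXT NOT NULL,"
            " reads INTEGER NOT NULL DEFAULT 0,"
            " PRIMARY KEY (session_id, path))"
        )
        # BEGIN IMMEDIATE takes the write lock up front, so concurrent hook
        # processes increment the counter one at a time.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute(
            "INSERT INTO file_reads (session_id, path, reads) VALUES (?, ?, 1)"
            " ON CONFLICT(session_id, path) DO UPDATE SET reads = reads + 1",
            (session_id, file_path),
        )
        (count,) = conn.execute(
            "SELECT reads FROM file_reads WHERE session_id = ? AND path = ?",
            (session_id, file_path),
        ).fetchone()
        conn.execute("COMMIT")
        return count
    finally:
        conn.close()
```

A PreToolUse hook could call `record_read` and treat any return value above 1 as a redundant read, then notify, warn, or block depending on the current aggressiveness level.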
Aggressiveness Scale
Default: 4 (conservative-balanced). Adjustable 1-10 at any time via /tp level N.
| Level | Effort Suggestion | File Read Dedup | Thinking Cap | Compact Reminder |
|---|---|---|---|---|
| 1-2 | Never | Notify only | No cap | 90% context |
| 3-4 | Trivial tasks only | Warn on redundant | No cap | 75% context |
| 5-6 | All tasks | Warn + suggest alternatives | Adaptive (10-30K) | 65% context |
| 7-8 | Strong recommendation | Block re-reads | Adaptive (6-18K) | 55% context |
| 9-10 | Enforce | Block + auto-range | Adaptive (4-12K) | 45% context |
Thinking caps are adaptive — they scale based on task complexity and classifier confidence. A "trivial" task gets a tighter cap than a "complex" task. If the classifier is uncertain (confidence < 0.5), no cap is applied.
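As a rough illustration of how an adaptive cap could be derived, the sketch below takes the cap ranges from the table and the 0.5 confidence cutoff from the paragraph above. The linear interpolation, the per-category weights, and the `thinking_cap` function name are assumptions; the actual logic in config.py may differ, and this sketch uses confidence only as an on/off gate.

```python
from typing import Optional

# (min, max) thinking-token caps per aggressiveness band, from the table above.
CAP_RANGES = {
    (5, 6): (10_000, 30_000),
    (7, 8): (6_000, 18_000),
    (9, 10): (4_000, 12_000),
}

# Assumed position of each category inside the range (0.0 = tightest cap).
COMPLEXITY_WEIGHT = {
    "trivial": 0.0,
    "research": 0.5,
    "standard": 0.5,
    "complex": 1.0,
}


def thinking_cap(level: int, category: str, confidence: float) -> Optional[int]:
    """Return a thinking-token cap, or None when no cap applies."""
    if confidence < 0.5:
        return None  # uncertain classification: apply no cap
    for (low, high), (cap_min, cap_max) in CAP_RANGES.items():
        if low <= level <= high:
            # Unknown categories fall back to the loosest cap in the range.
            weight = COMPLEXITY_WEIGHT.get(category, 1.0)
            return round(cap_min + weight * (cap_max - cap_min))
    return None  # levels 1-4 have no cap
```

Under these assumptions, a trivial task at level 6 would be capped at 10,000 thinking tokens and a complex task at 30,000, while any task at level 4 stays uncapped.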
Task Classifier (v2)
Lightweight regex + keyword classifier with negation detection, adjacency scoring, and quoted-code filtering. No LLM calls, <10ms execution.
| Category | Effort | Model Hint | Example |
|---|---|---|---|
| trivial | low | haiku | "fix typo in README" |
| research | medium | sonnet | "explain how the API routes work" |
| standard | medium | sonnet | "add a loading spinner" |
| complex | high | opus | "refactor auth across all microservices" |
v2 improvements:
- Negation detection: "don't refactor" no longer matches the refactor pattern
- Quoted-code filtering: backtick-wrapped code is stripped before classification
- Adjacency scoring: "add auth to 12 routes" correctly detects complexity from keyword pairs
- Confidence calibration: very short prompts get low confidence (0.3) instead of false high confidence
Debug any classification with /tp explain <prompt>.
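To make the v2 ideas concrete, here is a minimal sketch of a regex classifier with quoted-code filtering, negation detection, and low confidence for very short prompts. The keyword lists, the confidence values other than 0.3, and the `classify` function are illustrative assumptions and not the contents of classifier.py. Adjacency scoring is left out to keep the example short.

```python
import re

# Hypothetical keyword patterns; the real classifier's rules are not shown here.
PATTERNS = {
    "complex": re.compile(r"\b(refactor|migrate|rewrite)\b", re.I),
    "trivial": re.compile(r"\b(typo|rename|spelling)\b", re.I),
    "research": re.compile(r"\b(explain|how|why)\b", re.I),
}
# A negation word directly before the keyword, with at most one word between.
NEGATION = re.compile(r"\b(?:don't|do not|never|without)\s+(?:\w+\s+)?$", re.I)
CODE_SPAN = re.compile(r"`[^`]*`")


def classify(prompt: str) -> tuple[str, float]:
    """Return (category, confidence) for a prompt."""
    text = CODE_SPAN.sub(" ", prompt)   # quoted-code filtering
    short = len(text.split()) < 3       # very short prompts get low confidence
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Negation detection: ignore keywords such as "don't refactor".
            if NEGATION.search(text[: match.start()]):
                continue
            return category, (0.3 if short else 0.8)
    # No keyword matched: fall back to the default category.
    return "standard", (0.3 if short else 0.5)
```

With these patterns, "don't refactor, just fix the typo" skips the negated complex match and falls through to trivial, and a keyword inside backticks is never seen by the patterns at all.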
Tool Cost Registry
TokenPilot knows the estimated token cost of common tools and suggests cheaper alternatives:
| Tool | Avg Tokens | Alternative | Alt Tokens | Savings |
|---|---|---|---|---|
| Read | ~2000 | jCodeMunch symbol lookup | ~200 | 90% |
| WebSearch | ~2000 | Context7 docs query | ~800 | 60% |
| WebFetch | ~3000 | Context7 docs query | ~800 | 73% |
At level 5+, TokenPilot suggests alternatives when a cheaper tool could do the job.
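The savings column follows directly from the two token estimates: savings = (avg - alt) / avg. A small sketch of such a registry is shown below, using the figures from the table. The `ToolCost` class, the `REGISTRY` dict, and `suggest_alternative` are hypothetical names for illustration and may not match tool_registry.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolCost:
    avg_tokens: int
    alternative: str
    alt_tokens: int

    @property
    def savings_pct(self) -> int:
        # Percentage of tokens saved by switching to the alternative.
        return round(100 * (self.avg_tokens - self.alt_tokens) / self.avg_tokens)


# Estimates copied from the table above.
REGISTRY = {
    "Read": ToolCost(2000, "jCodeMunch symbol lookup", 200),
    "WebSearch": ToolCost(2000, "Context7 docs query", 800),
    "WebFetch": ToolCost(3000, "Context7 docs query", 800),
}


def suggest_alternative(tool: str, level: int) -> Optional[str]:
    """Return a hint at level 5+ for known tools, else None."""
    entry = REGISTRY.get(tool)
    if entry is None or level < 5:
        return None
    return (
        f"[TokenPilot] {tool} costs ~{entry.avg_tokens} tokens; "
        f"{entry.alternative} costs ~{entry.alt_tokens} "
        f"(saves ~{entry.savings_pct}%)"
    )
```

For example, WebFetch at ~3000 tokens against a ~800-token alternative gives (3000 - 800) / 3000, which rounds to the 73% shown in the table.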
Installation
Prerequisites
- Python 3.10+
- Claude Code CLI
- FastMCP (pip3 install fastmcp)
Setup
- Clone to your MCPs directory:
git clone https://github.com/rish-e/tokenpilot.git ~/MCPs/tokenpilot
- Install dependencies:
pip3 install -r ~/MCPs/tokenpilot/requirements.txt
- Add hooks and MCP server to ~/.claude/settings.json:
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "~/MCPs/tokenpilot/hooks/session_start.sh", "timeout": 5 }
        ]
      }
    ],
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "~/MCPs/tokenpilot/hooks/classify.sh", "timeout": 5 }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Read",
        "hooks": [
          { "type": "command", "command": "~/MCPs/tokenpilot/hooks/check_read.sh", "timeout": 5 }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": ".*",
        "hooks": [
          { "type": "command", "command": "~/MCPs/tokenpilot/hooks/post_tool.sh", "timeout": 3 }
        ]
      }
    ]
  },
  "mcpServers": {
    "tokenpilot": {
      "command": "python3",
      "args": ["~/MCPs/tokenpilot/server.py"],
      "env": { "PYTHONPATH": "~/MCPs/tokenpilot" }
    }
  }
}
- Install the /tp slash command:
cp ~/MCPs/tokenpilot/commands/tp.md ~/.claude/commands/tp.md
- Restart Claude Code.
Optional: RTK for Shell Compression
brew install rtk-ai/tap/rtk
rtk init -g
Adds 60-90% token savings on shell output (build logs, test output, git).
Usage
TokenPilot runs automatically after installation. You'll see [TokenPilot] messages when it detects optimization opportunities.
Slash Commands
5 commands. That's it.
| Command | What it does |
|---|---|
| /tp <1-10> | Set aggressiveness level |
| /tp on / off | Enable/disable TokenPilot |
| /tp stats | Full session dashboard |
| /tp note <text> | Add a note to the Project Brain |
| /tp explain <prompt> | Debug why a prompt was classified |
MCP Tools
For power users, all tools are callable directly:
set_level, toggle, get_stats, get_savings, get_context_health, get_tool_report, get_file_report, explain_classification, add_note, reset_file_tracking
CLI (for testing)
cd ~/MCPs/tokenpilot
python3 server.py init 4 # Initialize session
python3 server.py classify "fix typo" # Classify prompt
python3 server.py classify_debug "fix typo" # Debug classification
python3 server.py check_file "/src/app.py" # Check file dedup
python3 server.py context_health # Context window status
Project Brain
TokenPilot auto-maintains a tpcontext.md file in each project root. This is persistent memory across Claude Code sessions — when you start a new chat, Claude immediately knows where you left off.
Fully automatic:
- First install — bootstraps from git history (commits, active files, branch)
- Every session start — auto-saves previous session, loads brain into context
- No manual save needed — it just works
What it captures:
- Files modified (from git diff)
- Recent commits
- User notes (via /tp note "...")
- Session stats (duration, prompt count)
- Most active files
Add context for future sessions:
/tp note "switched to GraphQL — don't touch REST endpoints"
The file stays under 2K tokens and keeps the last 5 sessions; older ones rotate out.
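The rotation rule can be sketched in a few lines. The example below keeps the five most recent session summaries and then drops the oldest ones until a 2K-token budget is met. The `rotate_sessions` and `estimate_tokens` functions and the four-characters-per-token heuristic are assumptions for illustration; brain.py may count tokens and store sessions differently.

```python
MAX_SESSIONS = 5      # "keeps the last 5 sessions"
TOKEN_BUDGET = 2000   # "stays under 2K tokens"


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return len(text) // 4


def rotate_sessions(sessions: list[str], new_session: str) -> list[str]:
    """Append new_session, then drop the oldest entries until limits are met."""
    kept = (sessions + [new_session])[-MAX_SESSIONS:]
    # Always keep at least the newest session, even if it alone is large.
    while len(kept) > 1 and estimate_tokens("\n".join(kept)) > TOKEN_BUDGET:
        kept.pop(0)  # oldest first
    return kept
```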
Smart Warnings
TokenPilot automatically detects and warns about token-wasting patterns:
- Rapid-fire prompts: 3+ short messages in a row trigger a "batch your questions" suggestion
- Session age: every 15 prompts, suggests /compact or starting fresh
- Peak hours: warns once per session during 5-11am PT on weekdays (Anthropic usage limits are consumed faster during peak hours)
All warnings appear as [TokenPilot] messages and respect the on/off toggle.
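The first two checks are simple enough to sketch. The example below uses the thresholds stated above (3 short prompts in a row, every 15 prompts) and honours the on/off toggle. The eight-word cutoff for a "short" prompt, the message wording, and the `smart_warnings` function are assumptions, and the peak-hours check is omitted.

```python
SHORT_PROMPT_WORDS = 8     # assumed cutoff for a "short" message
RAPID_FIRE_STREAK = 3      # "3+ short messages in a row"
SESSION_AGE_INTERVAL = 15  # "every 15 prompts"


def smart_warnings(prompts: list[str], enabled: bool = True) -> list[str]:
    """Return warnings triggered by the latest prompt in the session history."""
    if not enabled or not prompts:
        return []
    warnings = []
    # Rapid-fire: the last three prompts are all short.
    recent = prompts[-RAPID_FIRE_STREAK:]
    if len(recent) == RAPID_FIRE_STREAK and all(
        len(p.split()) < SHORT_PROMPT_WORDS for p in recent
    ):
        warnings.append(
            "[TokenPilot] Several short prompts in a row: "
            "consider batching your questions."
        )
    # Session age: fires on the 15th, 30th, 45th... prompt.
    if len(prompts) % SESSION_AGE_INTERVAL == 0:
        warnings.append(
            "[TokenPilot] Long session: consider /compact or starting fresh."
        )
    return warnings
```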
File Structure
tokenpilot/
├── server.py            # FastMCP server + CLI entry point
├── classifier.py        # Task classifier (v2: negation, adjacency, debug)
├── config.py            # Aggressiveness scale + adaptive thinking caps
├── db.py                # SQLite persistence (WAL, indexed, serializable)
├── brain.py             # Project Brain: auto-generated tpcontext.md
├── tool_registry.py     # Tool cost estimates + cheaper alternatives
├── tracker.py           # In-memory tracker (used by MCP server process)
├── requirements.txt
├── commands/
│   └── tp.md            # /tp slash command (copy to ~/.claude/commands/)
├── hooks/
│   ├── session_start.sh # SessionStart: init + load brain
│   ├── classify.sh      # UserPromptSubmit: classify + rapid-fire + peak hours + session age
│   ├── check_read.sh    # PreToolUse (Read): dedup + tool routing
│   └── post_tool.sh     # PostToolUse: real token tracking
└── templates/
    └── claudeignore-default
How Token Savings Stack
| Layer | What | Savings |
|---|---|---|
| TokenPilot classifier | Right effort level per task | Thinking token reduction |
| TokenPilot file dedup | Skip redundant file reads | ~2K tokens per blocked read |
| TokenPilot tool routing | Suggest cheaper tool alternatives | 60-90% per substitution |
| TokenPilot PostToolUse | Track actual token costs (visibility) | Measurement enables optimization |
| TokenPilot smart warnings | Batch prompts, session age, peak hours | Prevents context blowup |
| TokenPilot Project Brain | Resume sessions without re-explaining context | 3-5 messages saved per session start |
| RTK | Compress shell output | 60-90% on Bash results |
| MCP Compressor | Compress MCP tool schemas | 70-97% per wrapped server |
| .claudeignore | Exclude build artifacts from search | 30-40% on exploration |
License
MIT