Context-Optimizer-MCP
An MCP server suite that optimizes prompt context by reducing tokens up to 98.8%, acting as persistent long-term memory and codebase scanner to save API costs.
Cuts AI prompt context by 90–99% and reduces API costs by $70–$140 per 1,000 queries.
A local-first Model Context Protocol (MCP) server suite that gives AI coding assistants persistent long-term memory and high-speed codebase discovery — eliminating context amnesia and token waste across every session.
The Problem
Modern AI coding assistants (Claude, Cursor, Copilot) suffer from two compounding problems:
- Context amnesia — Every new session starts from zero. Architectural decisions, past mistakes, and established patterns must be re-explained each time.
- Token waste — To answer a simple question, the AI blindly reads thousands of lines of source code, burning tokens on irrelevant logic before finding anything useful.
Context-Optimizer-MCP solves both.
Benchmark Results
Tested against this codebase (21 source files, 58,808 raw tokens) across 15 diverse query types — from narrow configuration lookups to broad architectural questions.
| Metric | Value |
|---|---|
| Minimum context reduction | 97.4% |
| Median context reduction | 99.81% |
| Maximum context reduction | 99.9% |
Run it yourself:
python benchmark.py
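The percentages above can be read as the share of raw tokens that never reaches the model. A minimal sketch of that arithmetic, assuming reduction is defined as one minus the ratio of optimized to raw tokens (benchmark.py may compute it differently). The function name `context_reduction` and the 150-token input are illustrative, not taken from the project code.

```python
def context_reduction(raw_tokens, optimized_tokens):
    """Percentage of the raw context that was not sent to the model."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (1 - optimized_tokens / raw_tokens) * 100


# 58,808 raw tokens is the codebase size quoted above;
# 150 optimized tokens is an assumed, illustrative query result.
print(round(context_reduction(58_808, 150), 2))  # 99.74
```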
Architecture
┌─────────────────────────────────────────────┐
│         AI Agent (Claude / Cursor)          │
└───────────────┬─────────────────────────────┘
                │ MCP Protocol
    ┌───────────┴────────────┐
    │                        │
┌───▼──────────┐    ┌────────▼────────┐
│ Memory       │    │ Discovery       │
│ MCP Server   │    │ MCP Server      │
│              │    │                 │
│ Stores and   │    │ AST + Regex     │
│ retrieves    │    │ codebase scan   │
│ decisions,   │    │ → endpoints,    │
│ mistakes,    │    │   queries,      │
│ observations │    │   tech debt     │
└──────┬───────┘    └────────┬────────┘
       │                     │
       └──────────┬──────────┘
                  │
          ┌───────▼────────┐          ┌──────────────────┐
          │   SQLite DB    │◄────────►│  ai-memory.yaml  │
          │ mcp_memory.db  │  cli.py  │  (Git-tracked)   │
          └───────┬────────┘          └──────────────────┘
                  │
          ┌───────▼────────┐
          │   FastAPI +    │
          │ React Dashboard│
          └────────────────┘
Core Components
Memory MCP Server
Persistent SQLite-backed memory for AI agents. Stores decisions, mistakes, and observations with full lifecycle management.
- Semantic deduplication — Embeds incoming memories (Gemini text-embedding-004 / OpenAI text-embedding-3-small) and runs cosine similarity in RAM. Similarity ≥ 0.85 triggers a merge instead of a new insert, incrementing the existing memory's confidence score. Falls back to exact-string matching when no API key is present.
- Staleness tracking — Classifies memories as fresh (<30 days), warming (30–90 days), or stale (>90 days) based on last validation timestamp. Auto-migrates older databases on startup.
- Memory pruning — mem_prune deletes unreinforced one-off entries (confidence == 1.0) older than N days, with dry-run mode on by default.
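The deduplication and staleness rules can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's actual code: the in-RAM list of dicts, the names `store_memory` and `staleness`, and the confidence increment of 1.0 are all hypothetical. Only the 0.85 threshold and the 30/90-day boundaries come from the description above.

```python
import math
from datetime import datetime, timedelta, timezone

SIMILARITY_THRESHOLD = 0.85  # merge threshold quoted in the list above


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def store_memory(memories, text, embedding):
    """Merge into the most similar existing memory, or append a new one."""
    best, best_score = None, 0.0
    for memory in memories:
        score = cosine_similarity(memory["embedding"], embedding)
        if score > best_score:
            best, best_score = memory, score
    if best is not None and best_score >= SIMILARITY_THRESHOLD:
        best["confidence"] += 1.0  # assumed increment; reinforces, no new row
        return best
    new = {"text": text, "embedding": embedding, "confidence": 1.0}
    memories.append(new)
    return new


def staleness(last_validated, now=None):
    """Classify a memory by days since its last validation."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - last_validated).days
    if age_days < 30:
        return "fresh"
    if age_days <= 90:
        return "warming"
    return "stale"


memories = []
store_memory(memories, "Use SQLite for storage", [1.0, 0.0])
store_memory(memories, "Store data in SQLite", [0.9, 0.1])  # similar, so merged
print(len(memories), memories[0]["confidence"])  # 1 2.0

now = datetime.now(timezone.utc)
print(staleness(now - timedelta(days=45), now))  # warming
```

The toy two-dimensional vectors stand in for real embeddings, which would come from one of the embedding APIs named above.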
Discovery MCP Server
Scans codebases structurally using AST parsing and regex — extracts API endpoints, database queries, class/function maps, and # TODO debt markers without reading implementation logic line-by-line.
Produces a lightweight "blueprint" of the project that the AI can query in ~150 tokens instead of reading the full source.
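To make the idea of a structural scan concrete, here is a minimal sketch using Python's standard `ast` and `re` modules. The function name `blueprint` and the shape of the returned dict are assumptions for illustration; the Discovery server's real output format is not documented here. The sketch only lists class names, function signatures, and TODO comments, and never inspects function bodies.

```python
import ast
import re

# Matches "# TODO: message" style comments; a simple heuristic that
# would also match the same text inside a string literal.
TODO_PATTERN = re.compile(r"#\s*TODO\b[:\s]*(.*)")


def blueprint(source):
    """Return class names, function signatures, and TODO markers."""
    tree = ast.parse(source)
    result = {"classes": [], "functions": [], "todos": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            result["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(arg.arg for arg in node.args.args)
            result["functions"].append(f"{node.name}({args})")
    # Comments are not part of the AST, so TODOs are found by regex.
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            result["todos"].append((lineno, match.group(1).strip()))
    return result


SAMPLE = '''
class UserService:
    def get_user(self, user_id):
        # TODO: add caching
        return db.fetch(user_id)

def health():
    return "ok"
'''

print(blueprint(SAMPLE))
```

The resulting summary is a few short strings regardless of how long the function bodies are, which is where the token savings would come from.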
Context Engine (FastAPI)
The search backend bridging agents and storage.
- Dual-mode semantic search — Uses vector embeddings when API keys are present; falls back to TF-IDF + cosine similarity for fully offline, zero-setup retrieval.
- Context compression — Retrieves, ranks, and compresses relevant memories and code structures before passing them to the LLM.
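The offline fallback can be illustrated with a small pure-Python TF-IDF ranker. This is a sketch of the general technique rather than the Context Engine's implementation: the function names, the tokenizer, and the smoothed IDF formula are assumptions chosen to keep the example dependency-free.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric/underscore terms."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents, top_k=3):
    """Rank documents against the query; drop zero-score results."""
    *doc_vecs, query_vec = tfidf_vectors(documents + [query])
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, documents)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), doc) for score, doc in scored[:top_k] if score > 0]


DOCS = [
    "Use SQLite for persistent memory storage",
    "React dashboard shows memory cards",
    "Prune stale entries after 90 days",
]
print(search("sqlite storage", DOCS))
```

Because nothing here calls a network API, the same ranking would work on an air-gapped machine, at the cost of matching on shared terms rather than meaning.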
React Dashboard
Local UI for auditing the AI's memory state.
- Trigger codebase scans manually
- Search memories with Google-style queries
- Verify/refresh warming and stale memory cards with a one-click checkmark (✓)
- View AST blueprints of the current project structure
Key Design Decisions
Cross-agent portability. Memory is stored in open SQLite — no vendor lock-in. Switch from Claude to Gemini tomorrow; the new agent inherits the full project history instantly.
Git-friendly memory sync. The binary .db file is not committed directly. cli.py export converts it to a human-readable ai-memory.yaml. Teams commit the YAML, and cli.py import --merge rebuilds the database on each machine using the semantic dedup engine to resolve conflicts rather than overwriting.
Zero mandatory external services. No API key is required to run; the only install step is the Python packages in requirements.txt. Semantic search degrades gracefully to TF-IDF offline mode, so the whole system works on an air-gapped machine.
Installation
Requirements: Python 3.10+, Node.js (only needed if rebuilding the dashboard; precompiled build included)
# 1. Clone and install
git clone https://github.com/your-username/Context-Optimizer-MCP.git
cd Context-Optimizer-MCP
pip install -r requirements.txt
# 2. Configure environment (API keys optional)
cp .env.template .env
# Add GEMINI_API_KEY or OPENAI_API_KEY to enable semantic search
# Leave blank for offline TF-IDF mode
# 3. Rebuild the memory database from the Git-tracked ai-memory.yaml
python cli.py import
# 4. Start the dashboard
python context_engine/server.py
# Open http://127.0.0.1:8000
Connecting to AI Clients
Cursor
Settings → Cursor Settings → Features → MCP → + Add New MCP Server
| Field | Value |
|---|---|
| Name | memory-server |
| Type | command |
| Command | python -u "C:/path/to/Context-Optimizer-MCP/mcp_servers/memory_server.py" |
Repeat for discovery-server using discovery_server.py.
Claude Desktop
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "codebase-memory": {
      "command": "python",
      "args": ["C:/path/to/Context-Optimizer-MCP/mcp_servers/memory_server.py"]
    },
    "codebase-discovery": {
      "command": "python",
      "args": ["C:/path/to/Context-Optimizer-MCP/mcp_servers/discovery_server.py"]
    }
  }
}
CLI Reference
# Export SQLite → YAML (for Git)
python cli.py export
# Rebuild SQLite from YAML
python cli.py import
# Merge YAML into existing DB (semantic dedup on conflicts)
python cli.py import --merge
# Preview stale memories eligible for pruning (dry run)
python cli.py prune --days 90 --confidence 1.0
# Execute pruning
python cli.py prune --days 90 --confidence 1.0 --execute
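The dry-run-by-default behaviour of the prune command might look something like the sketch below. The table name `memories`, its columns, the ISO 8601 timestamp storage, and the reading of `--confidence` as an upper bound are all assumptions made for illustration; the real schema in mcp_memory.db may differ.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune(conn, days=90, confidence=1.0, execute=False):
    """Return prunable rows; delete them only when execute=True."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
    # Assumed schema: confidence REAL, last_validated as ISO 8601 UTC text.
    where = "confidence <= ? AND last_validated < ?"
    rows = conn.execute(
        f"SELECT id, content FROM memories WHERE {where}", (confidence, cutoff)
    ).fetchall()
    if execute:
        conn.execute(f"DELETE FROM memories WHERE {where}", (confidence, cutoff))
        conn.commit()
    return rows


def demo_db():
    """Build a throwaway in-memory database with three sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, "
        "confidence REAL, last_validated TEXT)"
    )
    now = datetime.now(timezone.utc)
    old = (now - timedelta(days=120)).isoformat()
    recent = (now - timedelta(days=5)).isoformat()
    conn.executemany(
        "INSERT INTO memories (content, confidence, last_validated) VALUES (?, ?, ?)",
        [("old one-off", 1.0, old), ("old reinforced", 3.0, old), ("recent one-off", 1.0, recent)],
    )
    return conn


conn = demo_db()
print(prune(conn))                # dry run: lists candidates, deletes nothing
print(prune(conn, execute=True))  # same rows, now actually removed
```

Keeping deletion behind an explicit flag means a mistyped `--days` value shows up in the preview before any memory is lost.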
Tests
python -m unittest tests/test_memory_discovery.py
10 integration tests covering deduplication logic, YAML sync, staleness scoring, pruning, and benchmark validation. All passing.
Tech Stack
Python · FastAPI · SQLite · React · Vite · Model Context Protocol (MCP) · Google Gemini Embeddings · OpenAI Embeddings · TF-IDF / Cosine Similarity
License
MIT