distill-mcp-v2
distill-mcp-v2 is a high-performance, network-dependency-free Python FastMCP server designed to aggressively optimize Large Language Model (LLM) context windows. It provides specialized tools for compressing and analyzing massive AI-agent payloads without losing critical semantic information.
<div align="center"> <h1>Distill v2</h1> <p><strong>Model Context Protocol (MCP) Server for Massive Context Engineering &amp; Compression</strong></p> </div>
Overview
distill-mcp-v2 is a high-performance, network-dependency-free Python FastMCP server designed to aggressively optimize Large Language Model (LLM) context windows. It provides specialized tools for compressing and analyzing massive AI-agent payloads without losing critical semantic information.
By filtering noise, stabilizing cache prefixes, and estimating token costs for multiple models locally, Distill v2 reduces API costs and helps preserve LLM reasoning quality on heavy payloads such as very long logs, monolithic API schemas, and multi-agent war-room transcripts.
Performance & Token Compression
Stress-test benchmarks (audited via pytest and chaos blueprints) show that distill-mcp-v2 achieves up to 99.7% token compression while retaining the critical context in each tested scenario.
| Scenario | Payload Profile | Raw Tokens | Distilled Tokens | Savings % |
|---|---|---|---|---|
| Trace Avalanche | Heavy Java Stacktraces | 150,027 | 546 | 99.6% |
| Schema Monolith | Massive Microservice JSON | 56,588 | 1,355 | 97.6% |
| Incident War Room | Multi-Agent Chat Logs | 117,952 | 371 | 99.7% |
(Costs are based on Claude 3 Opus token budgets. In the Trace Avalanche scenario, the cost dropped from $2.25 per call to $0.008 per call.)
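The savings and cost figures in the table can be reproduced with simple arithmetic. The sketch below assumes an input price of $15 per million tokens, which is the rate implied by the $2.25 figure for 150,027 tokens. The function names are illustrative and are not part of Distill.

```python
# Assumed input price (USD per million tokens), inferred from the figures above.
PRICE_PER_MILLION_INPUT_TOKENS = 15.00


def savings_percent(raw_tokens: int, distilled_tokens: int) -> float:
    """Return the percentage of tokens removed by distillation."""
    return (1 - distilled_tokens / raw_tokens) * 100


def cost_per_call(tokens: int,
                  price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Return the input cost in USD for a single call of the given size."""
    return tokens * price_per_million / 1_000_000


# (raw tokens, distilled tokens) taken from the benchmark table.
scenarios = {
    "Trace Avalanche": (150_027, 546),
    "Schema Monolith": (56_588, 1_355),
    "Incident War Room": (117_952, 371),
}

for name, (raw, distilled) in scenarios.items():
    print(f"{name}: {savings_percent(raw, distilled):.1f}% saved, "
          f"${cost_per_call(raw):.2f} -> ${cost_per_call(distilled):.3f} per call")
```

Only input tokens are counted here; output tokens are priced separately and are not affected by distilling the prompt.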
Read the full Benchmark & Execution Report for deeper insights.
Features & Toolset
Distill v2 exposes 8 precise tools to agents via the Model Context Protocol:
- `distill_json`: Compresses raw JSON payloads, retaining anomalies, exceptions, and errors.
- `distill_logs`: Compresses raw `.log` files, preserving head/tail contexts and stack traces.
- `distill_schema`: Compacts massive MCP tool catalogs and JSON schemas to structural parameters.
- `distill_response`: Progressively prunes, minifies, and truncates outputs to fit strict token budgets.
- `distill_conversation`: Extracts goals, decisions, blockers, and actions from multi-agent transcripts without leaking raw ISO timestamps.
- `stabilize_for_cache`: Maps chaotic raw identifiers (UUIDs, hex IDs) to sequential placeholders to stabilize LLM prompt caching.
- `analyze_tokens`: Accurately estimates token counts using `tiktoken` (`cl100k_base`).
- `compare`: Computes detailed diffs and cost-savings analyses between raw and distilled payloads.
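To illustrate the idea behind `stabilize_for_cache`, here is a minimal sketch of mapping identifiers to sequential placeholders. It is not Distill's implementation: the regular expression, the `<ID_n>` placeholder format, and the `stabilize` function name are all assumptions made for this example.

```python
import re

# Matches canonical UUIDs first, then bare hex runs of 12 or more characters.
# Both patterns are assumptions about what counts as a volatile identifier.
ID_PATTERN = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    r"|\b[0-9a-fA-F]{12,}\b"
)


def stabilize(text: str):
    """Replace each distinct identifier with a sequential placeholder.

    Returns the rewritten text and the identifier-to-placeholder mapping.
    """
    mapping = {}

    def substitute(match):
        raw_id = match.group(0)
        if raw_id not in mapping:
            # Placeholders are numbered in order of first appearance.
            mapping[raw_id] = f"<ID_{len(mapping) + 1}>"
        return mapping[raw_id]

    return ID_PATTERN.sub(substitute, text), mapping


run_a = "req 123e4567-e89b-12d3-a456-426614174000 failed on node deadbeefcafe1234"
run_b = "req 9f8e7d6c-5b4a-4321-8765-0123456789ab failed on node 0011223344556677"

stable_a, _ = stabilize(run_a)
stable_b, _ = stabilize(run_b)
print(stable_a)
print(stable_a == stable_b)
```

Because both runs reduce to the same text, a prompt built from either one shares the same prefix, which is what allows a provider-side prompt cache to be reused. Note that this simple pattern would also rewrite long decimal numbers, so a production version would likely need stricter rules.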
Quick Start
Installation
Distill v2 requires Python 3.10+. We recommend using uv or pip in an isolated virtual environment.
# Clone the repository
git clone https://github.com/yatinkoul/distill.git
cd distill
# Create a virtual environment and activate it
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -e .
Running the Server
Run the FastMCP server, which natively binds to a stateless HTTP endpoint.
# Run the FastMCP server (default host: 0.0.0.0, port: 8000)
distill --host 0.0.0.0 --port 8000
# Or using uvicorn directly:
.venv/bin/uvicorn src.main:app --host 0.0.0.0 --port 8000
The server exposes a JSON-RPC endpoint at http://localhost:8000/mcp.
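As a rough sketch of what a request to this endpoint might look like, the example below builds a JSON-RPC 2.0 `tools/list` call using only the standard library. The `build_request` and `send` helpers are hypothetical names for this example. It also assumes the server accepts the call without a prior `initialize` handshake, which may not hold for every configuration; a full client would normally use an MCP SDK.

```python
import json
import urllib.request
from typing import Optional

MCP_URL = "http://localhost:8000/mcp"  # endpoint stated in this README


def build_request(method: str, params: Optional[dict] = None,
                  request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST request for the MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # MCP's Streamable HTTP transport expects clients to accept both types.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


request = build_request("tools/list")
print(request.data.decode("utf-8"))
```

With the server running, `send(request)` would return the raw response body, which may arrive as plain JSON or as a server-sent event depending on the transport settings.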
Stress Testing & Development
The repository ships with an exhaustive stress-test runner that dynamically allocates ports, executes deterministic payload scenarios, and validates the integrity of the distilled outputs.
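How the runner allocates ports is not described here. One common approach, sketched below as an assumption and not as the runner's actual code, is to bind a socket to port 0 and let the operating system pick a free port. The `find_free_port` helper is a hypothetical name.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (host, port); the port is the OS-assigned one.
        return sock.getsockname()[1]


port = find_free_port()
print(f"A test server could be started with: distill --host 127.0.0.1 --port {port}")
```

The port is released when the socket closes, so another process could claim it before the server starts. Test runners that use this pattern typically retry on a bind failure.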
To run the complete test suite (118/118 passing):
# Install development dependencies
pip install -e ".[dev]"
# Run the test suite
pytest
To manually execute the stress-testing blueprints and regenerate the benchmark report:
python stress_tests_blueprints/run_blueprints.py
Documentation
- Project Goals & Strategy: High-level strategy and milestones.
- Benchmark Report: Exhaustive stress-test results.
- Testing Infrastructure: Notes on the testing architecture and CI/CD readiness.
Contributing
Contributions, issues, and feature requests are welcome!
Please read our CONTRIBUTING.md (coming soon) for guidelines on how to propose improvements. Ensure all tests and linting (ruff) pass before submitting pull requests.
License
This project is licensed under the MIT License; see the LICENSE file for details.