Context Optimizer MCP
An MCP (Model Context Protocol) server that uses Redis and in-memory caching to optimize and extend context windows for large chat histories.
Features
- Dual-Layer Caching: Combines fast in-memory LRU cache with persistent Redis storage
- Smart Context Management: Automatically summarizes older messages to maintain context within token limits
- Rate Limiting: Redis-based rate limiting with burst protection (see the sketch after this list)
- API Compatibility: Drop-in replacement for Anthropic API with enhanced context handling
- Metrics Collection: Built-in performance monitoring and logging
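A rough sketch of how the Redis-based rate limiting could work, assuming the ioredis package; the key names, limits, and function name below are illustrative, not the server's actual implementation:

import Redis from 'ioredis';

const redis = new Redis({ host: 'localhost', port: 6379 });

// Allow `limit` requests per `windowSeconds`, plus a tighter burst window (illustrative values).
async function isAllowed(clientId, limit = 60, windowSeconds = 60, burstLimit = 10, burstSeconds = 1) {
  const windowKey = `ratelimit:${clientId}:${Math.floor(Date.now() / (windowSeconds * 1000))}`;
  const burstKey = `burst:${clientId}:${Math.floor(Date.now() / (burstSeconds * 1000))}`;

  const [windowCount, burstCount] = await Promise.all([redis.incr(windowKey), redis.incr(burstKey)]);

  // Expire the counters so stale windows clean themselves up.
  if (windowCount === 1) await redis.expire(windowKey, windowSeconds);
  if (burstCount === 1) await redis.expire(burstKey, burstSeconds);

  return windowCount <= limit && burstCount <= burstLimit;
}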
How It Works
This MCP server acts as middleware between your application and LLM providers (currently Anthropic's Claude models). It manages conversation context through three strategies:
- Context Window Optimization: When a conversation approaches the model's token limit, older messages are automatically summarized while preserving key information.
- Efficient Caching (see the sketch after this list):
  - In-memory LRU cache for frequently accessed conversation summaries
  - Redis for persistent, distributed storage of conversation history and summaries
- Transparent Processing: The server handles all context management automatically while maintaining compatibility with the standard API.
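As a rough illustration of the dual-layer lookup (assuming the lru-cache and ioredis packages; key and function names are made up for this sketch), the server checks the in-memory LRU first, falls back to Redis, and backfills the LRU on a Redis hit:

import { LRUCache } from 'lru-cache';
import Redis from 'ioredis';

const memoryCache = new LRUCache({ max: 1000 });            // IN_MEMORY_CACHE_MAX_SIZE
const redis = new Redis({ host: 'localhost', port: 6379 });
const REDIS_TTL = 86400;                                     // REDIS_CACHE_TTL, in seconds

async function getSummary(conversationId) {
  // 1. Fast path: in-memory LRU for active conversations
  const cached = memoryCache.get(conversationId);
  if (cached) return cached;

  // 2. Fall back to Redis, and backfill the LRU on a hit
  const fromRedis = await redis.get(`summary:${conversationId}`);
  if (fromRedis) {
    memoryCache.set(conversationId, fromRedis);
    return fromRedis;
  }
  return null; // no summary stored yet
}

async function setSummary(conversationId, summary) {
  memoryCache.set(conversationId, summary);
  await redis.set(`summary:${conversationId}`, summary, 'EX', REDIS_TTL);
}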
Getting Started
Prerequisites
- Node.js 18+
- Redis server (local or remote)
- Anthropic API key
Installation Options
1. Using MCP client
The easiest way to install and run this server is using the MCP client:
# Install via npx
npx mcp install degenhero/context-optimizer-mcp
# Or using uvx
uvx mcp install degenhero/context-optimizer-mcp
Make sure to set your Anthropic API key when prompted during installation.
2. Manual Installation
# Clone the repository
git clone https://github.com/degenhero/context-optimizer-mcp.git
cd context-optimizer-mcp
# Install dependencies
npm install
# Set up environment variables
cp .env.example .env
# Edit .env with your configuration
# Start the server
npm start
3. Using Docker
# Clone the repository
git clone https://github.com/degenhero/context-optimizer-mcp.git
cd context-optimizer-mcp
# Build and start with Docker Compose
docker-compose up -d
This will start both the MCP server and a Redis instance.
Configuration
Configure the server by editing the .env file:
# Server configuration
PORT=3000
# Anthropic API key
ANTHROPIC_API_KEY=your_anthropic_api_key
# Redis configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
# Caching settings
IN_MEMORY_CACHE_MAX_SIZE=1000
REDIS_CACHE_TTL=86400 # 24 hours in seconds
# Model settings
DEFAULT_MODEL=claude-3-opus-20240229
DEFAULT_MAX_TOKENS=4096
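Inside the server these variables would typically be read from process.env at startup; a minimal sketch, assuming dotenv is used to load the .env file, with defaults matching the values above:

import 'dotenv/config';

const config = {
  port: Number(process.env.PORT ?? 3000),
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  redis: {
    host: process.env.REDIS_HOST ?? 'localhost',
    port: Number(process.env.REDIS_PORT ?? 6379),
    password: process.env.REDIS_PASSWORD || undefined,
  },
  cache: {
    inMemoryMaxSize: Number(process.env.IN_MEMORY_CACHE_MAX_SIZE ?? 1000),
    redisTtl: Number(process.env.REDIS_CACHE_TTL ?? 86400),
  },
  defaultModel: process.env.DEFAULT_MODEL ?? 'claude-3-opus-20240229',
  defaultMaxTokens: Number(process.env.DEFAULT_MAX_TOKENS ?? 4096),
};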
API Usage
The server exposes a compatible API endpoint that works like the standard Claude API with additional context optimization features:
// Example client usage
const response = await fetch('http://localhost:3000/v1/messages', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'claude-3-opus-20240229',
messages: [
{ role: 'user', content: 'Hello!' },
{ role: 'assistant', content: 'How can I help you today?' },
{ role: 'user', content: 'Tell me about context management.' }
],
max_tokens: 1000,
// Optional MCP-specific parameters:
conversation_id: 'unique-conversation-id', // For context tracking
context_optimization: true, // Enable/disable optimization
}),
});
const result = await response.json();
Additional Endpoints
- GET /v1/token-count?text=your_text&model=model_name: Count tokens in a text string
- GET /health: Server health check
- GET /metrics: View server performance metrics
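For example, you can check a prompt's token count before sending it (the response field name here is an assumption):

const res = await fetch(
  'http://localhost:3000/v1/token-count?text=' +
  encodeURIComponent('Tell me about context management.') +
  '&model=claude-3-opus-20240229'
);
const data = await res.json();
console.log(data.token_count); // response field name assumed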
Testing
A test script is included to demonstrate how context optimization works:
# Run the test script
npm run test:context
This will start an interactive session where you can have a conversation and see how the context gets optimized as it grows.
Advanced Features
Context Summarization
When a conversation exceeds 80% of the model's token limit, the server automatically summarizes older messages. The resulting summary is cached for future use.
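Conceptually, the flow looks something like this simplified sketch; countTokens and summarize stand in for the server's real token counter and summarization call, and the numbers are illustrative:

async function optimizeContext(messages, model, tokenLimit) {
  const total = await countTokens(messages, model);      // hypothetical helper
  if (total <= tokenLimit * 0.8) return messages;         // under the 80% threshold, nothing to do

  // Summarize everything except the most recent messages, keeping the tail verbatim
  const keepRecent = 4;                                    // illustrative value
  const older = messages.slice(0, -keepRecent);
  const recent = messages.slice(-keepRecent);
  const summary = await summarize(older, model);           // hypothetical helper; summary is cached

  return [{ role: 'user', content: `Summary of earlier conversation: ${summary}` }, ...recent];
}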
Conversation Continuity
By providing a consistent conversation_id in requests, the server can maintain context across multiple API calls, even if individual requests would exceed token limits.
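For example, two separate calls that reuse the same conversation_id share context (the id value is just an example):

// First request establishes the conversation
await fetch('http://localhost:3000/v1/messages', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'claude-3-opus-20240229',
    max_tokens: 1000,
    conversation_id: 'support-ticket-1234',
    messages: [{ role: 'user', content: 'My order arrived damaged.' }],
  }),
});

// A later request with the same conversation_id picks up the stored context
await fetch('http://localhost:3000/v1/messages', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'claude-3-opus-20240229',
    max_tokens: 1000,
    conversation_id: 'support-ticket-1234',
    messages: [{ role: 'user', content: 'What are my options?' }],
  }),
});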
Performance Considerations
- In-memory cache provides fastest access for active conversations
- Redis enables persistence and sharing across server instances
- Summarization operations add some latency to requests that exceed token thresholds
Documentation
Additional documentation can be found in the docs/ directory.
License
MIT