MCP Local LLM Server
Processes large files, error logs, and codebase searches locally via Ollama to reduce token consumption in Cursor by up to 80%.
MCP Local LLM Server - Context Compressor
A Model Context Protocol (MCP) server that uses Ollama as a context preprocessor to reduce token consumption in Cursor by up to 80%. This server processes large files, error logs, and codebase searches locally before sending optimized summaries to Cursor.
Features
- Ollama Integration: Uses Ollama for local LLM processing
- Token Reduction: Reduces token usage by processing context locally
- Multiple Tools: Context compression, code analysis, log processing, and semantic search
- MCP Prompts: Dynamic instruction injection for Cursor
- MCP Resources: Read-only data resources (config, models, tools, prompts, stats)
- Configurable: Customizable model, temperature, and token limits
- Easy Setup: Simple installation and configuration
- MCP Compatible: Works with any MCP-compatible client
- Privacy: With the Ollama provider, sensitive data is processed locally and only summaries are sent to Cursor
- Full MCP Support: Implements all core MCP capabilities (Tools, Prompts, Resources)
Prerequisites
- Node.js (version 18 or higher)
- Ollama installed and running
- Download from: https://ollama.ai/
- Install and start Ollama service
- Pull at least one model:
ollama pull llama3
Installation
- Clone or download this repository
- Install dependencies:
npm install
Configuration
The server can be configured using environment variables in the MCP client configuration file.
Provider Selection
Set LLM_PROVIDER to choose which LLM provider to use:
# Select provider: 'ollama', 'openai', 'anthropic', or 'gemini'
export LLM_PROVIDER=ollama
Provider-Specific Configuration
Ollama (default):
export LLM_PROVIDER=ollama
export OLLAMA_URL=http://localhost:11434
export MODEL_NAME=llama3
OpenAI:
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-your-api-key
export MODEL_NAME=gpt-3.5-turbo
Anthropic:
export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-your-api-key
export MODEL_NAME=claude-3-haiku-20240307
Gemini:
export LLM_PROVIDER=gemini
export GEMINI_API_KEY=your-api-key
export MODEL_NAME=gemini-1.5-flash
Common Settings
# Maximum tokens in response (default: 256)
export MAX_TOKENS=256
# Temperature for response generation (default: 0.7)
export TEMPERATURE=0.7
See MCP_CONFIGURATION.md for detailed configuration examples.
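As a rough illustration of how these variables fit together, the sketch below collects them into a single config object. It is not the server's actual code: loadConfig and the API_KEY_VARS mapping are hypothetical, while the defaults (provider ollama, OLLAMA_URL http://localhost:11434, MODEL_NAME llama3, MAX_TOKENS 256, TEMPERATURE 0.7) are the ones documented above.

```javascript
// Hypothetical config loader (not the server's code); defaults mirror this README.
const SUPPORTED_PROVIDERS = ["ollama", "openai", "anthropic", "gemini"];

// Assumed mapping from hosted provider to the env var holding its API key.
const API_KEY_VARS = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  gemini: "GEMINI_API_KEY",
};

function loadConfig(env = process.env) {
  const provider = (env.LLM_PROVIDER || "ollama").toLowerCase();
  if (!SUPPORTED_PROVIDERS.includes(provider)) {
    throw new Error(`Unsupported LLM_PROVIDER: ${provider}`);
  }

  const maxTokens = Number.parseInt(env.MAX_TOKENS ?? "256", 10);
  const temperature = Number.parseFloat(env.TEMPERATURE ?? "0.7");

  const config = {
    provider,
    // llama3 is the documented Ollama default; hosted providers should set MODEL_NAME.
    model: env.MODEL_NAME || (provider === "ollama" ? "llama3" : undefined),
    maxTokens: Number.isNaN(maxTokens) ? 256 : maxTokens,
    temperature: Number.isNaN(temperature) ? 0.7 : temperature,
  };

  if (provider === "ollama") {
    config.ollamaUrl = env.OLLAMA_URL || "http://localhost:11434";
  } else {
    // Hosted providers cannot work without their API key, so fail early.
    const keyVar = API_KEY_VARS[provider];
    if (!env[keyVar]) {
      throw new Error(`${keyVar} must be set when LLM_PROVIDER=${provider}`);
    }
    config.apiKey = env[keyVar];
  }
  return config;
}
```

Failing fast on a missing API key surfaces a misconfigured provider at startup rather than on the first tool call.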
Usage
Starting the Server
# Start the MCP server
npm start
# Or for development with auto-restart
npm run dev
Available Tools
The server provides multiple tools for context compression and code analysis:
1. analyze_huge_file
Analyzes large files locally and returns a structured summary covering architecture, global variables, entry points, and main logic. Because the file is processed locally, only the summary is sent to Cursor, which reduces token usage.
Parameters:
- path (required): Path to the file to analyze
Example:
{
"name": "analyze_huge_file",
"arguments": {
"path": "/path/to/large-file.js"
}
}
Returns: JSON with architecture, global_variables, entry_points, main_logic, and original_size
2. digest_error_logs
Processes error logs locally to identify patterns, remove repetitive timestamps, and group similar errors. Returns a structured summary with probable cause and statistics.
Parameters:
- log_file_path (optional): Path to the log file
- terminal_output (optional): Direct terminal output content
Example:
{
"name": "digest_error_logs",
"arguments": {
"log_file_path": "/path/to/error.log"
}
}
Returns: JSON with probable_cause, occurrences, period, error_types, and recommendation
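The timestamp-stripping and grouping steps can be approximated without any LLM call, which is one way to shrink a log before the model sees it. The sketch below is an illustrative assumption, not this tool's implementation: digestLines is hypothetical, the regex only recognizes ISO-8601-style and bare HH:MM:SS prefixes, and digits are masked so messages that differ only in IDs or durations fall into the same group.

```javascript
// Hypothetical pre-processing pass (not the tool's code): strip leading
// timestamps, mask numbers, then count identical error signatures.
const TIMESTAMP_PREFIX =
  /^\[?\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?\]?\s*|^\[?\d{2}:\d{2}:\d{2}\]?\s*/;

function digestLines(text) {
  const groups = new Map(); // signature -> { count, example }
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(TIMESTAMP_PREFIX, "").trim();
    if (!/error|exception|fatal/i.test(line)) continue; // keep error-like lines only
    const signature = line.replace(/\d+/g, "<N>"); // "3000 ms" and "3100 ms" collapse
    const entry = groups.get(signature) || { count: 0, example: line };
    entry.count += 1;
    groups.set(signature, entry);
  }
  // Most frequent signatures first.
  return [...groups.entries()]
    .map(([signature, { count, example }]) => ({ signature, count, example }))
    .sort((a, b) => b.count - a.count);
}
```

Masking every digit run is deliberately crude; it can merge errors that differ only by a meaningful code, so a real pass would likely be more selective.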
3. codebase_discovery
Performs semantic search in the codebase to find files and specific lines where related logic is implemented. Uses local processing to reduce token usage.
Parameters:
- query (required): Semantic query about the code (e.g., "where is payment processed?")
- root_path (optional): Root directory path to search in (default: current directory)
Example:
{
"name": "codebase_discovery",
"arguments": {
"query": "where is payment processed?",
"root_path": "/path/to/project"
}
}
Returns: JSON with files (array of file references with line numbers), total_occurrences, and summary
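Semantic matching is the model's job, but a cheap keyword pass can narrow the candidate files before any model call. The sketch below is plain keyword matching, not semantic search, and is an assumption about how such a pre-filter might look rather than this tool's code. keywordCandidates and the stop-word list are hypothetical; the output loosely mirrors the files and total_occurrences fields listed above, using 1-based line numbers.

```javascript
// Hypothetical keyword pre-filter (NOT semantic search): reports lines that
// mention any significant word from the query, grouped per file.
const STOP_WORDS = new Set(["where", "the", "how", "what", "does", "and", "for"]);

function keywordCandidates(files, query) {
  const terms = query
    .toLowerCase()
    .split(/[^a-z0-9_]+/)
    .filter((t) => t.length > 2 && !STOP_WORDS.has(t)); // drop short and filler words

  const results = [];
  let total = 0;
  for (const { path, content } of files) {
    const lines = [];
    content.split(/\r?\n/).forEach((text, i) => {
      const lower = text.toLowerCase();
      if (terms.some((t) => lower.includes(t))) lines.push(i + 1); // 1-based
    });
    if (lines.length > 0) {
      results.push({ path, lines });
      total += lines.length;
    }
  }
  return { files: results, total_occurrences: total };
}
```

Substring matching has no stemming, so "processed" will not match "process"; that gap is exactly what the LLM-backed search is meant to cover.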
4. ask_llm
Sends a question to the configured AI model (Ollama by default) and returns its response.
Parameters:
- question (required): The question or prompt to send to the AI model
Example:
{
"name": "ask_llm",
"arguments": {
"question": "What is the capital of France?"
}
}
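With the Ollama provider, a call like this plausibly ends up as a POST to Ollama's /api/generate endpoint. The sketch below shows that request under stated assumptions and is not the server's code: askOllama is hypothetical, fetchImpl is injectable so it can be tested without Ollama, and the 15-second timeout is borrowed from the limit mentioned under Troubleshooting. It relies on Node 18's global fetch and AbortSignal.timeout.

```javascript
// Hypothetical helper: sends one non-streaming prompt to Ollama's /api/generate.
async function askOllama(question, {
  baseUrl = "http://localhost:11434",
  model = "llama3",
  temperature = 0.7,
  maxTokens = 256,
  timeoutMs = 15000,
  fetchImpl = globalThis.fetch,
} = {}) {
  const res = await fetchImpl(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      prompt: question,
      stream: false, // one JSON object instead of a stream of chunks
      options: { temperature, num_predict: maxTokens }, // num_predict caps output tokens
    }),
    signal: AbortSignal.timeout(timeoutMs), // abort if generation takes too long
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.response; // the generated text
}
```

For example, askOllama("What is the capital of France?").then(console.log) prints the model's answer when a local Ollama with llama3 is running.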
5. check_llm_status
Checks whether Ollama is running and accessible.
Example:
{
"name": "check_llm_status",
"arguments": {}
}
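One way to implement such a check is to call Ollama's GET /api/tags endpoint, which lists locally pulled models; a connection error means the service is not reachable. This is a sketch of that idea, not necessarily how check_llm_status works: checkOllama and its return shape are hypothetical, and fetchImpl is injectable for testing.

```javascript
// Hypothetical status probe: GET /api/tags lists locally pulled Ollama models.
async function checkOllama(baseUrl = "http://localhost:11434", fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/api/tags`, { signal: AbortSignal.timeout(5000) });
    if (!res.ok) return { running: false, error: `HTTP ${res.status}` };
    const { models = [] } = await res.json();
    return { running: true, models: models.map((m) => m.name) };
  } catch (err) {
    // Connection refused, DNS failure, or timeout all end up here.
    return { running: false, error: err.message };
  }
}
```

An empty models array with running: true corresponds to the "No Models Available" case under Troubleshooting.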
6. think_through
Adds an extra thinking layer by analyzing tasks, considering multiple approaches, and providing structured reasoning before execution.
Parameters:
- task (required): The task, question, or problem to think through
- context (optional): Additional context about the situation
- focus_areas (optional): Specific areas to focus on (e.g., ["security", "performance"])
- output_format (optional): Format of output: "plan", "analysis", "considerations", or "structured" (default)
Example:
{
"name": "think_through",
"arguments": {
"task": "Refactor authentication to use JWT",
"context": "Current: session-based, Node.js/Express",
"focus_areas": ["security", "maintainability"],
"output_format": "structured"
}
}
Available Prompts
The server provides MCP prompts that inject dynamic instructions into Cursor:
- mcp_tool_usage_rules: Mandatory rules for using MCP tools instead of direct actions
- token_economy_guidelines: Guidelines for maximizing token savings
- thinking_layer_instructions: Instructions for using the thinking layer
- context_compression_rules: Rules for using context compression tools
- chat_end_summary_rule: Automatically stores chat summaries using the memory_store tool (can be disabled via DISABLE_CHAT_SUMMARY_RULE)
Note: The chat_end_summary_rule prompt is automatically available to all projects using this MCP server. To disable it, set the environment variable DISABLE_CHAT_SUMMARY_RULE=true in your MCP configuration.
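The opt-out amounts to filtering the prompt list by an environment variable. The sketch below illustrates that pattern only; enabledPrompts is hypothetical, and treating the exact string "true" as the only disabling value is an assumption based on the setting documented above.

```javascript
// Hypothetical prompt filter; prompt names are the ones listed in this README.
const PROMPT_NAMES = [
  "mcp_tool_usage_rules",
  "token_economy_guidelines",
  "thinking_layer_instructions",
  "context_compression_rules",
  "chat_end_summary_rule",
];

function enabledPrompts(env = process.env) {
  // Assumption: only the exact string "true" disables the summary rule.
  const disableSummary = env.DISABLE_CHAT_SUMMARY_RULE === "true";
  return PROMPT_NAMES.filter((name) => !(disableSummary && name === "chat_end_summary_rule"));
}
```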
Available Resources
The server exposes read-only resources via MCP:
- mcp://local-llm/config: Current server configuration
- mcp://local-llm/models: List of available Ollama models
- mcp://local-llm/tools: List of all available tools
- mcp://local-llm/prompts: List of all available prompts
- mcp://local-llm/usage_stats: Usage statistics and token savings info
Example:
{
"method": "resources/read",
"params": {
"uri": "mcp://local-llm/config"
}
}
MCP Client Integration
To use this server with an MCP client (like Cursor), add it to your client configuration.
Basic Configuration (Ollama)
{
"mcpServers": {
"local-llm": {
"command": "node",
"args": ["path/to/your/mcp-local-llm/src/index.js"],
"env": {
"LLM_PROVIDER": "ollama",
"OLLAMA_URL": "http://localhost:11434",
"MODEL_NAME": "llama3"
}
}
}
}
Using Different Providers
The server supports multiple LLM providers. Set LLM_PROVIDER to switch:
OpenAI:
{
"env": {
"LLM_PROVIDER": "openai",
"OPENAI_API_KEY": "sk-your-key",
"MODEL_NAME": "gpt-3.5-turbo"
}
}
Anthropic:
{
"env": {
"LLM_PROVIDER": "anthropic",
"ANTHROPIC_API_KEY": "sk-ant-your-key",
"MODEL_NAME": "claude-3-haiku-20240307"
}
}
Gemini:
{
"env": {
"LLM_PROVIDER": "gemini",
"GEMINI_API_KEY": "your-key",
"MODEL_NAME": "gemini-1.5-flash"
}
}
See MCP_CONFIGURATION.md for the complete configuration guide.
Troubleshooting
Common Issues
- Connection Refused Error
  - Make sure Ollama is running: start it with ollama serve, or check that the service is running
  - Verify Ollama is accessible at http://localhost:11434
  - Check that Ollama is installed:
    ollama --version
- No Models Available
  - Pull a model:
    ollama pull llama3
  - Check available models:
    ollama list
  - Recommended models: llama3, deepseek-coder, codellama, mistral
- Timeout Errors
  - Large files may take time to process (max 15 seconds)
  - Consider using smaller models for faster responses
  - Check Ollama resource allocation
- Tool Errors
  - Tools return generic error messages (identity hiding)
  - Check server logs for detailed error information
  - Verify file paths are correct and accessible
Testing the Server
You can test the server manually by sending MCP requests:
# Test checking Ollama status
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "check_llm_status", "arguments": {}}}' | node src/index.js
# Test asking a question
echo '{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "ask_llm", "arguments": {"question": "Hello, how are you?"}}}' | node src/index.js
# Test analyzing a file
echo '{"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {"name": "analyze_huge_file", "arguments": {"path": "src/index.js"}}}' | node src/index.js
Development
Project Structure
mcp-local-llm/
├── src/
│   ├── index.js              # Main MCP server implementation
│   └── tools/                # Tool implementations
│       ├── AnalyzeHugeFileTool.js
│       ├── DigestErrorLogsTool.js
│       ├── CodebaseDiscoveryTool.js
│       └── ... (other tools)
├── package.json              # Dependencies and scripts
└── README.md                 # This file
Adding New Tools
To add new tools:
- Create a new tool class extending BaseTool in src/tools/
- Implement the getToolDefinition() and handle() methods
- Add the tool to the src/tools/index.js exports and the ALL_TOOLS array
- The tool will be automatically registered with the MCP server
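The real BaseTool lives in src/tools/ and its interface is not documented here, so the sketch below uses a minimal stand-in to show the shape of a new tool. WordCountTool and word_count are hypothetical. The definition follows the standard MCP tool shape (name, description, and a JSON Schema inputSchema), and returning a content array of text items is an assumption about what this server's handle() is expected to produce.

```javascript
// Stand-in for the repository's BaseTool; the real class may differ.
class BaseTool {
  getToolDefinition() {
    throw new Error("getToolDefinition() not implemented");
  }
  async handle(_args) {
    throw new Error("handle() not implemented");
  }
}

// Hypothetical example tool: counts the words in a string.
class WordCountTool extends BaseTool {
  getToolDefinition() {
    return {
      name: "word_count",
      description: "Counts the words in the given text.",
      inputSchema: {
        type: "object",
        properties: { text: { type: "string", description: "Text to count" } },
        required: ["text"],
      },
    };
  }

  async handle({ text }) {
    const trimmed = text.trim();
    const count = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
    // MCP tool results carry a content array; here a single JSON text item.
    return { content: [{ type: "text", text: JSON.stringify({ words: count }) }] };
  }
}
```

Registering it would then mean exporting WordCountTool from src/tools/index.js and adding it to ALL_TOOLS, as described above.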
License
MIT License - feel free to use and modify as needed.
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.