Ollama MCP Proxy
A comprehensive MCP proxy server that bridges MCP clients with Ollama local language models, providing advanced features like RAG integration, context management, caching, and production-ready security.
README
Ollama MCP Proxy
A comprehensive Model Context Protocol (MCP) proxy server that bridges MCP clients with Ollama's local language models, providing advanced features like RAG integration, context management, caching, and production-ready security.
๐ Features
Core Functionality
- MCP Protocol Implementation: Full server-side MCP support with tools, resources, and prompts
- Ollama Integration: Seamless connection to local Ollama language models
- Multiple Transport Methods: HTTP with Server-Sent Events (SSE) and WebSocket support
- Advanced Context Management: Session-based isolation with conversation branching and merging
Advanced AI Capabilities
- RAG Integration: Vector-based document retrieval with FAISS and sentence transformers
- Knowledge Base Connectivity: Integration with external knowledge sources
- Advanced Summarization: Context window management with intelligent summarization
- Multi-Model Support: Dynamic model discovery and switching
Performance & Production Features
- Intelligent Caching: Multi-tier caching with Redis and local fallback
- Circuit Breaker Pattern: Fault tolerance with automatic recovery
- Rate Limiting: Configurable request throttling and protection
- Streaming Optimization: Efficient real-time response streaming
Security & Authentication
- OAuth 2.0 Support: Comprehensive authentication and authorization
- Role-Based Access Control (RBAC): Granular permission management
- Data Encryption: At-rest encryption for sensitive conversation data
- Security Headers: Production-ready security configuration
Developer Experience
- Comprehensive Testing: Unit, integration, and load testing suites
- Development Tools: Hot reload, profiling, and debugging support
- Structured Logging: JSON-formatted logs with correlation IDs
- Configuration Management: Environment-based configuration with validation
๐๏ธ Architecture
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
โ MCP Client โ โ Ollama MCP Proxy โ โ Ollama Server โ
โ (Claude, etc.) โโโโโบโ โโโโโบโ (Local AI) โ
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโ
โ Configuration โ
โ & Storage โ
โโโโโโโโโโโโโโโโโโโ
Key Components
- OllamaMCPServer: Main MCP server implementation with tool and resource handlers
- OllamaClient: Robust HTTP client with retry logic and circuit breaker
- ContextManager: Sophisticated session management with branching and search
- RAG Integration: Vector-based document retrieval and knowledge augmentation
- Security Framework: Authentication, authorization, and data protection
- Cache System: Multi-level caching with intelligent warming and invalidation
๐ Quick Start
Prerequisites
- Python 3.8 or higher
- Ollama installed and running locally
- Redis (optional, for distributed caching)
Installation
- Clone the repository:
git clone https://github.com/ollama-mcp-proxy/ollama-mcp-proxy.git
cd ollama-mcp-proxy
- Create and activate virtual environment:
python -m venv venv
# Windows
venv\Scripts\activate
# Unix/macOS
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Start Ollama (if not already running):
ollama serve
- Run the MCP proxy:
python -m ollama_mcp_proxy.server --config config/development.json
Or using the CLI:
ollama-mcp-proxy --config config/development.json
๐ Configuration
The proxy uses JSON configuration files for different environments:
config/development.json- Development settings with debug modeconfig/production.json- Production-ready configuration
Key Configuration Sections
{
"ollama": {
"host": "localhost",
"port": 11434,
"timeout": 30,
"max_retries": 3
},
"mcp": {
"port": 8000,
"transport": "http",
"auth_enabled": false
},
"cache": {
"enabled": true,
"type": "hybrid",
"redis": {
"enabled": true,
"host": "localhost",
"port": 6379
}
},
"rag": {
"enabled": false,
"vector_store": "faiss",
"embedding_model": "all-MiniLM-L6-v2"
}
}
Environment Variables
OLLAMA_HOST- Ollama server host (default: localhost:11434)MCP_PROXY_PORT- MCP proxy port (default: 8000)OLLAMA_MCP_CONFIG- Path to configuration file
๐ง Claude Desktop Integration
Add to your Claude Desktop MCP configuration:
{
"mcpServers": {
"ollama-proxy": {
"command": "python",
"args": ["-m", "ollama_mcp_proxy"],
"env": {
"OLLAMA_HOST": "localhost:11434",
"MCP_PROXY_PORT": "8000"
}
}
}
}
๐ ๏ธ Available Tools
The proxy exposes several MCP tools:
Text Completion
{
"name": "ollama_completion",
"arguments": {
"prompt": "Explain quantum computing",
"model": "llama2",
"temperature": 0.7,
"max_tokens": 500
}
}
Code Completion
{
"name": "code_completion",
"arguments": {
"code": "def factorial(n):",
"language": "python",
"model": "codellama"
}
}
Tool Chaining
{
"name": "tool_chain",
"arguments": {
"tools": [
{"tool": "research", "args": {"topic": "AI ethics"}},
{"tool": "summarize", "args": {"input": "{{previous}}"}}
]
}
}
๐ Resources
MCP resources provide access to:
- Model Information:
/models/{model_name}- Model capabilities and metadata - System Status:
/system/status- Health and performance metrics - Configuration:
/config/current- Current configuration settings - Session Info:
/sessions/{session_id}- Session context and history
๐งช Development
Setup Development Environment
- Install development dependencies:
pip install -e ".[dev]"
- Set up pre-commit hooks:
pre-commit install
- Run in development mode:
python -m ollama_mcp_proxy.server --config config/development.json --debug
Code Quality Tools
- Black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
# Format code
black src/ tests/
# Sort imports
isort src/ tests/
# Run linting
flake8 src/ tests/
# Type checking
mypy src/
๐งช Testing
The project includes comprehensive testing with pytest:
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=ollama_mcp_proxy --cov-report=html
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m load # Load tests only
# Run specific test file
pytest tests/test_auth.py -v
Test Categories
- Unit Tests: Individual component testing with mocked dependencies
- Integration Tests: End-to-end testing with real Ollama integration
- Load Tests: Performance and concurrency testing
Test Configuration
Tests use comprehensive fixtures defined in conftest.py:
- Mock Ollama client with predictable responses
- Sample test data and configurations
- Error scenario simulation
- Async testing support
๐ Performance Features
Caching Strategy
- Response Caching: Intelligent caching with TTL-based expiration
- Model Output Caching: Ollama response caching for repeated queries
- Cache Warming: Proactive cache population for popular models
- Distributed Caching: Redis integration for multi-instance deployments
Memory Management
- Context Window Sliding: Automatic context truncation for long conversations
- Memory Pressure Handling: Automatic cleanup when memory limits are reached
- Session Compression: Zlib compression for inactive sessions
- Garbage Collection: Efficient cleanup of expired sessions
Circuit Breaker
- Fault Tolerance: Automatic failure detection and recovery
- Exponential Backoff: Intelligent retry strategies
- Health Monitoring: Continuous health checking of dependencies
๐ Security Features
Authentication & Authorization
- API Key Authentication: Secure key-based access control
- OAuth 2.0 Integration: Industry-standard authentication
- Role-Based Access Control: Granular permission management
- JWT Token Support: Stateless authentication with JSON Web Tokens
Data Protection
- Encryption at Rest: AES encryption for stored conversation data
- Request Sanitization: Input validation and sanitization
- Audit Logging: Comprehensive security event logging
- Security Headers: CORS, CSP, and other security headers
Rate Limiting
- Per-User Limits: Individual user rate limiting
- Global Limits: System-wide protection against abuse
- Sliding Window: Advanced rate limiting algorithms
๐ Production Deployment
Docker Deployment
# Build Docker image
docker build -t ollama-mcp-proxy .
# Run with Docker Compose
docker-compose up -d
Environment Configuration
# Production environment variables
export OLLAMA_MCP_CONFIG=/app/config/production.json
export REDIS_URL=redis://localhost:6379
export LOG_LEVEL=INFO
Monitoring
- Health Endpoints:
/healthand/metricsendpoints - Structured Logging: JSON logs with correlation IDs
- Performance Metrics: Request/response time tracking
- Error Rate Monitoring: Comprehensive error tracking
๐ค Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Workflow
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the test suite
- Submit a pull request
Code Style
- Follow PEP 8 style guidelines
- Use type hints throughout
- Write comprehensive docstrings
- Maintain test coverage above 90%
๐ License
This project is licensed under the MIT License - see the LICENSE file for details.
๐ Acknowledgments
- Ollama for providing the local language model API
- Model Context Protocol for the protocol specification
- Anthropic for MCP development and Claude integration
- All contributors who help make this project better
๐ Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Read the Docs
Built with โค๏ธ for the MCP community
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.