Ollama MCP Proxy

Ollama MCP Proxy

A comprehensive MCP proxy server that bridges MCP clients with Ollama local language models, providing advanced features like RAG integration, context management, caching, and production-ready security.

Category
Visit Server

README

Ollama MCP Proxy

Python 3.8+ License: MIT MCP Compatible

A comprehensive Model Context Protocol (MCP) proxy server that bridges MCP clients with Ollama's local language models, providing advanced features like RAG integration, context management, caching, and production-ready security.

๐ŸŒŸ Features

Core Functionality

  • MCP Protocol Implementation: Full server-side MCP support with tools, resources, and prompts
  • Ollama Integration: Seamless connection to local Ollama language models
  • Multiple Transport Methods: HTTP with Server-Sent Events (SSE) and WebSocket support
  • Advanced Context Management: Session-based isolation with conversation branching and merging

Advanced AI Capabilities

  • RAG Integration: Vector-based document retrieval with FAISS and sentence transformers
  • Knowledge Base Connectivity: Integration with external knowledge sources
  • Advanced Summarization: Context window management with intelligent summarization
  • Multi-Model Support: Dynamic model discovery and switching

Performance & Production Features

  • Intelligent Caching: Multi-tier caching with Redis and local fallback
  • Circuit Breaker Pattern: Fault tolerance with automatic recovery
  • Rate Limiting: Configurable request throttling and protection
  • Streaming Optimization: Efficient real-time response streaming

Security & Authentication

  • OAuth 2.0 Support: Comprehensive authentication and authorization
  • Role-Based Access Control (RBAC): Granular permission management
  • Data Encryption: At-rest encryption for sensitive conversation data
  • Security Headers: Production-ready security configuration

Developer Experience

  • Comprehensive Testing: Unit, integration, and load testing suites
  • Development Tools: Hot reload, profiling, and debugging support
  • Structured Logging: JSON-formatted logs with correlation IDs
  • Configuration Management: Environment-based configuration with validation

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   MCP Client    โ”‚    โ”‚   Ollama MCP Proxy   โ”‚    โ”‚  Ollama Server  โ”‚
โ”‚  (Claude, etc.) โ”‚โ—„โ”€โ”€โ–บโ”‚                      โ”‚โ—„โ”€โ”€โ–บโ”‚   (Local AI)    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                              โ”‚
                              โ–ผ
                       โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                       โ”‚  Configuration  โ”‚
                       โ”‚   & Storage     โ”‚
                       โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Components

  • OllamaMCPServer: Main MCP server implementation with tool and resource handlers
  • OllamaClient: Robust HTTP client with retry logic and circuit breaker
  • ContextManager: Sophisticated session management with branching and search
  • RAG Integration: Vector-based document retrieval and knowledge augmentation
  • Security Framework: Authentication, authorization, and data protection
  • Cache System: Multi-level caching with intelligent warming and invalidation

๐Ÿš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • Ollama installed and running locally
  • Redis (optional, for distributed caching)

Installation

  1. Clone the repository:
git clone https://github.com/ollama-mcp-proxy/ollama-mcp-proxy.git
cd ollama-mcp-proxy
  1. Create and activate virtual environment:
python -m venv venv
# Windows
venv\Scripts\activate
# Unix/macOS
source venv/bin/activate
  1. Install dependencies:
pip install -r requirements.txt
  1. Start Ollama (if not already running):
ollama serve
  1. Run the MCP proxy:
python -m ollama_mcp_proxy.server --config config/development.json

Or using the CLI:

ollama-mcp-proxy --config config/development.json

๐Ÿ“ Configuration

The proxy uses JSON configuration files for different environments:

  • config/development.json - Development settings with debug mode
  • config/production.json - Production-ready configuration

Key Configuration Sections

{
  "ollama": {
    "host": "localhost",
    "port": 11434,
    "timeout": 30,
    "max_retries": 3
  },
  "mcp": {
    "port": 8000,
    "transport": "http",
    "auth_enabled": false
  },
  "cache": {
    "enabled": true,
    "type": "hybrid",
    "redis": {
      "enabled": true,
      "host": "localhost",
      "port": 6379
    }
  },
  "rag": {
    "enabled": false,
    "vector_store": "faiss",
    "embedding_model": "all-MiniLM-L6-v2"
  }
}

Environment Variables

  • OLLAMA_HOST - Ollama server host (default: localhost:11434)
  • MCP_PROXY_PORT - MCP proxy port (default: 8000)
  • OLLAMA_MCP_CONFIG - Path to configuration file

๐Ÿ”ง Claude Desktop Integration

Add to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "ollama-proxy": {
      "command": "python",
      "args": ["-m", "ollama_mcp_proxy"],
      "env": {
        "OLLAMA_HOST": "localhost:11434",
        "MCP_PROXY_PORT": "8000"
      }
    }
  }
}

๐Ÿ› ๏ธ Available Tools

The proxy exposes several MCP tools:

Text Completion

{
  "name": "ollama_completion",
  "arguments": {
    "prompt": "Explain quantum computing",
    "model": "llama2",
    "temperature": 0.7,
    "max_tokens": 500
  }
}

Code Completion

{
  "name": "code_completion",
  "arguments": {
    "code": "def factorial(n):",
    "language": "python",
    "model": "codellama"
  }
}

Tool Chaining

{
  "name": "tool_chain",
  "arguments": {
    "tools": [
      {"tool": "research", "args": {"topic": "AI ethics"}},
      {"tool": "summarize", "args": {"input": "{{previous}}"}}
    ]
  }
}

๐Ÿ“š Resources

MCP resources provide access to:

  • Model Information: /models/{model_name} - Model capabilities and metadata
  • System Status: /system/status - Health and performance metrics
  • Configuration: /config/current - Current configuration settings
  • Session Info: /sessions/{session_id} - Session context and history

๐Ÿงช Development

Setup Development Environment

  1. Install development dependencies:
pip install -e ".[dev]"
  1. Set up pre-commit hooks:
pre-commit install
  1. Run in development mode:
python -m ollama_mcp_proxy.server --config config/development.json --debug

Code Quality Tools

  • Black: Code formatting
  • isort: Import sorting
  • flake8: Linting
  • mypy: Type checking
# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Run linting
flake8 src/ tests/

# Type checking
mypy src/

๐Ÿงช Testing

The project includes comprehensive testing with pytest:

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=ollama_mcp_proxy --cov-report=html

# Run specific test categories
pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only
pytest -m load          # Load tests only

# Run specific test file
pytest tests/test_auth.py -v

Test Categories

  • Unit Tests: Individual component testing with mocked dependencies
  • Integration Tests: End-to-end testing with real Ollama integration
  • Load Tests: Performance and concurrency testing

Test Configuration

Tests use comprehensive fixtures defined in conftest.py:

  • Mock Ollama client with predictable responses
  • Sample test data and configurations
  • Error scenario simulation
  • Async testing support

๐Ÿ“Š Performance Features

Caching Strategy

  • Response Caching: Intelligent caching with TTL-based expiration
  • Model Output Caching: Ollama response caching for repeated queries
  • Cache Warming: Proactive cache population for popular models
  • Distributed Caching: Redis integration for multi-instance deployments

Memory Management

  • Context Window Sliding: Automatic context truncation for long conversations
  • Memory Pressure Handling: Automatic cleanup when memory limits are reached
  • Session Compression: Zlib compression for inactive sessions
  • Garbage Collection: Efficient cleanup of expired sessions

Circuit Breaker

  • Fault Tolerance: Automatic failure detection and recovery
  • Exponential Backoff: Intelligent retry strategies
  • Health Monitoring: Continuous health checking of dependencies

๐Ÿ”’ Security Features

Authentication & Authorization

  • API Key Authentication: Secure key-based access control
  • OAuth 2.0 Integration: Industry-standard authentication
  • Role-Based Access Control: Granular permission management
  • JWT Token Support: Stateless authentication with JSON Web Tokens

Data Protection

  • Encryption at Rest: AES encryption for stored conversation data
  • Request Sanitization: Input validation and sanitization
  • Audit Logging: Comprehensive security event logging
  • Security Headers: CORS, CSP, and other security headers

Rate Limiting

  • Per-User Limits: Individual user rate limiting
  • Global Limits: System-wide protection against abuse
  • Sliding Window: Advanced rate limiting algorithms

๐Ÿš€ Production Deployment

Docker Deployment

# Build Docker image
docker build -t ollama-mcp-proxy .

# Run with Docker Compose
docker-compose up -d

Environment Configuration

# Production environment variables
export OLLAMA_MCP_CONFIG=/app/config/production.json
export REDIS_URL=redis://localhost:6379
export LOG_LEVEL=INFO

Monitoring

  • Health Endpoints: /health and /metrics endpoints
  • Structured Logging: JSON logs with correlation IDs
  • Performance Metrics: Request/response time tracking
  • Error Rate Monitoring: Comprehensive error tracking

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run the test suite
  5. Submit a pull request

Code Style

  • Follow PEP 8 style guidelines
  • Use type hints throughout
  • Write comprehensive docstrings
  • Maintain test coverage above 90%

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Ollama for providing the local language model API
  • Model Context Protocol for the protocol specification
  • Anthropic for MCP development and Claude integration
  • All contributors who help make this project better

๐Ÿ“ž Support


Built with โค๏ธ for the MCP community

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured