MCP Ollama Server
Provides direct access to Ollama models for AI inference, including text generation, chat, model management, and embeddings.
A Model Context Protocol (MCP) server that provides direct access to Ollama models for AI inference.
Features
- 🚀 Direct Model Access: Generate responses and chat with any Ollama model
- 💬 Chat Support: Maintain conversation context with chat endpoints
- 📋 Model Management: List, pull, delete, and get info about models
- 🔢 Embeddings: Generate text embeddings for semantic search
- 🔧 Full Control: Configure temperature, max tokens, and system prompts
- ✅ Status Checking: Automatic Ollama availability detection
Installation
1. Prerequisites:
- Ollama installed and running
- Node.js 18+ installed
2. Install the MCP server:
cd /Users/bard/Code/mcp-ollama
npm install
npm run build
3. Add to Claude Desktop config. Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "ollama": {
      "command": "node",
      "args": ["/Users/bard/Code/mcp-ollama/dist/index.js"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434"
      }
    }
  }
}
4. Restart Claude Desktop
Usage
Generate Text
// Simple generation
ollama_generate({
prompt: "What is the meaning of life?"
})
// With system prompt and parameters
ollama_generate({
model: "llama3.2",
prompt: "Write a haiku about coding",
system: "You are a creative poet",
temperature: 0.9,
max_tokens: 100
})
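To make the parameters above more concrete, here is a minimal sketch of how a server like this one might translate an ollama_generate call into the JSON body of Ollama's POST /api/generate endpoint. The `GenerateArgs` type and `toGenerateBody` function are hypothetical names, not taken from this project's source; the defaults come from the Tool Parameters section, and mapping max_tokens to Ollama's `num_predict` option is an assumption about the implementation.

```typescript
// Hypothetical shape of the ollama_generate arguments, based on this README.
interface GenerateArgs {
  model?: string;
  prompt: string;
  system?: string;
  temperature?: number;
  max_tokens?: number;
}

// Subset of the request body accepted by Ollama's POST /api/generate.
interface OllamaGenerateBody {
  model: string;
  prompt: string;
  system?: string;
  stream: boolean;
  options: { temperature: number; num_predict: number };
}

// Assumed translation: apply the README defaults and send max_tokens as
// Ollama's num_predict option. The real server may differ.
function toGenerateBody(args: GenerateArgs): OllamaGenerateBody {
  const body: OllamaGenerateBody = {
    model: args.model ?? "llama3.2",
    prompt: args.prompt,
    stream: false,
    options: {
      temperature: args.temperature ?? 0.7,
      num_predict: args.max_tokens ?? 2048,
    },
  };
  // Only include a system prompt when the caller supplied one.
  if (args.system !== undefined) body.system = args.system;
  return body;
}
```

Calling `toGenerateBody({ prompt: "Write a haiku about coding", temperature: 0.9, max_tokens: 100 })` would produce a body with `options.temperature` set to 0.9 and `options.num_predict` set to 100.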
Chat Conversations
// Multi-turn conversation
ollama_chat({
model: "llama3.2",
messages: [
{ role: "system", content: "You are a helpful assistant" },
{ role: "user", content: "What is Python?" },
{ role: "assistant", content: "Python is a high-level programming language..." },
{ role: "user", content: "What makes it good for beginners?" }
]
})
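The example above passes the entire conversation on every call, which suggests the caller is responsible for keeping the history between turns. A minimal sketch of that bookkeeping follows; the `ChatMessage` type and `appendMessage` helper are illustrative names and are not part of this server.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Return a new history with one more message; the input array is not mutated.
function appendMessage(
  past: readonly ChatMessage[],
  role: Role,
  content: string,
): ChatMessage[] {
  return [...past, { role, content }];
}

// Rebuild the conversation from the example above, turn by turn.
let conversation: ChatMessage[] = [
  { role: "system", content: "You are a helpful assistant" },
];
conversation = appendMessage(conversation, "user", "What is Python?");
// ...call ollama_chat with `conversation`, then record the reply it returns:
conversation = appendMessage(
  conversation,
  "assistant",
  "Python is a high-level programming language...",
);
conversation = appendMessage(conversation, "user", "What makes it good for beginners?");
```

At this point `conversation` holds the same four messages shown in the ollama_chat call and can be passed as its `messages` argument.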
Model Management
// List available models
ollama_list()
// Pull a new model
ollama_pull({ model: "mistral" })
// Get model information
ollama_info({ model: "llama3.2" })
// Delete a model
ollama_delete({ model: "old-model" })
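For orientation, the four management tools line up naturally with endpoints in Ollama's REST API. The sketch below shows one plausible mapping; `toHttpCall` and the `ENDPOINTS` table are hypothetical and not taken from this project's source. The paths are Ollama's documented endpoints; the body field is written as `model` as in current Ollama documentation, while older releases used `name`.

```typescript
type ManagementTool = "ollama_list" | "ollama_pull" | "ollama_info" | "ollama_delete";

interface HttpCall {
  method: "GET" | "POST" | "DELETE";
  path: string;
  body?: { model: string };
}

// Assumed mapping from tool name to Ollama REST endpoint.
const ENDPOINTS: Record<ManagementTool, { method: HttpCall["method"]; path: string; needsModel: boolean }> = {
  ollama_list: { method: "GET", path: "/api/tags", needsModel: false },
  ollama_pull: { method: "POST", path: "/api/pull", needsModel: true },
  ollama_info: { method: "POST", path: "/api/show", needsModel: true },
  ollama_delete: { method: "DELETE", path: "/api/delete", needsModel: true },
};

// Describe the HTTP request a tool call would translate to, without sending it.
function toHttpCall(tool: ManagementTool, model?: string): HttpCall {
  const endpoint = ENDPOINTS[tool];
  if (!endpoint.needsModel) {
    return { method: endpoint.method, path: endpoint.path };
  }
  if (model === undefined || model.length === 0) {
    throw new Error(`${tool} requires a model name`);
  }
  return { method: endpoint.method, path: endpoint.path, body: { model } };
}
```

For example, `toHttpCall("ollama_pull", "mistral")` describes a POST to /api/pull with the body `{ "model": "mistral" }`.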
Generate Embeddings
// Generate embeddings for semantic search
ollama_embeddings({
model: "nomic-embed-text",
prompt: "The quick brown fox jumps over the lazy dog"
})
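Embeddings are typically compared with cosine similarity when used for semantic search. The sketch below assumes each embedding is available as a plain array of numbers; `cosineSimilarity`, `rankBySimilarity`, and `EmbeddedDoc` are illustrative helpers, not part of this server.

```typescript
// Cosine similarity between two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedDoc {
  id: string;
  embedding: number[];
}

// Score every document against the query embedding, best match first.
function rankBySimilarity(
  query: readonly number[],
  docs: readonly EmbeddedDoc[],
): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

In practice the query and each document would be embedded with the same model (for example nomic-embed-text), since vectors from different models are not comparable.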
Available Models
Popular models you can use:
- llama3.2 - Fast, efficient general-purpose model
- deepseek-r1 - Advanced reasoning model
- mistral - Efficient 7B parameter model
- gemma:2b - Google's small efficient model
- phi3:mini - Microsoft's compact model
- nomic-embed-text - For generating embeddings
Pull any model with:
ollama pull <model-name>
Configuration
Environment Variables
- OLLAMA_BASE_URL: Ollama API endpoint (default: http://localhost:11434)
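A minimal sketch of how the base URL could be resolved from the environment with the documented default. `resolveBaseUrl` is a hypothetical helper; stripping trailing slashes is an added convenience and not something this README specifies.

```typescript
const DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434";

// Read OLLAMA_BASE_URL from an environment-like object, falling back to the
// documented default when it is unset or blank.
function resolveBaseUrl(env: Record<string, string | undefined>): string {
  const raw = env["OLLAMA_BASE_URL"]?.trim();
  const base = raw ? raw : DEFAULT_OLLAMA_BASE_URL;
  // Drop trailing slashes so that `${base}/api/...` never contains "//".
  return base.replace(/\/+$/, "");
}
```

In a Node.js process this would be called as `resolveBaseUrl(process.env)`.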
Tool Parameters
ollama_generate
- model: Model to use (default: "llama3.2")
- prompt: Input prompt (required)
- system: System prompt (optional)
- temperature: Sampling temperature 0-1 (default: 0.7)
- max_tokens: Maximum tokens to generate (default: 2048)
- stream: Stream responses (default: false)
ollama_chat
- model: Model to use (default: "llama3.2")
- messages: Array of chat messages (required)
- temperature: Sampling temperature 0-1 (default: 0.7)
- max_tokens: Maximum tokens to generate (default: 2048)
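The constraints listed above (required prompt, temperature between 0 and 1, a token limit) can be checked before a request is sent. The following is a sketch only: `validateGenerateArgs` is a hypothetical helper, and whether the server itself enforces these ranges is not stated in this README.

```typescript
// Return a list of human-readable problems; an empty list means the
// arguments satisfy the constraints documented for ollama_generate.
function validateGenerateArgs(args: Record<string, unknown>): string[] {
  const problems: string[] = [];

  const prompt = args["prompt"];
  if (typeof prompt !== "string" || prompt.length === 0) {
    problems.push("prompt is required and must be a non-empty string");
  }

  const temperature = args["temperature"];
  if (
    temperature !== undefined &&
    (typeof temperature !== "number" || !(temperature >= 0 && temperature <= 1))
  ) {
    problems.push("temperature must be a number between 0 and 1");
  }

  const maxTokens = args["max_tokens"];
  if (
    maxTokens !== undefined &&
    (typeof maxTokens !== "number" || !Number.isInteger(maxTokens) || maxTokens <= 0)
  ) {
    problems.push("max_tokens must be a positive integer");
  }

  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets a caller report all issues in a single error message.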
Troubleshooting
Ollama not running
If you see "❌ Ollama is not running", start Ollama:
ollama serve
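An availability check of this kind can be sketched as a single request to the Ollama API that treats any network error as "not running". The `isOllamaRunning` function and `Fetcher` type below are hypothetical; probing /api/tags is an assumption about how such a check could work, not a description of this server's code. The fetch function is passed in so the check can be exercised without a live Ollama instance.

```typescript
// Minimal view of a fetch-like function: only the `ok` flag is used.
type Fetcher = (url: string) => Promise<{ ok: boolean }>;

// Resolve to true when Ollama answers with a successful HTTP status and to
// false when it answers with an error or cannot be reached at all.
async function isOllamaRunning(baseUrl: string, fetcher: Fetcher): Promise<boolean> {
  try {
    const response = await fetcher(`${baseUrl}/api/tags`);
    return response.ok;
  } catch {
    // Connection refused, DNS failure, and similar errors end up here.
    return false;
  }
}
```

On Node.js 18+ the built-in global can be supplied directly: `await isOllamaRunning("http://localhost:11434", fetch)`.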
No models available
Pull a model first:
ollama pull llama3.2
Different Ollama port
If Ollama runs on a different port, update the config:
{
  "env": {
    "OLLAMA_BASE_URL": "http://localhost:YOUR_PORT"
  }
}
Differences from ELVIS
This MCP server provides direct, synchronous access to Ollama models, unlike ELVIS, which uses a delegation/queue pattern. Benefits:
- Immediate responses: No waiting for task completion
- Simpler API: Direct function calls instead of task management
- Native chat support: Built-in conversation handling
- Model management: Pull, delete, and inspect models
- Embeddings support: Generate embeddings for RAG applications
Development
Running in development:
npm run dev
Building:
npm run build
Testing (these requests go straight to the Ollama API, to confirm that Ollama itself responds):
# Test generate
curl -X POST http://localhost:11434/api/generate \
-d '{"model": "llama3.2", "prompt": "Hello"}'
# Test chat
curl -X POST http://localhost:11434/api/chat \
-d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'
License
MIT