claude-ollama-mcp
Lets Claude query and manage a local Ollama server — list models, inspect them, run generate/chat completions, pull or delete models.
Claude Ollama
Lets Claude Desktop query and manage a local Ollama server. List installed models, inspect them, run one-shot generate/chat completions against any local model, or pull/delete models from the registry — all without opening a terminal.
Typical use: comparing Claude's answer to a local model on the same prompt, running cheap bulk completions against a quantized model, or checking custom training-checkpoint models you've imported into Ollama.
Requirements
- A running Ollama server (`ollama serve` or the Ollama app).
- Default endpoint is `http://localhost:11434`. Override via the `ollama_url` user config in Claude Desktop's extension settings if you run Ollama on a different host or port.
- No npm dependencies: pure Node over the HTTP API.
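The endpoint rule above (blank means the default, anything else overrides it) and the "is a server running" requirement can be sketched as follows. This is an illustrative sketch, not the extension's actual code: `resolveBaseUrl` and `checkOllama` are hypothetical names, and the trailing-slash handling is an assumption. It needs Node 18+ for the global `fetch`.

```javascript
// Hypothetical helpers for illustration; names and trimming rules are assumptions.
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

function resolveBaseUrl(configuredUrl) {
  // Blank or missing config falls back to the documented default endpoint.
  const raw = (configuredUrl ?? "").trim();
  if (raw === "") return DEFAULT_OLLAMA_URL;
  // new URL() throws on malformed input, which surfaces config typos early.
  const parsed = new URL(raw);
  // Drop trailing slashes so paths like "/api/tags" can be appended directly.
  return parsed.origin + parsed.pathname.replace(/\/+$/, "");
}

// fetchImpl is injectable so the logic can be exercised without a live server.
async function checkOllama(baseUrl = DEFAULT_OLLAMA_URL, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(baseUrl + "/");
    const text = await res.text();
    // A healthy server answers the root path with "Ollama is running".
    return { ok: res.ok && text.includes("Ollama is running"), detail: text.trim() };
  } catch (err) {
    // Connection refused, DNS failure, and similar errors all end up here.
    return { ok: false, detail: `cannot reach Ollama at ${baseUrl}: ${err.message}` };
  }
}
```

For example, `checkOllama(resolveBaseUrl(""))` probes the default endpoint and resolves to an object whose `ok` field says whether the server answered.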
Install (Claude Desktop)
- Download the latest `Ollama.mcpb` from the Releases page.
- In Claude Desktop: Settings → Extensions → Extension Developer → Install Extension → pick the `.mcpb`.
- (Optional) In the extension's settings, set `Ollama server URL` if you run Ollama on a non-default host/port. Leave blank for `http://localhost:11434`.
Tools
| Tool | Annotation | Purpose |
|---|---|---|
| `ollama_status` | read-only | Health check + server version |
| `list_models` | read-only | Local models with size, digest, family, parameter size, quantization |
| `list_running` | read-only | Models currently loaded in VRAM |
| `show_model` | read-only | Model details: modelfile, parameters, template, capabilities |
| `generate` | open-world | One-shot text completion (non-streaming) |
| `chat` | open-world | Chat completion with message history (non-streaming) |
| `pull_model` | open-world | Download a model from the registry |
| `delete_model` | destructive | Remove a locally-installed model |
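
One plausible way the tools above map onto Ollama's HTTP API is sketched below. The endpoint paths come from Ollama's public API; the pairing with this extension's tool names, the `TOOL_ROUTES` table, and the `buildRequest` helper are assumptions made for illustration, not a description of `server.js`.

```javascript
// Endpoint paths are from Ollama's HTTP API; the tool-to-endpoint pairing
// and all names here are illustrative assumptions.
const TOOL_ROUTES = {
  ollama_status: { method: "GET", path: "/api/version" },
  list_models: { method: "GET", path: "/api/tags" },
  list_running: { method: "GET", path: "/api/ps" },
  show_model: { method: "POST", path: "/api/show" },
  generate: { method: "POST", path: "/api/generate", nonStreaming: true },
  chat: { method: "POST", path: "/api/chat", nonStreaming: true },
  pull_model: { method: "POST", path: "/api/pull", nonStreaming: true },
  delete_model: { method: "DELETE", path: "/api/delete" },
};

function buildRequest(baseUrl, tool, args = {}) {
  const route = TOOL_ROUTES[tool];
  if (!route) throw new Error(`unknown tool: ${tool}`);
  const init = { method: route.method };
  if (route.method !== "GET") {
    // stream: false asks Ollama for one JSON reply instead of NDJSON chunks.
    const body = route.nonStreaming ? { ...args, stream: false } : { ...args };
    init.headers = { "Content-Type": "application/json" };
    init.body = JSON.stringify(body);
  }
  return { url: baseUrl + route.path, init };
}
```

A call such as `buildRequest("http://localhost:11434", "generate", { model: "llama3.1:70b", prompt: "Hello" })` returns a `url` and `init` pair that can be passed straight to `fetch(url, init)`.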
Example prompts
"Which local models do I have installed, and which one is currently loaded in VRAM?"
"Run `forge:b6c1` on this prompt: '<blah>'. Compare that output to your own answer."
"Show me the modelfile for `forge:b7c1`; I want to check the temperature setting."
"Pull `llama3.1:70b`." (expect a long wait for large models)
"Delete the `forge:b5c3` model; I don't need that checkpoint anymore."
Privacy policy
This extension runs entirely on your local machine and sends HTTP requests only to your Ollama server (default http://localhost:11434). No data leaves your machine unless you explicitly configure ollama_url to point at a remote Ollama instance, in which case the prompts and responses travel to that server.
The information visible to Claude includes:
- All prompts and chat messages you pass to `generate` and `chat` (these go to the Ollama server, which may log them depending on its configuration).
- Full text of completions returned by Ollama.
- Metadata for every installed model (names, digests, sizes, quantization, modelfile contents).
- Which models are currently loaded in VRAM and their size footprint.
If you have installed models containing proprietary fine-tunes or modelfiles with sensitive metadata, note that Claude will see that information when you call show_model or list_models.
delete_model is destructive and cannot be undone from this extension — the model must be re-pulled from the registry (or re-imported from source blobs) if deleted by mistake.
Troubleshooting
"cannot reach Ollama at http://localhost:11434 — is the server running?" — Start Ollama with ollama serve or launch the Ollama app. Verify with curl http://localhost:11434/ (should return "Ollama is running").
pull_model hangs for a long time — Ollama's pull API with stream: false blocks until the full download completes, which for multi-GB models can take many minutes. If you're pulling a huge model, run ollama pull <name> in a terminal instead — you'll see streaming progress there, and subsequent MCP calls will find the model already installed.
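For contrast with the blocking `stream: false` call, a streaming pull emits one JSON object per line. The sketch below formats such a line. It assumes the `status`, `total`, and `completed` fields used by Ollama's streaming pull responses; `formatPullProgress` is a hypothetical helper and not part of this extension.

```javascript
// Sketch only: formats one line of Ollama's streaming (NDJSON) pull output.
function formatPullProgress(ndjsonLine) {
  const event = JSON.parse(ndjsonLine);
  const hasByteCounts =
    typeof event.total === "number" &&
    event.total > 0 &&
    typeof event.completed === "number";
  // Download events carry byte counts; other events only have a status.
  if (hasByteCounts) {
    const pct = Math.floor((event.completed / event.total) * 100);
    return `${event.status}: ${pct}%`;
  }
  return event.status;
}
```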
Custom/remote Ollama endpoint — Set ollama_url in the extension's settings (e.g. http://192.168.1.42:11434). Requires restart of the extension.
list_running shows a model after you stopped using it — Ollama keeps models hot in VRAM for a configurable TTL (default 5 minutes). The expires_at timestamp tells you when it'll unload. This is Ollama's behavior, not the extension's.
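To turn an `expires_at` value into "seconds until unload", something like the following would do. It is a minimal sketch that assumes the timestamp is an ISO 8601 string; `secondsUntilUnload` is a hypothetical helper name.

```javascript
// Hypothetical helper. `expiresAt` is the ISO 8601 string from a running-model
// entry; `now` is injectable to keep the function deterministic.
function secondsUntilUnload(expiresAt, now = new Date()) {
  const expiryMs = Date.parse(expiresAt);
  if (Number.isNaN(expiryMs)) {
    throw new Error(`unparseable expires_at: ${expiresAt}`);
  }
  // Clamp at zero: a timestamp in the past means the model is due to unload.
  return Math.max(0, Math.round((expiryMs - now.getTime()) / 1000));
}
```

With the default 5-minute TTL, a model that has just been used would report roughly 300 seconds remaining.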
Development
Single ~400-line Node.js script, zero npm dependencies. Rebuild the .mcpb:
```sh
cd bundle-source
zip -j ../Ollama.mcpb manifest.json package.json server.js README.md LICENSE icon.png glama.json
```
License
MIT. See LICENSE.
Related
- claude-terminal-mcp — shell, filesystem, and background jobs.
- claude-rocm-mcp — AMD GPU monitoring; pairs well for checking whether Ollama's loaded model is saturating VRAM.
- claude-sessions-mcp — tmux session management for long-running jobs.
- claude-linux-mcp — X11 desktop control.