claude-ollama-mcp

Lets Claude query and manage a local Ollama server — list models, inspect them, run generate/chat completions, pull or delete models.

Claude Ollama

Lets Claude Desktop query and manage a local Ollama server. List installed models, inspect them, run one-shot generate/chat completions against any local model, pull models from the registry, or delete local ones, all without opening a terminal.

Typical use: comparing Claude's answer to a local model on the same prompt, running cheap bulk completions against a quantized model, or checking custom training-checkpoint models you've imported into Ollama.

Requirements

  • A running Ollama server (ollama serve or the Ollama app).
  • Default endpoint is http://localhost:11434. Override via the ollama_url user config in Claude Desktop's extension settings if you run Ollama on a different host or port.
  • No npm dependencies — pure Node over the HTTP API.
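As a rough illustration of the endpoint rule above, the following sketch resolves the base URL from an optional override and probes the server. `resolveBaseUrl` and `checkServer` are hypothetical names, not taken from the extension's source. `GET /api/version` is Ollama's version endpoint, and the global `fetch` and `AbortSignal.timeout` assume Node 18 or later.

```javascript
// Default endpoint stated in the requirements above.
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

// Hypothetical helper: fall back to the default when the override is blank.
function resolveBaseUrl(ollamaUrl) {
  const trimmed = (ollamaUrl ?? "").trim();
  if (trimmed === "") return DEFAULT_OLLAMA_URL;
  // Drop trailing slashes so API paths can be appended safely.
  return trimmed.replace(/\/+$/, "");
}

// Hypothetical helper: ask the server for its version, with a timeout.
async function checkServer(ollamaUrl, timeoutMs = 3000) {
  const base = resolveBaseUrl(ollamaUrl);
  try {
    const res = await fetch(`${base}/api/version`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) return { ok: false, error: `HTTP ${res.status}` };
    const body = await res.json();
    return { ok: true, version: body.version };
  } catch (err) {
    return { ok: false, error: `cannot reach Ollama at ${base}: ${err.message}` };
  }
}
```

Calling `checkServer("")` probes the default endpoint, while `checkServer("http://192.168.1.42:11434/")` probes the override with the trailing slash removed.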

Install (Claude Desktop)

  1. Download the latest Ollama.mcpb from the Releases page.
  2. In Claude Desktop: Settings → Extensions → Extension Developer → Install Extension → pick the .mcpb.
  3. (Optional) In the extension's settings, set Ollama server URL if you run Ollama on a non-default host/port. Leave blank for http://localhost:11434.

Tools

Tool           Annotation   Purpose
ollama_status  read-only    Health check + server version
list_models    read-only    Local models with size, digest, family, parameter size, quantization
list_running   read-only    Models currently loaded in VRAM
show_model     read-only    Model details: modelfile, parameters, template, capabilities
generate       open-world   One-shot text completion (non-streaming)
chat           open-world   Chat completion with message history (non-streaming)
pull_model     open-world   Download a model from the registry
delete_model   destructive  Remove a locally-installed model
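The tools above line up naturally with Ollama's REST API. The sketch below shows one plausible mapping. The endpoint paths are Ollama's documented ones, but the pairing with each tool name, the `TOOL_ENDPOINTS` table, and the `buildRequest` helper are assumptions for illustration, not the extension's actual code.

```javascript
// Assumed mapping from tool name to Ollama REST endpoint (not confirmed by the source).
// `streams` marks endpoints that stream by default and need stream: false.
const TOOL_ENDPOINTS = {
  ollama_status: { method: "GET", path: "/api/version" },
  list_models: { method: "GET", path: "/api/tags" },
  list_running: { method: "GET", path: "/api/ps" },
  show_model: { method: "POST", path: "/api/show" },
  generate: { method: "POST", path: "/api/generate", streams: true },
  chat: { method: "POST", path: "/api/chat", streams: true },
  pull_model: { method: "POST", path: "/api/pull", streams: true },
  delete_model: { method: "DELETE", path: "/api/delete" },
};

// Hypothetical helper: build the URL and fetch() options for a tool call.
function buildRequest(baseUrl, tool, body = {}) {
  const endpoint = TOOL_ENDPOINTS[tool];
  if (!endpoint) throw new Error(`unknown tool: ${tool}`);
  const init = { method: endpoint.method };
  if (endpoint.method !== "GET") {
    const payload = { ...body };
    // Ask for one JSON response instead of a stream of NDJSON chunks.
    if (endpoint.streams) payload.stream = false;
    init.headers = { "Content-Type": "application/json" };
    init.body = JSON.stringify(payload);
  }
  return { url: `${baseUrl}${endpoint.path}`, init };
}
```

For example, `buildRequest("http://localhost:11434", "generate", { model: "llama3.1:70b", prompt: "Hello" })` yields a POST to `/api/generate` whose body carries `stream: false`, matching the non-streaming behavior listed for `generate`.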

Example prompts

"Which local models do I have installed, and which one is currently loaded in VRAM?"

"Run forge:b6c1 on this prompt: '<your prompt>'. Compare that output to your own answer."

"Show me the modelfile for forge:b7c1 — I want to check the temperature setting."

"Pull llama3.1:70b." (expect a long wait for large models)

"Delete the forge:b5c3 model — I don't need that checkpoint anymore."

Privacy policy

This extension runs entirely on your local machine and sends HTTP requests only to your Ollama server (default http://localhost:11434). No data leaves your machine unless you explicitly configure ollama_url to point at a remote Ollama instance, in which case the prompts and responses travel to that server.
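To make the local-versus-remote distinction concrete, here is a small sketch that classifies a configured URL as loopback or not. `isLoopbackUrl` is a hypothetical helper and is not part of the extension. It is deliberately coarse: it treats only `localhost`, `127.x.x.x`, and `::1` as local, so a LAN address such as `192.168.1.42` counts as remote because prompts would leave the machine.

```javascript
// Hypothetical helper: true when the URL points at this machine's loopback interface.
function isLoopbackUrl(ollamaUrl) {
  // The WHATWG URL parser keeps the brackets around IPv6 hostnames.
  const { hostname } = new URL(ollamaUrl);
  return (
    hostname === "localhost" ||
    hostname === "[::1]" ||
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(hostname)
  );
}
```

A caller could use this to warn before sending prompts to an `ollama_url` that is not loopback.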

The information visible to Claude includes:

  • All prompts and chat messages you pass to generate and chat (these go to the Ollama server, which may log them depending on its configuration).
  • Full text of completions returned by Ollama.
  • Metadata for every installed model (names, digests, sizes, quantization, modelfile contents).
  • Which models are currently loaded in VRAM and their size footprint.

If you have installed models containing proprietary fine-tunes or modelfiles with sensitive metadata, note that Claude will see that information when you call show_model or list_models.

delete_model is destructive and cannot be undone from this extension — the model must be re-pulled from the registry (or re-imported from source blobs) if deleted by mistake.

Troubleshooting

"cannot reach Ollama at http://localhost:11434 — is the server running?" — Start Ollama with ollama serve or launch the Ollama app. Verify with curl http://localhost:11434/ (should return "Ollama is running").

pull_model hangs for a long time — Ollama's pull API with stream: false blocks until the full download completes, which for multi-GB models can take many minutes. If you're pulling a huge model, run ollama pull <name> in a terminal instead — you'll see streaming progress there, and subsequent MCP calls will find the model already installed.
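For comparison, this is roughly what the streaming alternative looks like over HTTP. It is a sketch of a different approach from the one the extension takes (the extension uses `stream: false`). The function names are hypothetical, and the `status`, `total`, and `completed` fields are assumed from Ollama's documented pull progress objects. Iterating `res.body` with `for await` assumes Node 18 or later.

```javascript
// Hypothetical helper: summarize one NDJSON progress line from a streaming pull.
function describePullLine(line) {
  const event = JSON.parse(line);
  if (typeof event.total === "number" && event.total > 0) {
    const done = typeof event.completed === "number" ? event.completed : 0;
    const percent = Math.floor((done / event.total) * 100);
    return `${event.status}: ${percent}%`;
  }
  // Lines without byte counts (for example the manifest step) carry only a status.
  return event.status;
}

// Hypothetical helper: pull a model and report progress line by line.
async function pullWithProgress(baseUrl, model, onProgress) {
  const res = await fetch(`${baseUrl}/api/pull`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, stream: true }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const chunk of res.body) {
    buffer += decoder.decode(chunk, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (line.trim() !== "") onProgress(describePullLine(line));
    }
  }
  if (buffer.trim() !== "") onProgress(describePullLine(buffer));
}
```

Passing `console.log` as `onProgress` prints one status line per progress event, which is similar in spirit to what `ollama pull <name>` shows in a terminal.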

Custom/remote Ollama endpoint: Set ollama_url in the extension's settings (e.g. http://192.168.1.42:11434), then restart the extension for the change to take effect.

list_running shows a model after you stopped using it — Ollama keeps models hot in VRAM for a configurable TTL (default 5 minutes). The expires_at timestamp tells you when it'll unload. This is Ollama's behavior, not the extension's.
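The `expires_at` value can be turned into a countdown with a few lines of code. This is a minimal sketch: `secondsUntilUnload` is a hypothetical helper, and it assumes `expires_at` is a timestamp string that JavaScript's `Date` can parse.

```javascript
// Hypothetical helper: seconds left before Ollama unloads a model, never negative.
function secondsUntilUnload(expiresAt, now = new Date()) {
  const remainingMs = new Date(expiresAt).getTime() - now.getTime();
  return Math.max(0, Math.round(remainingMs / 1000));
}
```

With `expiresAt` set to `"2025-01-01T00:05:00Z"` and `now` set to `new Date("2025-01-01T00:00:00Z")`, the function returns 300, which matches the default five-minute TTL.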

Development

Single ~400-line Node.js script, zero npm dependencies. Rebuild the .mcpb:

cd bundle-source
zip -j ../Ollama.mcpb manifest.json package.json server.js README.md LICENSE icon.png glama.json

License

MIT. See LICENSE.
