llm-vision-mcp
An MCP server that enables any LLM to describe images from file paths, URLs, or base64 data by forwarding them to a supported vision provider such as OpenAI, Anthropic, or local Ollama models.
An MCP server that gives vision capabilities to any LLM. It accepts images (file paths, URLs, or base64) and sends them to a vision-capable LLM, returning text descriptions that non-vision LLMs can use.
Providers
| Provider | Default Model | Use Case |
|---|---|---|
| OpenAI | gpt-4o | General-purpose vision |
| Anthropic | claude-sonnet-4-latest | Detailed image analysis |
| Google | gemini-2.0-flash | Fast, cost-effective vision |
| Ollama | llava | Local/private inference |
| OpenAI-compatible | User-configured | DeepSeek, Qwen-VL, Together, etc. |
| Generic HTTP | N/A | Any API with custom request/response mapping |
Quick Start
npm install
npm run build
Option 1: CLI arguments (simplest)
node dist/index.js --provider openai --openai-api-key sk-...
Option 2: Environment variables
cp .env.example .env
# Edit .env with your API keys
node dist/index.js
Option 3: Config file (multi-provider)
cp config.example.json vision-config.json
# Edit vision-config.json
VISION_CONFIG_PATH=./vision-config.json node dist/index.js
MCP Client Configuration
Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"vision": {
"command": "node",
"args": [
"/absolute/path/to/llm-vision-mcp/dist/index.js",
"--provider", "openai",
"--openai-api-key", "sk-..."
]
}
}
}
Claude Code
Add to your .mcp.json:
{
"mcpServers": {
"vision": {
"command": "node",
"args": [
"/absolute/path/to/llm-vision-mcp/dist/index.js",
"--provider", "openai",
"--openai-api-key", "sk-..."
]
}
}
}
Prompt: vision_instructions
The server registers an MCP prompt called vision_instructions that teaches the LLM when and how to use the describe_image tool. MCP clients that support prompts can inject this into the LLM's context so it automatically calls the tool whenever it encounters image paths, URLs, or base64 data — rather than guessing what an image contains.
Tool: describe_image
Sends an image to a vision LLM and returns a text description.
Parameters
| Parameter | Required | Description |
|---|---|---|
| image | Yes | File path, URL, or base64-encoded image data |
| prompt | No | Custom instruction (default: "Describe this image in detail.") |
| provider | No | Override the default provider |
| model | No | Override the provider's default model |
Image Input Formats
- File path: /home/user/photo.png or ./images/chart.jpg
- URL: https://example.com/image.png
- Base64 data URL: data:image/png;base64,iVBOR...
- Raw base64: Long base64 string (auto-detected)
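The four input formats can be told apart with a few string checks. The sketch below is only an illustration of that idea: the function name, the return type, and the 64-character threshold for raw base64 are assumptions, not the server's actual detection logic.

```typescript
// Hypothetical helper: names and thresholds are assumptions for illustration.
type ImageInputKind = "url" | "data-url" | "raw-base64" | "file-path";

function classifyImageInput(input: string): ImageInputKind {
  const value = input.trim();
  // http:// or https:// prefix means a remote image.
  if (/^https?:\/\//i.test(value)) return "url";
  // data:image/<subtype>;base64,<payload>
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(value)) return "data-url";
  // Long strings made only of base64 characters are treated as raw base64.
  // Paths with a file extension contain "." and therefore never match.
  if (value.length >= 64 && /^[A-Za-z0-9+/]+={0,2}$/.test(value)) return "raw-base64";
  // Everything else is assumed to be a file path.
  return "file-path";
}
```

A heuristic like this can misclassify a very long extensionless path made only of letters, digits, and slashes, so a real implementation may also want to check whether the file exists.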
Examples
"Describe this screenshot" + image: "/tmp/screenshot.png"
"Extract all text from this image" + image: "https://example.com/document.png"
"What data does this chart show?" + image: "data:image/png;base64,..."
Usage reporting
When the provider returns token counts, a second text content block is appended with Usage: <in> in / <out> out / <total> total tokens. Batch results (see describe_images) also include aggregated totalUsage.
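As a rough sketch of the usage block described above, the following builds the tool's text content with an optional second block. The field names follow the sample batch result in this README and the `{ type: "text", text }` shape follows MCP text content; the function names are hypothetical.

```typescript
// Token counts as they appear in the sample batch result.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// MCP text content block.
interface TextBlock {
  type: "text";
  text: string;
}

// Hypothetical helper: renders the "Usage: ..." line described above.
function formatUsageLine(usage: Usage): string {
  return `Usage: ${usage.inputTokens} in / ${usage.outputTokens} out / ${usage.totalTokens} total tokens`;
}

// The description always comes first; the usage block is appended only
// when the provider reported token counts.
function buildContent(description: string, usage?: Usage): TextBlock[] {
  const blocks: TextBlock[] = [{ type: "text", text: description }];
  if (usage) blocks.push({ type: "text", text: formatUsageLine(usage) });
  return blocks;
}
```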
Tool: describe_images
Describes multiple images in a single batched call. Each item may override the batch-level prompt, provider, and model. Results come back in input order. Per-provider concurrency limits are honored.
Parameters
| Parameter | Required | Description |
|---|---|---|
| items | Yes | Array of 1–100 items, each with its own image and optional prompt/provider/model |
| prompt | No | Default prompt for items without their own |
| provider | No | Default provider for items without their own |
| model | No | Default model for items without their own |
| concurrency | No | Override the per-provider concurrency cap |
Example call
{
"items": [
{ "image": "/tmp/a.png" },
{ "image": "https://example.com/b.png", "prompt": "Extract text" }
],
"prompt": "Describe this image in detail."
}
Sample result
{
"results": [
{ "index": 0, "text": "A cat sitting on a desk.", "usage": { "inputTokens": 812, "outputTokens": 17, "totalTokens": 829 } },
{ "index": 1, "text": "Invoice header reading 'ACME Corp'." }
],
"totalUsage": { "inputTokens": 812, "outputTokens": 17, "totalTokens": 829 }
}
Failed items appear with an error field instead of text; the batch itself does not fail.
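A client consuming a batch result might separate failed items from successful ones and recompute the token totals. The sketch below assumes the result shape shown in the sample above and assumes that error is a string; the helper name and the third, failing item are made up for illustration.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// One entry of "results": either text or error, with optional usage.
interface BatchItemResult {
  index: number;
  text?: string;
  error?: string; // assumption: the error field is a plain string
  usage?: TokenUsage;
}

// Hypothetical helper: sums usage over items that report it and collects
// the indexes of failed items.
function summarizeBatch(results: BatchItemResult[]): { totalUsage: TokenUsage; failedIndexes: number[] } {
  const totalUsage: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  const failedIndexes: number[] = [];
  results.forEach((result) => {
    if (result.error !== undefined) failedIndexes.push(result.index);
    if (result.usage) {
      totalUsage.inputTokens += result.usage.inputTokens;
      totalUsage.outputTokens += result.usage.outputTokens;
      totalUsage.totalTokens += result.usage.totalTokens;
    }
  });
  return { totalUsage, failedIndexes };
}

const batchSummary = summarizeBatch([
  { index: 0, text: "A cat sitting on a desk.", usage: { inputTokens: 812, outputTokens: 17, totalTokens: 829 } },
  { index: 1, text: "Invoice header reading 'ACME Corp'." },
  { index: 2, error: "Request timed out" }, // illustrative failure
]);
```

Here batchSummary.totalUsage matches the totalUsage in the sample result, because only the first item reported token counts.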
Retry behavior
Transient errors — 429, 5xx, and network failures — are retried up to 3 times with exponential backoff. Configure via the top-level retry block (maxAttempts, baseDelayMs); per-provider retry overrides the global default.
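The retry policy can be pictured with the small sketch below. It rests on several assumptions the text above does not settle: that maxAttempts counts retries after the first attempt, that the delay doubles on each retry, that a per-provider block replaces the global one wholesale, and that 500 ms is a plausible base delay. All function names are hypothetical.

```typescript
interface RetryConfig {
  maxAttempts: number;
  baseDelayMs: number;
}

// 429 and any 5xx status are treated as transient.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Delay before retry n (1-based): baseDelayMs * 2^(n-1).
function backoffDelaysMs(retry: RetryConfig): number[] {
  const delays: number[] = [];
  for (let n = 1; n <= retry.maxAttempts; n++) {
    delays.push(retry.baseDelayMs * Math.pow(2, n - 1));
  }
  return delays;
}

// Assumption: a provider-level retry block replaces the global one entirely.
function effectiveRetry(globalRetry: RetryConfig, providerRetry?: RetryConfig): RetryConfig {
  return providerRetry ? providerRetry : globalRetry;
}

// With 3 retries and a 500 ms base delay the waits are 500, 1000, 2000 ms.
const retryDelays = backoffDelaysMs(effectiveRetry({ maxAttempts: 3, baseDelayMs: 500 }));
```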
Configuration
Configuration sources are loaded in this order (later overrides earlier):
- .env file
- Environment variables
- CLI arguments
- Config file (vision-config.json)
- Per-request provider and model parameters
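The "later overrides earlier" rule amounts to merging the layers in order. The sketch below shows that merge under simplifying assumptions: every setting is a flat string, and the key names provider and model stand in for whatever the server uses internally.

```typescript
// One configuration layer; undefined means "not set in this layer".
type ConfigLayer = { [key: string]: string | undefined };

// Hypothetical helper: later layers win, unset values never erase earlier ones.
function mergeLayers(layers: ConfigLayer[]): ConfigLayer {
  const merged: ConfigLayer = {};
  layers.forEach((layer) => {
    Object.keys(layer).forEach((key) => {
      const value = layer[key];
      if (value !== undefined) merged[key] = value;
    });
  });
  return merged;
}

const effectiveConfig = mergeLayers([
  { provider: "ollama", model: "llava" }, // .env file
  { provider: "openai" },                 // environment variables
  { model: "gpt-4o" },                    // CLI arguments
  {},                                     // config file (sets nothing here)
  {},                                     // per-request parameters (none given)
]);
```

In this example the result is provider "openai" with model "gpt-4o": the environment variable replaces the provider from .env, and the CLI argument replaces the model.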
CLI Arguments
--provider <name> Default provider
--openai-api-key <key> OpenAI API key
--anthropic-api-key <key> Anthropic API key
--google-api-key <key> Google API key
--ollama-base-url <url> Ollama URL (default: http://localhost:11434)
--ollama-model <model> Ollama model (default: llava)
--model <model> Default model for the default provider
--timeout <ms> Request timeout for the default provider
--ollama-timeout <ms> Request timeout for Ollama (default: 120000)
--api-key <key> API key for the default provider (generic)
--base-url <url> Base URL for the default provider (generic)
--config <path> Path to config file
Environment Variables
VISION_DEFAULT_PROVIDER=openai
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llava
VISION_TIMEOUT_MS=60000 # default provider timeout
OLLAMA_TIMEOUT_MS=300000 # bump for slow local models
VISION_CONFIG_PATH=./vision-config.json
# Preset providers — pick ONE vendor and replace the VISION_DEFAULT_PROVIDER
# value above; see "Preset Providers" section for the full list.
# Example (Moonshot):
# VISION_DEFAULT_PROVIDER=moonshot
# MOONSHOT_API_KEY=sk-...
# # Optional: MOONSHOT_MODEL=kimi-k2.6, MOONSHOT_BASE_URL=https://api.moonshot.cn/v1/
Preset Providers
For 8 major OpenAI-compatible vision vendors, llm-vision-mcp ships with built-in preset defaults. Set VISION_DEFAULT_PROVIDER=<name> plus the vendor's standard API key env var — nothing else required. Optionally override the default model and base URL with <VENDOR>_MODEL / <VENDOR>_BASE_URL.
| Preset name | Base URL | Default model | API key env var |
|---|---|---|---|
| moonshot | https://api.moonshot.ai/v1/ | kimi-k2.5 | MOONSHOT_API_KEY |
| zai | https://api.z.ai/api/paas/v4/ | glm-4.5v | ZAI_API_KEY |
| qwen | https://dashscope-intl.aliyuncs.com/compatible-mode/v1/ | qwen3-vl-plus | DASHSCOPE_API_KEY |
| nvidia | https://integrate.api.nvidia.com/v1/ | meta/llama-3.2-11b-vision-instruct | NVIDIA_API_KEY |
| groq | https://api.groq.com/openai/v1/ | meta-llama/llama-4-scout-17b-16e-instruct | GROQ_API_KEY |
| together | https://api.together.xyz/v1/ | meta-llama/Llama-Vision-Free | TOGETHER_API_KEY |
| deepinfra | https://api.deepinfra.com/v1/openai/ | meta-llama/Llama-3.2-11B-Vision-Instruct | DEEPINFRA_API_KEY |
| xai | https://api.x.ai/v1/ | grok-4.20-0309-non-reasoning | XAI_API_KEY |
Model strings use each vendor's exact casing (NVIDIA ships llama-3.2-11b-vision-instruct lowercase while DeepInfra ships Llama-3.2-11B-Vision-Instruct mixed case). Copy them verbatim — do not normalize.
Region notes:
- zai: the default baseUrl is the international endpoint (api.z.ai). Users in mainland China should override: ZAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4/ (and use their bigmodel.cn-issued key as ZAI_API_KEY).
- qwen: the default baseUrl is the Singapore international endpoint. Users in mainland China should override: DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1/.
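To make the override rules concrete, here is a minimal sketch of how a preset could be resolved from environment variables. It assumes the <VENDOR> prefix is the API key variable without its _API_KEY suffix, which matches ZAI_BASE_URL and DASHSCOPE_BASE_URL above; the function and type names are hypothetical and only two table rows are included.

```typescript
interface Preset {
  baseUrl: string;
  model: string;
  apiKeyEnv: string;
}

// Two rows copied from the preset table.
const PRESETS: { [name: string]: Preset } = {
  moonshot: { baseUrl: "https://api.moonshot.ai/v1/", model: "kimi-k2.5", apiKeyEnv: "MOONSHOT_API_KEY" },
  qwen: {
    baseUrl: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/",
    model: "qwen3-vl-plus",
    apiKeyEnv: "DASHSCOPE_API_KEY",
  },
};

type EnvMap = { [name: string]: string | undefined };

function resolvePreset(name: string, env: EnvMap): { baseUrl: string; model: string; apiKey: string } {
  const preset = PRESETS[name];
  if (!preset) throw new Error(`Unknown preset: ${name}`);
  const apiKey = env[preset.apiKeyEnv];
  if (!apiKey) throw new Error(`Missing ${preset.apiKeyEnv}`);
  // Assumption: override variables share the API key variable's prefix.
  const prefix = preset.apiKeyEnv.replace(/_API_KEY$/, "");
  return {
    baseUrl: env[prefix + "_BASE_URL"] || preset.baseUrl,
    model: env[prefix + "_MODEL"] || preset.model, // copied verbatim, never normalized
    apiKey: apiKey,
  };
}

// Mainland China override for qwen, as described in the region notes.
const qwenChina = resolvePreset("qwen", {
  DASHSCOPE_API_KEY: "sk-...",
  DASHSCOPE_BASE_URL: "https://dashscope.aliyuncs.com/compatible-mode/v1/",
});
```

In a Node process the second argument would typically be process.env.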
Quickstart with Moonshot:
export VISION_DEFAULT_PROVIDER=moonshot
export MOONSHOT_API_KEY=sk-...
llm-vision-mcp
The same pattern works for all 8 presets — swap moonshot and MOONSHOT_API_KEY for any other row of the table above.
MCP host config example (Claude Desktop, Cursor, etc.):
{
"mcpServers": {
"vision": {
"command": "node",
"args": ["/absolute/path/to/llm-vision-mcp/dist/index.js"],
"env": {
"VISION_DEFAULT_PROVIDER": "moonshot",
"MOONSHOT_API_KEY": "sk-..."
}
}
}
}
Need multiple presets active at once, or pinned retry/concurrency settings per preset? See the Provider Cookbook below for copy-paste config-file snippets.
Config File
See config.example.json for a full example with all providers.
The config file supports ${ENV_VAR} interpolation — API keys can reference environment variables so they never appear in the file.
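The interpolation step can be sketched as a single regular-expression replace. This is an illustration under assumptions: the function name is hypothetical, and throwing on an unset variable is one possible policy that the server may or may not follow.

```typescript
type EnvVars = { [name: string]: string | undefined };

// Replace every ${NAME} with the value of NAME from the given environment.
function interpolateEnv(text: string, env: EnvVars): string {
  return text.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const value = env[name];
    // Assumption: an unset variable is an error rather than an empty string.
    if (value === undefined) throw new Error(`Environment variable ${name} is not set`);
    return value;
  });
}

const interpolated = interpolateEnv('{ "apiKey": "${MOONSHOT_API_KEY}" }', {
  MOONSHOT_API_KEY: "sk-example",
});
```

Running the replace on the raw file text before JSON parsing keeps the sketch short, but a key containing quotes or backslashes would then need escaping; interpolating after parsing avoids that.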
Provider Cookbook
Copy-paste JSON snippets for each preset vendor. Drop into your vision-config.json to pin settings, combine multiple providers, or override preset defaults. Keys stay in env vars via ${ENV_VAR} interpolation.
Moonshot (Kimi)
{
"defaultProvider": "moonshot",
"providers": {
"moonshot": {
"type": "openai-compatible",
"baseUrl": "https://api.moonshot.ai/v1/",
"apiKey": "${MOONSHOT_API_KEY}",
"model": "kimi-k2.5"
}
}
}
Z.ai (Zhipu GLM) — international
{
"defaultProvider": "zai",
"providers": {
"zai": {
"type": "openai-compatible",
"baseUrl": "https://api.z.ai/api/paas/v4/",
"apiKey": "${ZAI_API_KEY}",
"model": "glm-4.5v"
}
}
}
Z.ai (Zhipu GLM) — China region
Same vendor, different endpoint and key:
{
"defaultProvider": "zai",
"providers": {
"zai": {
"type": "openai-compatible",
"baseUrl": "https://open.bigmodel.cn/api/paas/v4/",
"apiKey": "${ZHIPUAI_API_KEY}",
"model": "glm-4.5v"
}
}
}
Qwen (Alibaba DashScope) — international
{
"defaultProvider": "qwen",
"providers": {
"qwen": {
"type": "openai-compatible",
"baseUrl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/",
"apiKey": "${DASHSCOPE_API_KEY}",
"model": "qwen3-vl-plus"
}
}
}
NVIDIA NIM
{
"defaultProvider": "nvidia",
"providers": {
"nvidia": {
"type": "openai-compatible",
"baseUrl": "https://integrate.api.nvidia.com/v1/",
"apiKey": "${NVIDIA_API_KEY}",
"model": "meta/llama-3.2-11b-vision-instruct"
}
}
}
Groq
{
"defaultProvider": "groq",
"providers": {
"groq": {
"type": "openai-compatible",
"baseUrl": "https://api.groq.com/openai/v1/",
"apiKey": "${GROQ_API_KEY}",
"model": "meta-llama/llama-4-scout-17b-16e-instruct"
}
}
}
Together AI
{
"defaultProvider": "together",
"providers": {
"together": {
"type": "openai-compatible",
"baseUrl": "https://api.together.xyz/v1/",
"apiKey": "${TOGETHER_API_KEY}",
"model": "meta-llama/Llama-Vision-Free"
}
}
}
DeepInfra
{
"defaultProvider": "deepinfra",
"providers": {
"deepinfra": {
"type": "openai-compatible",
"baseUrl": "https://api.deepinfra.com/v1/openai/",
"apiKey": "${DEEPINFRA_API_KEY}",
"model": "meta-llama/Llama-3.2-11B-Vision-Instruct"
}
}
}
xAI (Grok)
{
"defaultProvider": "xai",
"providers": {
"xai": {
"type": "openai-compatible",
"baseUrl": "https://api.x.ai/v1/",
"apiKey": "${XAI_API_KEY}",
"model": "grok-4.20-0309-non-reasoning"
}
}
}
Multiple providers simultaneously
Register several providers at once, then call any of them per request via the MCP tool's provider parameter:
{
"defaultProvider": "openai",
"providers": {
"openai": {
"apiKey": "${OPENAI_API_KEY}",
"model": "gpt-4o"
},
"moonshot": {
"type": "openai-compatible",
"baseUrl": "https://api.moonshot.ai/v1/",
"apiKey": "${MOONSHOT_API_KEY}",
"model": "kimi-k2.5"
},
"zai": {
"type": "openai-compatible",
"baseUrl": "https://api.z.ai/api/paas/v4/",
"apiKey": "${ZAI_API_KEY}",
"model": "glm-4.5v"
}
}
}
Custom Providers
OpenAI-compatible (DeepSeek, Qwen-VL, etc.)
Most Chinese LLM providers expose an OpenAI-compatible API:
{
"providers": {
"deepseek": {
"type": "openai-compatible",
"baseUrl": "https://api.deepseek.com/v1",
"apiKey": "${DEEPSEEK_API_KEY}",
"model": "deepseek-vl2"
}
}
}
Generic HTTP (any API)
For APIs with non-standard request/response formats:
{
"providers": {
"custom": {
"type": "generic-http",
"url": "https://my-api.example.com/vision",
"headers": { "Authorization": "Bearer YOUR_API_KEY" },
"requestTemplate": {
"image": "{{image}}",
"prompt": "{{prompt}}",
"type": "{{mimeType}}"
},
"imageFormat": "base64",
"responsePath": "result.text"
}
}
}
Template placeholders: {{image}}, {{prompt}}, {{mimeType}}
imageFormat: "base64" (raw) or "data-url" (data:image/png;base64,...)
responsePath: Dot-notation path to extract the text from the JSON response (e.g., choices.0.message.content)
usagePath (optional): Dot-notation path to a numeric token total in the response. Reported as totalTokens.
Note: headers values are sent to the server literally; there is no ${ENV_VAR} expansion. Paste the bearer token directly, or launch the server from a wrapper that substitutes it.
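The two mapping steps, filling requestTemplate and reading responsePath, can be sketched as follows. Both helper names are hypothetical, and the real provider may handle edge cases (missing keys, non-string results) differently.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface TemplateValues {
  image: string;
  prompt: string;
  mimeType: string;
}

// Recursively substitute {{image}}, {{prompt}}, and {{mimeType}} in every string.
function fillTemplate(template: Json, values: TemplateValues): Json {
  if (typeof template === "string") {
    return template.replace(/\{\{(image|prompt|mimeType)\}\}/g, (_m: string, key: string) =>
      key === "image" ? values.image : key === "prompt" ? values.prompt : values.mimeType
    );
  }
  if (Array.isArray(template)) return template.map((item) => fillTemplate(item, values));
  if (template !== null && typeof template === "object") {
    const source = template;
    const out: { [key: string]: Json } = {};
    Object.keys(source).forEach((key) => {
      out[key] = fillTemplate(source[key], values);
    });
    return out;
  }
  return template; // numbers, booleans, and null pass through unchanged
}

// Walk a dot-notation path; numeric segments index into arrays.
function getByPath(value: Json, path: string): Json | undefined {
  return path.split(".").reduce<Json | undefined>((current, part) => {
    if (Array.isArray(current)) return current[Number(part)];
    if (current !== null && typeof current === "object") return current[part];
    return undefined;
  }, value);
}
```

For example, getByPath applied to an OpenAI-style response with the path choices.0.message.content returns the message text, and returns undefined if any segment is missing.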
Example: MiniMax vision (MiniMax-M2.7)
MiniMax's OpenAI-compatible /v1/chat/completions endpoint silently drops image_url content blocks, so vision requests must go to the Anthropic-compatible /anthropic/v1/messages endpoint with the image embedded as a plain-text data URL inside the content string (not as a content-part array).
{
"providers": {
"minimax": {
"type": "generic-http",
"url": "https://api.minimax.io/anthropic/v1/messages",
"headers": { "Authorization": "Bearer YOUR_MINIMAX_API_KEY" },
"requestTemplate": {
"model": "MiniMax-M2.7",
"max_tokens": 1024,
"messages": [
{ "role": "user", "content": "{{image}}\n{{prompt}}" }
]
},
"imageFormat": "data-url",
"responsePath": "content.0.text"
}
},
"defaultProvider": "minimax"
}
Verified 2026-04 against MiniMax's international host (api.minimax.io). Chinese users swap the host for api.minimaxi.com. If MiniMax later ships a native image_url-style content part or adds a preset-class adapter in llm-vision-mcp, this generic-http config can be replaced by the simpler preset form.
Image Preprocessing
Images are automatically preprocessed before being sent to providers:
- Format conversion: Unsupported formats (e.g., WEBP for providers that don't support it) are converted to PNG
- Resizing: Images exceeding 2048x2048 are resized to fit (configurable)
- Compression: Images exceeding 20MB are JPEG-compressed at decreasing quality levels
Preprocessing options can be customized in the config file:
{
"preprocessing": {
"maxWidth": 2048,
"maxHeight": 2048,
"maxFileSizeBytes": 20971520
}
}
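The resize rule can be expressed as a scale factor. The sketch below assumes the aspect ratio is preserved and that images already within bounds are left untouched; neither is stated explicitly above, and the function name is hypothetical. Note that 20971520 bytes is 20 × 1024 × 1024.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale down so both dimensions fit, never scale up.
function fitWithin(size: Size, maxWidth: number, maxHeight: number): Size {
  const scale = Math.min(1, maxWidth / size.width, maxHeight / size.height);
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

// A 4096x2048 image with the default 2048x2048 limit becomes 2048x1024.
const resized = fitWithin({ width: 4096, height: 2048 }, 2048, 2048);
```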
Development
npm test # Run tests
npm run test:watch # Watch mode
npm run build # Compile TypeScript
npm run dev # Watch mode compilation
License
ISC