llm-vision-mcp

An MCP server that gives vision capabilities to any LLM. It accepts images (file paths, URLs, or base64) and sends them to a vision-capable LLM, returning text descriptions that non-vision LLMs can use.

Providers

Provider           Default Model           Use Case
OpenAI             gpt-4o                  General-purpose vision
Anthropic          claude-sonnet-4-latest  Detailed image analysis
Google             gemini-2.0-flash        Fast, cost-effective vision
Ollama             llava                   Local/private inference
OpenAI-compatible  User-configured         DeepSeek, Qwen-VL, Together, etc.
Generic HTTP       N/A                     Any API with custom request/response mapping

Quick Start

npm install
npm run build

Option 1: CLI arguments (simplest)

node dist/index.js --provider openai --openai-api-key sk-...

Option 2: Environment variables

cp .env.example .env
# Edit .env with your API keys
node dist/index.js

Option 3: Config file (multi-provider)

cp config.example.json vision-config.json
# Edit vision-config.json
VISION_CONFIG_PATH=./vision-config.json node dist/index.js

MCP Client Configuration

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "vision": {
      "command": "node",
      "args": [
        "/absolute/path/to/llm-vision-mcp/dist/index.js",
        "--provider", "openai",
        "--openai-api-key", "sk-..."
      ]
    }
  }
}

Claude Code

Add to your .mcp.json:

{
  "mcpServers": {
    "vision": {
      "command": "node",
      "args": [
        "/absolute/path/to/llm-vision-mcp/dist/index.js",
        "--provider", "openai",
        "--openai-api-key", "sk-..."
      ]
    }
  }
}

Prompt: vision_instructions

The server registers an MCP prompt called vision_instructions that teaches the LLM when and how to use the describe_image tool. MCP clients that support prompts can inject it into the LLM's context, so the LLM calls the tool whenever it encounters image paths, URLs, or base64 data instead of guessing what an image contains.

Tool: describe_image

Sends an image to a vision LLM and returns a text description.

Parameters

Parameter  Required  Description
image      Yes       File path, URL, or base64-encoded image data
prompt     No        Custom instruction (default: "Describe this image in detail.")
provider   No        Override the default provider
model      No        Override the provider's default model

Image Input Formats

  • File path: /home/user/photo.png or ./images/chart.jpg
  • URL: https://example.com/image.png
  • Base64 data URL: data:image/png;base64,iVBOR...
  • Raw base64: Long base64 string (auto-detected)
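
The four input formats above could be told apart with a small classifier. The sketch below is illustrative only: classifyImageInput, the 64-character threshold, and the exact patterns are assumptions, not the server's actual detection rules.

```typescript
// Hypothetical helper; the real server's auto-detection may use different rules.
type ImageInputKind = "url" | "data-url" | "base64" | "file-path";

function classifyImageInput(input: string): ImageInputKind {
  const value = input.trim();
  if (/^https?:\/\//i.test(value)) return "url";
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(value)) return "data-url";
  // Assumed heuristic: a long string made up only of base64 characters
  // (with optional "=" padding) is treated as raw base64.
  const looksLikeBase64 =
    value.length >= 64 && /^[A-Za-z0-9+/]+={0,2}$/.test(value);
  if (looksLikeBase64) return "base64";
  // Everything else is treated as a path on the local file system.
  return "file-path";
}

console.log(classifyImageInput("https://example.com/image.png"));
console.log(classifyImageInput("./images/chart.jpg"));
```

A heuristic like this can misread an unusually long path that contains only letters, digits, and slashes, so passing a data URL is the least ambiguous way to send base64.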

Examples

"Describe this screenshot" + image: "/tmp/screenshot.png"
"Extract all text from this image" + image: "https://example.com/document.png"
"What data does this chart show?" + image: "data:image/png;base64,..."

Usage reporting

When the provider returns token counts, a second text content block is appended with Usage: <in> in / <out> out / <total> total tokens. Batch results (see describe_images) also include aggregated totalUsage.
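
As a rough illustration of the usage line and the aggregated totalUsage, the sketch below formats one usage record and sums several. The field names follow the sample result in this README; formatUsage and sumUsage are hypothetical names, and skipping items without usage is an assumption.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Builds the text of the second content block described above.
function formatUsage(usage: Usage): string {
  return `Usage: ${usage.inputTokens} in / ${usage.outputTokens} out / ${usage.totalTokens} total tokens`;
}

// Assumption: items whose provider reported no usage are skipped,
// and the total is omitted when no item reported usage at all.
function sumUsage(items: Array<Usage | undefined>): Usage | undefined {
  const reported = items.filter((u): u is Usage => u !== undefined);
  if (reported.length === 0) return undefined;
  return reported.reduce((acc, u) => ({
    inputTokens: acc.inputTokens + u.inputTokens,
    outputTokens: acc.outputTokens + u.outputTokens,
    totalTokens: acc.totalTokens + u.totalTokens,
  }));
}

console.log(formatUsage({ inputTokens: 812, outputTokens: 17, totalTokens: 829 }));
```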

Tool: describe_images

Describes multiple images in a single batched call. Each item may override the batch-level prompt, provider, and model. Results come back in input order. Per-provider concurrency limits are honored.

Parameters

Parameter    Required  Description
items        Yes       Array of 1–100 items, each with its own image and optional prompt/provider/model
prompt       No        Default prompt for items without their own
provider     No        Default provider for items without their own
model        No        Default model for items without their own
concurrency  No        Override the per-provider concurrency cap

Example call

{
  "items": [
    { "image": "/tmp/a.png" },
    { "image": "https://example.com/b.png", "prompt": "Extract text" }
  ],
  "prompt": "Describe this image in detail."
}

Sample result

{
  "results": [
    { "index": 0, "text": "A cat sitting on a desk.", "usage": { "inputTokens": 812, "outputTokens": 17, "totalTokens": 829 } },
    { "index": 1, "text": "Invoice header reading 'ACME Corp'." }
  ],
  "totalUsage": { "inputTokens": 812, "outputTokens": 17, "totalTokens": 829 }
}

Failed items appear with an error field instead of text; the batch itself does not fail.
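
The batch semantics described in this section (input-order results, a concurrency cap, and per-item errors that do not fail the batch) can be sketched with a small worker pool. This is not the server's implementation: describeBatch and the describe callback are hypothetical, and a single cap stands in for the per-provider limits.

```typescript
type BatchResult =
  | { index: number; text: string }
  | { index: number; error: string };

async function describeBatch<T>(
  items: T[],
  limit: number,
  describe: (item: T) => Promise<string>,
): Promise<BatchResult[]> {
  const results: BatchResult[] = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { index, text: await describe(items[index] as T) };
      } catch (err) {
        // A failed item is recorded in place; the batch keeps going.
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { index, error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // slots are filled by index, so input order is preserved
}
```

Writing each result into its own slot keeps the output in input order even when later items finish first.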

Retry behavior

Transient errors — 429, 5xx, and network failures — are retried up to 3 times with exponential backoff. Configure via the top-level retry block (maxAttempts, baseDelayMs); per-provider retry overrides the global default.
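
A minimal sketch of retrying with exponential backoff, using the maxAttempts and baseDelayMs option names from the retry block. Several details are assumptions: that maxAttempts counts the first attempt, that the delay doubles after each failure, and that HttpError, isTransient, and withRetry exist under these names (they are hypothetical).

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
}

// Hypothetical error type carrying the HTTP status of a failed request.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

function isTransient(err: unknown): boolean {
  if (err instanceof HttpError) {
    return err.status === 429 || (err.status >= 500 && err.status <= 599);
  }
  // Assumption: any error that is not an HTTP status error is a network failure.
  return true;
}

async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-transient errors or when attempts are exhausted.
      if (attempt >= opts.maxAttempts || !isTransient(err)) throw err;
      // Delay grows as baseDelayMs, 2x, 4x, ...
      await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

The sleep parameter is injectable only so the backoff schedule can be checked without waiting in real time.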

Configuration

Configuration sources are loaded in this order (later overrides earlier):

  1. .env file
  2. Environment variables
  3. CLI arguments
  4. Config file (vision-config.json)
  5. Per-request provider and model parameters
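
The precedence list above amounts to a layered merge in which each later source overwrites only the keys it actually sets. The sketch below shows that idea with a hypothetical mergeLayers helper and made-up setting names; it is not the server's loader.

```typescript
type Settings = Record<string, string | number | undefined>;

// Later layers win, but an undefined value never erases an earlier one.
function mergeLayers(...layers: Settings[]): Settings {
  const merged: Settings = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      if (value !== undefined) merged[key] = value;
    }
  }
  return merged;
}

// Layers listed in the documented order, lowest priority first.
const effective = mergeLayers(
  { provider: "ollama", timeoutMs: 60000 },    // 1. .env file
  { provider: "openai" },                      // 2. environment variables
  { model: "gpt-4o" },                         // 3. CLI arguments
  { provider: "anthropic", model: undefined }, // 4. config file
  { model: "claude-sonnet-4-latest" },         // 5. per-request parameters
);

console.log(effective);
```

Here the provider comes from the config file, the model from the request, and the timeout from the .env file, because no later layer sets it.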

CLI Arguments

--provider <name>              Default provider
--openai-api-key <key>         OpenAI API key
--anthropic-api-key <key>      Anthropic API key
--google-api-key <key>         Google API key
--ollama-base-url <url>        Ollama URL (default: http://localhost:11434)
--ollama-model <model>         Ollama model (default: llava)
--model <model>                Default model for the default provider
--timeout <ms>                 Request timeout for the default provider
--ollama-timeout <ms>          Request timeout for Ollama (default: 120000)
--api-key <key>                API key for the default provider (generic)
--base-url <url>               Base URL for the default provider (generic)
--config <path>                Path to config file

Environment Variables

VISION_DEFAULT_PROVIDER=openai
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llava
VISION_TIMEOUT_MS=60000          # default provider timeout
OLLAMA_TIMEOUT_MS=300000         # bump for slow local models
VISION_CONFIG_PATH=./vision-config.json

# Preset providers — pick ONE vendor and replace the VISION_DEFAULT_PROVIDER
# value above; see "Preset Providers" section for the full list.
# Example (Moonshot):
#   VISION_DEFAULT_PROVIDER=moonshot
#   MOONSHOT_API_KEY=sk-...
#   # Optional: MOONSHOT_MODEL=kimi-k2.6, MOONSHOT_BASE_URL=https://api.moonshot.cn/v1/

Preset Providers

For 8 major OpenAI-compatible vision vendors, llm-vision-mcp ships with built-in preset defaults. Set VISION_DEFAULT_PROVIDER=<name> plus the vendor's standard API key env var — nothing else required. Optionally override the default model and base URL with <VENDOR>_MODEL / <VENDOR>_BASE_URL.

Preset name  Base URL                                                 Default model                              API key env var
moonshot     https://api.moonshot.ai/v1/                              kimi-k2.5                                  MOONSHOT_API_KEY
zai          https://api.z.ai/api/paas/v4/                            glm-4.5v                                   ZAI_API_KEY
qwen         https://dashscope-intl.aliyuncs.com/compatible-mode/v1/  qwen3-vl-plus                              DASHSCOPE_API_KEY
nvidia       https://integrate.api.nvidia.com/v1/                     meta/llama-3.2-11b-vision-instruct         NVIDIA_API_KEY
groq         https://api.groq.com/openai/v1/                          meta-llama/llama-4-scout-17b-16e-instruct  GROQ_API_KEY
together     https://api.together.xyz/v1/                             meta-llama/Llama-Vision-Free               TOGETHER_API_KEY
deepinfra    https://api.deepinfra.com/v1/openai/                     meta-llama/Llama-3.2-11B-Vision-Instruct   DEEPINFRA_API_KEY
xai          https://api.x.ai/v1/                                     grok-4.20-0309-non-reasoning               XAI_API_KEY

Model strings use each vendor's exact casing (NVIDIA ships llama-3.2-11b-vision-instruct lowercase while DeepInfra ships Llama-3.2-11B-Vision-Instruct mixed case). Copy them verbatim — do not normalize.

Region notes:

  • zai — default baseUrl is the international endpoint (api.z.ai). Users in mainland China should override: ZAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4/ (and use their bigmodel.cn-issued key as ZAI_API_KEY).
  • qwen — default baseUrl is the Singapore international endpoint. Users in mainland China should override: DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1/.

Quickstart with Moonshot:

export VISION_DEFAULT_PROVIDER=moonshot
export MOONSHOT_API_KEY=sk-...
llm-vision-mcp

The same pattern works for all 8 presets — swap moonshot and MOONSHOT_API_KEY for any other row of the table above.
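
Conceptually, a preset is a set of defaults that the matching environment variables can override. The sketch below models that for two rows of the table; resolvePreset, the Preset shape, and the envPrefix field are hypothetical and only illustrate the documented <VENDOR>_API_KEY / <VENDOR>_MODEL / <VENDOR>_BASE_URL pattern.

```typescript
interface Preset {
  envPrefix: string;
  baseUrl: string;
  model: string;
}

interface ProviderSettings {
  apiKey: string;
  baseUrl: string;
  model: string;
}

// Two rows copied from the preset table; the other vendors are omitted here.
const PRESETS: Record<string, Preset> = {
  moonshot: { envPrefix: "MOONSHOT", baseUrl: "https://api.moonshot.ai/v1/", model: "kimi-k2.5" },
  zai: { envPrefix: "ZAI", baseUrl: "https://api.z.ai/api/paas/v4/", model: "glm-4.5v" },
};

function resolvePreset(
  name: string,
  env: Record<string, string | undefined>,
): ProviderSettings {
  const preset = PRESETS[name];
  if (!preset) throw new Error(`Unknown preset: ${name}`);
  const apiKey = env[`${preset.envPrefix}_API_KEY`];
  if (!apiKey) throw new Error(`${preset.envPrefix}_API_KEY is not set`);
  return {
    apiKey,
    // Optional overrides fall back to the preset defaults.
    baseUrl: env[`${preset.envPrefix}_BASE_URL`] ?? preset.baseUrl,
    model: env[`${preset.envPrefix}_MODEL`] ?? preset.model,
  };
}

console.log(resolvePreset("moonshot", { MOONSHOT_API_KEY: "sk-..." }));
```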

MCP host config example (Claude Desktop, Cursor, etc.):

{
  "mcpServers": {
    "vision": {
      "command": "node",
      "args": ["/absolute/path/to/llm-vision-mcp/dist/index.js"],
      "env": {
        "VISION_DEFAULT_PROVIDER": "moonshot",
        "MOONSHOT_API_KEY": "sk-..."
      }
    }
  }
}

Need multiple presets active at once, or pinned retry/concurrency settings per preset? See the Provider Cookbook below for copy-paste config-file snippets.

Config File

See config.example.json for a full example with all providers.

The config file supports ${ENV_VAR} interpolation — API keys can reference environment variables so they never appear in the file.
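
A minimal sketch of what ${ENV_VAR} interpolation might look like. interpolateEnv is a hypothetical name, and throwing on an unset variable is an assumption; the server may handle missing variables differently.

```typescript
function interpolateEnv(
  value: string,
  env: Record<string, string | undefined>,
): string {
  // Replace every ${NAME} with the value of NAME from the given environment.
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      // Assumption: fail loudly rather than send an empty API key.
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}

// In a real process the second argument would be process.env.
console.log(interpolateEnv("${MOONSHOT_API_KEY}", { MOONSHOT_API_KEY: "sk-example" }));
```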

Provider Cookbook

Copy-paste JSON snippets for each preset vendor. Drop into your vision-config.json to pin settings, combine multiple providers, or override preset defaults. Keys stay in env vars via ${ENV_VAR} interpolation.

Moonshot (Kimi)

{
  "defaultProvider": "moonshot",
  "providers": {
    "moonshot": {
      "type": "openai-compatible",
      "baseUrl": "https://api.moonshot.ai/v1/",
      "apiKey": "${MOONSHOT_API_KEY}",
      "model": "kimi-k2.5"
    }
  }
}

Z.ai (Zhipu GLM) — international

{
  "defaultProvider": "zai",
  "providers": {
    "zai": {
      "type": "openai-compatible",
      "baseUrl": "https://api.z.ai/api/paas/v4/",
      "apiKey": "${ZAI_API_KEY}",
      "model": "glm-4.5v"
    }
  }
}

Z.ai (Zhipu GLM) — China region

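Same vendor, different endpoint and key. In a config file the key can be read from any environment variable; this snippet uses ZHIPUAI_API_KEY: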

{
  "defaultProvider": "zai",
  "providers": {
    "zai": {
      "type": "openai-compatible",
      "baseUrl": "https://open.bigmodel.cn/api/paas/v4/",
      "apiKey": "${ZHIPUAI_API_KEY}",
      "model": "glm-4.5v"
    }
  }
}

Qwen (Alibaba DashScope) — international

{
  "defaultProvider": "qwen",
  "providers": {
    "qwen": {
      "type": "openai-compatible",
      "baseUrl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/",
      "apiKey": "${DASHSCOPE_API_KEY}",
      "model": "qwen3-vl-plus"
    }
  }
}

NVIDIA NIM

{
  "defaultProvider": "nvidia",
  "providers": {
    "nvidia": {
      "type": "openai-compatible",
      "baseUrl": "https://integrate.api.nvidia.com/v1/",
      "apiKey": "${NVIDIA_API_KEY}",
      "model": "meta/llama-3.2-11b-vision-instruct"
    }
  }
}

Groq

{
  "defaultProvider": "groq",
  "providers": {
    "groq": {
      "type": "openai-compatible",
      "baseUrl": "https://api.groq.com/openai/v1/",
      "apiKey": "${GROQ_API_KEY}",
      "model": "meta-llama/llama-4-scout-17b-16e-instruct"
    }
  }
}

Together AI

{
  "defaultProvider": "together",
  "providers": {
    "together": {
      "type": "openai-compatible",
      "baseUrl": "https://api.together.xyz/v1/",
      "apiKey": "${TOGETHER_API_KEY}",
      "model": "meta-llama/Llama-Vision-Free"
    }
  }
}

DeepInfra

{
  "defaultProvider": "deepinfra",
  "providers": {
    "deepinfra": {
      "type": "openai-compatible",
      "baseUrl": "https://api.deepinfra.com/v1/openai/",
      "apiKey": "${DEEPINFRA_API_KEY}",
      "model": "meta-llama/Llama-3.2-11B-Vision-Instruct"
    }
  }
}

xAI (Grok)

{
  "defaultProvider": "xai",
  "providers": {
    "xai": {
      "type": "openai-compatible",
      "baseUrl": "https://api.x.ai/v1/",
      "apiKey": "${XAI_API_KEY}",
      "model": "grok-4.20-0309-non-reasoning"
    }
  }
}

Multiple providers simultaneously

Register several providers at once, then call any of them per request via the MCP tool's provider parameter:

{
  "defaultProvider": "openai",
  "providers": {
    "openai": {
      "apiKey": "${OPENAI_API_KEY}",
      "model": "gpt-4o"
    },
    "moonshot": {
      "type": "openai-compatible",
      "baseUrl": "https://api.moonshot.ai/v1/",
      "apiKey": "${MOONSHOT_API_KEY}",
      "model": "kimi-k2.5"
    },
    "zai": {
      "type": "openai-compatible",
      "baseUrl": "https://api.z.ai/api/paas/v4/",
      "apiKey": "${ZAI_API_KEY}",
      "model": "glm-4.5v"
    }
  }
}

Custom Providers

OpenAI-compatible (DeepSeek, Qwen-VL, etc.)

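Many vendors, including most Chinese LLM providers, expose an OpenAI-compatible API. Register one by setting type to openai-compatible and supplying its baseUrl, apiKey, and model: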

{
  "providers": {
    "deepseek": {
      "type": "openai-compatible",
      "baseUrl": "https://api.deepseek.com/v1",
      "apiKey": "${DEEPSEEK_API_KEY}",
      "model": "deepseek-vl2"
    }
  }
}

Generic HTTP (any API)

For APIs with non-standard request/response formats:

{
  "providers": {
    "custom": {
      "type": "generic-http",
      "url": "https://my-api.example.com/vision",
      "headers": { "Authorization": "Bearer YOUR_API_KEY" },
      "requestTemplate": {
        "image": "{{image}}",
        "prompt": "{{prompt}}",
        "type": "{{mimeType}}"
      },
      "imageFormat": "base64",
      "responsePath": "result.text"
    }
  }
}

Template placeholders: {{image}}, {{prompt}}, {{mimeType}}

imageFormat: "base64" (raw) or "data-url" (data:image/png;base64,...)

responsePath: Dot-notation path to extract the text from the JSON response (e.g., choices.0.message.content)

usagePath (optional): Dot-notation path to a numeric token total in the response. Reported as totalTokens.

Note: headers values are sent to the server literally — there is no ${ENV_VAR} expansion. Paste the bearer token directly, or launch the server from a wrapper that substitutes it.
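
The two mechanisms described above, placeholder substitution in requestTemplate and dot-notation lookup for responsePath, can be sketched as follows. fillTemplate and getByPath are hypothetical names, and leaving unknown placeholders untouched is an assumption rather than documented behavior.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively replaces {{name}} placeholders inside every string of the template.
function fillTemplate(
  template: Json,
  values: Record<string, string | undefined>,
): Json {
  if (typeof template === "string") {
    // Assumption: unknown placeholders are left as-is.
    return template.replace(/\{\{(\w+)\}\}/g, (match, name: string) => values[name] ?? match);
  }
  if (Array.isArray(template)) {
    return template.map((item) => fillTemplate(item, values));
  }
  if (template !== null && typeof template === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(template)) {
      out[key] = fillTemplate(value, values);
    }
    return out;
  }
  return template; // numbers, booleans, and null pass through unchanged
}

// Walks a dot-notation path such as "choices.0.message.content".
function getByPath(data: unknown, path: string): unknown {
  let current: unknown = data;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    // Numeric segments index into arrays because array keys are strings too.
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

const body = fillTemplate(
  { image: "{{image}}", prompt: "{{prompt}}", type: "{{mimeType}}" },
  { image: "iVBOR...", prompt: "Describe this image in detail.", mimeType: "image/png" },
);
console.log(body, getByPath({ result: { text: "A cat." } }, "result.text"));
```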

Example: MiniMax vision (MiniMax-M2.7)

MiniMax's OpenAI-compatible /v1/chat/completions endpoint silently drops image_url content blocks, so vision requests must go to the Anthropic-compatible /anthropic/v1/messages endpoint with the image embedded as a plain-text data URL inside the content string (not as a content-part array).

{
  "providers": {
    "minimax": {
      "type": "generic-http",
      "url": "https://api.minimax.io/anthropic/v1/messages",
      "headers": { "Authorization": "Bearer YOUR_MINIMAX_API_KEY" },
      "requestTemplate": {
        "model": "MiniMax-M2.7",
        "max_tokens": 1024,
        "messages": [
          { "role": "user", "content": "{{image}}\n{{prompt}}" }
        ]
      },
      "imageFormat": "data-url",
      "responsePath": "content.0.text"
    }
  },
  "defaultProvider": "minimax"
}

Verified 2026-04 against MiniMax's international host (api.minimax.io). Users in mainland China should swap the host for api.minimaxi.com. If MiniMax later ships a native image_url-style content part, or llm-vision-mcp adds a preset-class adapter, this generic-http config can be replaced by the simpler preset form.

Image Preprocessing

Images are automatically preprocessed before being sent to providers:

  • Format conversion: Unsupported formats (e.g., WebP for providers that don't support it) are converted to PNG
  • Resizing: Images larger than 2048x2048 pixels are resized to fit (configurable)
  • Compression: Images larger than 20 MB are JPEG-compressed at decreasing quality levels

Preprocessing options can be customized in the config file:

{
  "preprocessing": {
    "maxWidth": 2048,
    "maxHeight": 2048,
    "maxFileSizeBytes": 20971520
  }
}
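
For the resizing step, the target dimensions can be computed before any pixels are touched. The sketch below assumes the aspect ratio is preserved and that smaller images are never upscaled; neither is stated in this README. fitWithin is a hypothetical helper, and only the 2048 defaults come from the config above.

```typescript
interface Size {
  width: number;
  height: number;
}

// Computes the largest size that fits inside maxWidth x maxHeight
// while keeping the aspect ratio; images already within bounds are unchanged.
function fitWithin(size: Size, maxWidth = 2048, maxHeight = 2048): Size {
  const scale = Math.min(1, maxWidth / size.width, maxHeight / size.height);
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

console.log(fitWithin({ width: 4096, height: 2048 }));
console.log(fitWithin({ width: 800, height: 600 }));
```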

Development

npm test              # Run tests
npm run test:watch    # Watch mode
npm run build         # Compile TypeScript
npm run dev           # Watch mode compilation

License

ISC
