llm-vision-mcp

An MCP server that gives vision capabilities to any LLM. It accepts images (file paths, URLs, or base64) and sends them to a vision-capable LLM, returning text descriptions that non-vision LLMs can use.

Providers

Provider	Default Model	Use Case
OpenAI	gpt-4o	General-purpose vision
Anthropic	claude-sonnet-4-latest	Detailed image analysis
Google	gemini-2.0-flash	Fast, cost-effective vision
Ollama	llava	Local/private inference
OpenAI-compatible	User-configured	DeepSeek, Qwen-VL, Together, etc.
Generic HTTP	N/A	Any API with custom request/response mapping

Quick Start

npm install
npm run build

Option 1: CLI arguments (simplest)

node dist/index.js --provider openai --openai-api-key sk-...

Option 2: Environment variables

cp .env.example .env
# Edit .env with your API keys
node dist/index.js

Option 3: Config file (multi-provider)

cp config.example.json vision-config.json
# Edit vision-config.json
VISION_CONFIG_PATH=./vision-config.json node dist/index.js

MCP Client Configuration

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "vision": {
      "command": "node",
      "args": [
        "/absolute/path/to/llm-vision-mcp/dist/index.js",
        "--provider", "openai",
        "--openai-api-key", "sk-..."
      ]
    }
  }
}

Claude Code

Add to your .mcp.json:

{
  "mcpServers": {
    "vision": {
      "command": "node",
      "args": [
        "/absolute/path/to/llm-vision-mcp/dist/index.js",
        "--provider", "openai",
        "--openai-api-key", "sk-..."
      ]
    }
  }
}

Prompt: `vision_instructions`

The server registers an MCP prompt called vision_instructions that teaches the LLM when and how to use the describe_image tool. MCP clients that support prompts can inject it into the LLM's context, so that the model calls the tool whenever it encounters an image path, URL, or base64 data instead of guessing what the image contains.

Tool: `describe_image`

Sends an image to a vision LLM and returns a text description.

Parameters

Parameter	Required	Description
`image`	Yes	File path, URL, or base64-encoded image data
`prompt`	No	Custom instruction (default: "Describe this image in detail.")
`provider`	No	Override the default provider
`model`	No	Override the provider's default model

Image Input Formats

File path: /home/user/photo.png or ./images/chart.jpg
URL: https://example.com/image.png
Base64 data URL: data:image/png;base64,iVBOR...
Raw base64: Long base64 string (auto-detected)
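
As a rough illustration of how the four input formats above could be told apart, here is a minimal TypeScript sketch. The function name classifyImageInput, the 64-character threshold, and the regular expressions are assumptions made for this example; the server's actual auto-detection may use different rules.

```typescript
// Hypothetical helper, not part of llm-vision-mcp's documented API.
type ImageInputKind = "url" | "data-url" | "base64" | "file-path";

function classifyImageInput(image: string): ImageInputKind {
  const value = image.trim();
  if (/^https?:\/\//i.test(value)) return "url";
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(value)) return "data-url";
  // Assumption: raw base64 is at least 64 characters long and uses only the
  // base64 alphabet, with at most two "=" padding characters at the end.
  if (value.length >= 64 && /^[A-Za-z0-9+/]+={0,2}$/.test(value)) return "base64";
  // Everything else is treated as a file path.
  return "file-path";
}
```

A heuristic like this can misclassify a long, dot-free path made only of letters, digits, and slashes, which is one reason to prefer data URLs when passing inline image data.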

Examples

"Describe this screenshot" + image: "/tmp/screenshot.png"
"Extract all text from this image" + image: "https://example.com/document.png"
"What data does this chart show?" + image: "data:image/png;base64,..."

Usage reporting

When the provider returns token counts, a second text content block is appended with Usage: <in> in / <out> out / <total> total tokens. Batch results (see describe_images) also include aggregated totalUsage.
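
The usage line and the aggregated totalUsage can be sketched as follows. The field names inputTokens, outputTokens, and totalTokens follow the describe_images sample result; the helper names formatUsage and sumUsage are hypothetical, and skipping items without usage data is an assumption.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Builds the text of the second content block described above.
function formatUsage(u: Usage): string {
  return `Usage: ${u.inputTokens} in / ${u.outputTokens} out / ${u.totalTokens} total tokens`;
}

// Sums usage across batch items; items with no reported usage are skipped.
function sumUsage(usages: Array<Usage | undefined>): Usage {
  const total: Usage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  for (const u of usages) {
    if (u === undefined) continue;
    total.inputTokens += u.inputTokens;
    total.outputTokens += u.outputTokens;
    total.totalTokens += u.totalTokens;
  }
  return total;
}
```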

Tool: `describe_images`

Describes multiple images in a single batched call. Each item may override the batch-level prompt, provider, and model. Results come back in input order. Per-provider concurrency limits are honored.

Parameters

Parameter	Required	Description
`items`	Yes	Array of 1–100 items, each with its own `image` and optional `prompt`/`provider`/`model`
`prompt`	No	Default prompt for items without their own
`provider`	No	Default provider for items without their own
`model`	No	Default model for items without their own
`concurrency`	No	Override the per-provider concurrency cap

Example call

{
  "items": [
    { "image": "/tmp/a.png" },
    { "image": "https://example.com/b.png", "prompt": "Extract text" }
  ],
  "prompt": "Describe this image in detail."
}

Sample result

{
  "results": [
    { "index": 0, "text": "A cat sitting on a desk.", "usage": { "inputTokens": 812, "outputTokens": 17, "totalTokens": 829 } },
    { "index": 1, "text": "Invoice header reading 'ACME Corp'." }
  ],
  "totalUsage": { "inputTokens": 812, "outputTokens": 17, "totalTokens": 829 }
}

Failed items appear with an error field instead of text; the batch itself does not fail.
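
A minimal sketch of this behavior: a bounded worker pool that keeps results in input order and turns a per-item failure into an error entry rather than rejecting the whole batch. The name describeAll and the callback signature are hypothetical, and the sketch uses a single limit rather than one cap per provider.

```typescript
type ItemResult =
  | { index: number; text: string }
  | { index: number; error: string };

async function describeAll<T>(
  items: T[],
  limit: number,
  describe: (item: T, index: number) => Promise<string>,
): Promise<ItemResult[]> {
  const results: ItemResult[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { index, text: await describe(items[index], index) };
      } catch (err) {
        // A failed item is recorded in place; the batch keeps going.
        results[index] = { index, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Writing each result to its own index is what preserves input order, regardless of which request finishes first.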

Retry behavior

Transient errors (429, 5xx, and network failures) are retried up to 3 times with exponential backoff. Configure this via the top-level retry block (maxAttempts, baseDelayMs); a per-provider retry block overrides the global default.
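
The retry policy can be sketched as two small functions. The names isTransient and retryDelays are hypothetical, and the doubling factor, the absence of jitter, and the 500 ms base delay in the usage line are assumptions. The README does not say whether maxAttempts counts the initial request, so the sketch takes the number of retries directly.

```typescript
// An undefined status models a network failure with no HTTP response.
function isTransient(status: number | undefined): boolean {
  return status === undefined || status === 429 || (status >= 500 && status <= 599);
}

// Delay before each retry: baseDelayMs, then 2x, 4x, and so on.
function retryDelays(retries: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= retries; attempt++) {
    delays.push(baseDelayMs * 2 ** (attempt - 1));
  }
  return delays;
}

// Three retries with a 500 ms base delay wait 500, 1000, and 2000 ms.
const exampleDelays = retryDelays(3, 500);
```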

Configuration

Configuration sources are loaded in this order (later overrides earlier):

1. .env file
2. Environment variables
3. CLI arguments
4. Config file (vision-config.json)
5. Per-request provider and model parameters
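
The precedence list above amounts to a layered merge in which each later source overwrites the keys it defines. The sketch below uses flat, hypothetical keys (provider, model, timeoutMs) purely for illustration; the real configuration is nested, and mergeLayers is not part of the project.

```typescript
type Settings = Record<string, string | number | undefined>;

// Later layers win; keys a layer leaves undefined do not erase earlier values.
function mergeLayers(...layers: Settings[]): Settings {
  const merged: Settings = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      const value = layer[key];
      if (value !== undefined) merged[key] = value;
    }
  }
  return merged;
}

const effective = mergeLayers(
  { provider: "openai", timeoutMs: 60000 }, // .env file
  { provider: "anthropic" },                // environment variables
  { model: "gpt-4o" },                      // CLI arguments
  { provider: "ollama" },                   // config file
  { model: "llava" },                       // per-request parameters
);
// effective is { provider: "ollama", timeoutMs: 60000, model: "llava" }
```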

CLI Arguments

--provider <name>              Default provider
--openai-api-key <key>         OpenAI API key
--anthropic-api-key <key>      Anthropic API key
--google-api-key <key>         Google API key
--ollama-base-url <url>        Ollama URL (default: http://localhost:11434)
--ollama-model <model>         Ollama model (default: llava)
--model <model>                Default model for the default provider
--timeout <ms>                 Request timeout for the default provider
--ollama-timeout <ms>          Request timeout for Ollama (default: 120000)
--api-key <key>                API key for the default provider (generic)
--base-url <url>               Base URL for the default provider (generic)
--config <path>                Path to config file

Environment Variables

VISION_DEFAULT_PROVIDER=openai
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llava
VISION_TIMEOUT_MS=60000          # default provider timeout
OLLAMA_TIMEOUT_MS=300000         # bump for slow local models
VISION_CONFIG_PATH=./vision-config.json

# Preset providers — pick ONE vendor and replace the VISION_DEFAULT_PROVIDER
# value above; see "Preset Providers" section for the full list.
# Example (Moonshot):
#   VISION_DEFAULT_PROVIDER=moonshot
#   MOONSHOT_API_KEY=sk-...
#   # Optional: MOONSHOT_MODEL=kimi-k2.6, MOONSHOT_BASE_URL=https://api.moonshot.cn/v1/

Preset Providers

For 8 major OpenAI-compatible vision vendors, llm-vision-mcp ships with built-in preset defaults. Set VISION_DEFAULT_PROVIDER=<name> plus the vendor's standard API key env var — nothing else required. Optionally override the default model and base URL with <VENDOR>_MODEL / <VENDOR>_BASE_URL.

Preset name	Base URL	Default model	API key env var
`moonshot`	`https://api.moonshot.ai/v1/`	`kimi-k2.5`	`MOONSHOT_API_KEY`
`zai`	`https://api.z.ai/api/paas/v4/`	`glm-4.5v`	`ZAI_API_KEY`
`qwen`	`https://dashscope-intl.aliyuncs.com/compatible-mode/v1/`	`qwen3-vl-plus`	`DASHSCOPE_API_KEY`
`nvidia`	`https://integrate.api.nvidia.com/v1/`	`meta/llama-3.2-11b-vision-instruct`	`NVIDIA_API_KEY`
`groq`	`https://api.groq.com/openai/v1/`	`meta-llama/llama-4-scout-17b-16e-instruct`	`GROQ_API_KEY`
`together`	`https://api.together.xyz/v1/`	`meta-llama/Llama-Vision-Free`	`TOGETHER_API_KEY`
`deepinfra`	`https://api.deepinfra.com/v1/openai/`	`meta-llama/Llama-3.2-11B-Vision-Instruct`	`DEEPINFRA_API_KEY`
`xai`	`https://api.x.ai/v1/`	`grok-4.20-0309-non-reasoning`	`XAI_API_KEY`

Model strings use each vendor's exact casing (NVIDIA ships llama-3.2-11b-vision-instruct lowercase while DeepInfra ships Llama-3.2-11B-Vision-Instruct mixed case). Copy them verbatim — do not normalize.

Region notes:

zai — default baseUrl is the international endpoint (api.z.ai). Users in mainland China should override: ZAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4/ (and use their bigmodel.cn-issued key as ZAI_API_KEY).
qwen — default baseUrl is the Singapore international endpoint. Users in mainland China should override: DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1/.
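
Preset resolution can be pictured like this. The sketch infers the override prefix from the API key variable (DASHSCOPE_API_KEY gives DASHSCOPE_BASE_URL and DASHSCOPE_MODEL), which matches the region notes above but is an assumption about the implementation. PRESETS holds only two rows of the table, and resolvePreset is a hypothetical name.

```typescript
interface Preset {
  baseUrl: string;
  model: string;
  apiKeyEnv: string;
}

// Two rows copied from the preset table; the remaining six follow the same shape.
const PRESETS: Record<string, Preset> = {
  moonshot: { baseUrl: "https://api.moonshot.ai/v1/", model: "kimi-k2.5", apiKeyEnv: "MOONSHOT_API_KEY" },
  qwen: {
    baseUrl: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/",
    model: "qwen3-vl-plus",
    apiKeyEnv: "DASHSCOPE_API_KEY",
  },
};

function resolvePreset(name: string, env: Record<string, string | undefined>) {
  const preset = PRESETS[name];
  if (!preset) throw new Error(`Unknown preset: ${name}`);
  const apiKey = env[preset.apiKeyEnv];
  if (!apiKey) throw new Error(`Missing ${preset.apiKeyEnv}`);
  // Assumption: overrides share the prefix of the API key variable.
  const prefix = preset.apiKeyEnv.replace(/_API_KEY$/, "");
  return {
    apiKey,
    baseUrl: env[`${prefix}_BASE_URL`] ?? preset.baseUrl,
    model: env[`${prefix}_MODEL`] ?? preset.model, // copied verbatim, no case changes
  };
}
```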

Quickstart with Moonshot:

export VISION_DEFAULT_PROVIDER=moonshot
export MOONSHOT_API_KEY=sk-...
llm-vision-mcp

The same pattern works for all 8 presets — swap moonshot and MOONSHOT_API_KEY for any other row of the table above.

MCP host config example (Claude Desktop, Cursor, etc.):

{
  "mcpServers": {
    "vision": {
      "command": "node",
      "args": ["/absolute/path/to/llm-vision-mcp/dist/index.js"],
      "env": {
        "VISION_DEFAULT_PROVIDER": "moonshot",
        "MOONSHOT_API_KEY": "sk-..."
      }
    }
  }
}

Need multiple presets active at once, or pinned retry/concurrency settings per preset? See the Provider Cookbook below for copy-paste config-file snippets.

Config File

See config.example.json for a full example with all providers.

The config file supports ${ENV_VAR} interpolation — API keys can reference environment variables so they never appear in the file.
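
A minimal sketch of ${ENV_VAR} interpolation over a single string value. The function name interpolateEnv and the choice to throw on an unset variable are assumptions; the server may instead leave the placeholder untouched or substitute an empty string.

```typescript
// Replaces every ${NAME} with env[NAME]; throws if NAME is not set (assumed behavior).
function interpolateEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

In a Node.js process, the second argument would typically be process.env.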

Provider Cookbook

Copy-paste JSON snippets for each preset vendor. Drop into your vision-config.json to pin settings, combine multiple providers, or override preset defaults. Keys stay in env vars via ${ENV_VAR} interpolation.

Moonshot (Kimi)

{
  "defaultProvider": "moonshot",
  "providers": {
    "moonshot": {
      "type": "openai-compatible",
      "baseUrl": "https://api.moonshot.ai/v1/",
      "apiKey": "${MOONSHOT_API_KEY}",
      "model": "kimi-k2.5"
    }
  }
}

Z.ai (Zhipu GLM) — international

{
  "defaultProvider": "zai",
  "providers": {
    "zai": {
      "type": "openai-compatible",
      "baseUrl": "https://api.z.ai/api/paas/v4/",
      "apiKey": "${ZAI_API_KEY}",
      "model": "glm-4.5v"
    }
  }
}

Z.ai (Zhipu GLM) — China region

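Same vendor, different endpoint and key. This snippet reads the bigmodel.cn-issued key from ZHIPUAI_API_KEY; export that variable, or change the reference to whichever variable holds your key: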

{
  "defaultProvider": "zai",
  "providers": {
    "zai": {
      "type": "openai-compatible",
      "baseUrl": "https://open.bigmodel.cn/api/paas/v4/",
      "apiKey": "${ZHIPUAI_API_KEY}",
      "model": "glm-4.5v"
    }
  }
}

Qwen (Alibaba DashScope) — international

{
  "defaultProvider": "qwen",
  "providers": {
    "qwen": {
      "type": "openai-compatible",
      "baseUrl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/",
      "apiKey": "${DASHSCOPE_API_KEY}",
      "model": "qwen3-vl-plus"
    }
  }
}

NVIDIA NIM

{
  "defaultProvider": "nvidia",
  "providers": {
    "nvidia": {
      "type": "openai-compatible",
      "baseUrl": "https://integrate.api.nvidia.com/v1/",
      "apiKey": "${NVIDIA_API_KEY}",
      "model": "meta/llama-3.2-11b-vision-instruct"
    }
  }
}

Groq

{
  "defaultProvider": "groq",
  "providers": {
    "groq": {
      "type": "openai-compatible",
      "baseUrl": "https://api.groq.com/openai/v1/",
      "apiKey": "${GROQ_API_KEY}",
      "model": "meta-llama/llama-4-scout-17b-16e-instruct"
    }
  }
}

Together AI

{
  "defaultProvider": "together",
  "providers": {
    "together": {
      "type": "openai-compatible",
      "baseUrl": "https://api.together.xyz/v1/",
      "apiKey": "${TOGETHER_API_KEY}",
      "model": "meta-llama/Llama-Vision-Free"
    }
  }
}

DeepInfra

{
  "defaultProvider": "deepinfra",
  "providers": {
    "deepinfra": {
      "type": "openai-compatible",
      "baseUrl": "https://api.deepinfra.com/v1/openai/",
      "apiKey": "${DEEPINFRA_API_KEY}",
      "model": "meta-llama/Llama-3.2-11B-Vision-Instruct"
    }
  }
}

xAI (Grok)

{
  "defaultProvider": "xai",
  "providers": {
    "xai": {
      "type": "openai-compatible",
      "baseUrl": "https://api.x.ai/v1/",
      "apiKey": "${XAI_API_KEY}",
      "model": "grok-4.20-0309-non-reasoning"
    }
  }
}

Multiple providers simultaneously

{
  "defaultProvider": "openai",
  "providers": {
    "openai": {
      "apiKey": "${OPENAI_API_KEY}",
      "model": "gpt-4o"
    },
    "moonshot": {
      "type": "openai-compatible",
      "baseUrl": "https://api.moonshot.ai/v1/",
      "apiKey": "${MOONSHOT_API_KEY}",
      "model": "kimi-k2.5"
    },
    "zai": {
      "type": "openai-compatible",
      "baseUrl": "https://api.z.ai/api/paas/v4/",
      "apiKey": "${ZAI_API_KEY}",
      "model": "glm-4.5v"
    }
  }
}

Custom Providers

OpenAI-compatible (DeepSeek, Qwen-VL, etc.)

Many vendors, including most Chinese LLM providers, expose an OpenAI-compatible API. Configure them with type openai-compatible:

{
  "providers": {
    "deepseek": {
      "type": "openai-compatible",
      "baseUrl": "https://api.deepseek.com/v1",
      "apiKey": "${DEEPSEEK_API_KEY}",
      "model": "deepseek-vl2"
    }
  }
}

Generic HTTP (any API)

For APIs with non-standard request/response formats:

{
  "providers": {
    "custom": {
      "type": "generic-http",
      "url": "https://my-api.example.com/vision",
      "headers": { "Authorization": "Bearer YOUR_API_KEY" },
      "requestTemplate": {
        "image": "{{image}}",
        "prompt": "{{prompt}}",
        "type": "{{mimeType}}"
      },
      "imageFormat": "base64",
      "responsePath": "result.text"
    }
  }
}

Template placeholders: {{image}}, {{prompt}}, {{mimeType}}

imageFormat: "base64" (raw) or "data-url" (data:image/png;base64,...)

responsePath: Dot-notation path to extract the text from the JSON response (e.g., choices.0.message.content)

usagePath (optional): Dot-notation path to a numeric token total in the response. Reported as totalTokens.
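
To make the placeholder and path rules concrete, the sketch below fills a requestTemplate and extracts text using a dot-notation path. Both function names are hypothetical, and details such as leaving unknown placeholders untouched are assumptions rather than documented behavior.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively replaces {{name}} inside every string of the template.
function renderTemplate(template: Json, vars: Record<string, string>): Json {
  if (typeof template === "string") {
    return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
      Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
    );
  }
  if (Array.isArray(template)) return template.map((entry) => renderTemplate(entry, vars));
  if (template !== null && typeof template === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(template)) out[key] = renderTemplate(template[key], vars);
    return out;
  }
  return template; // numbers, booleans, and null pass through unchanged
}

// Walks "a.0.b" one segment at a time; numeric segments index into arrays.
function getByPath(value: unknown, path: string): unknown {
  let current: unknown = value;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}
```

Substituting into the parsed template rather than into serialized JSON text means that quotes or newlines in a prompt cannot break the request body.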

Note: headers values are sent to the server literally — there is no ${ENV_VAR} expansion. Paste the bearer token directly, or launch the server from a wrapper that substitutes it.

Example: MiniMax vision (`MiniMax-M2.7`)

MiniMax's OpenAI-compatible /v1/chat/completions endpoint silently drops image_url content blocks. Vision requests must therefore go to the Anthropic-compatible /anthropic/v1/messages endpoint, with the image embedded as a plain-text data URL inside the content string rather than as a content-part array.

{
  "providers": {
    "minimax": {
      "type": "generic-http",
      "url": "https://api.minimax.io/anthropic/v1/messages",
      "headers": { "Authorization": "Bearer YOUR_MINIMAX_API_KEY" },
      "requestTemplate": {
        "model": "MiniMax-M2.7",
        "max_tokens": 1024,
        "messages": [
          { "role": "user", "content": "{{image}}\n{{prompt}}" }
        ]
      },
      "imageFormat": "data-url",
      "responsePath": "content.0.text"
    }
  },
  "defaultProvider": "minimax"
}

Verified 2026-04 against MiniMax's international host (api.minimax.io). Users in mainland China should swap the host for api.minimaxi.com. If MiniMax later ships a native image_url-style content part, or llm-vision-mcp adds a preset-class adapter, this generic-http config can be replaced by the simpler preset form.

Image Preprocessing

Images are automatically preprocessed before being sent to providers:

Format conversion: Unsupported formats (e.g., WEBP for providers that don't support it) are converted to PNG
Resizing: Images larger than 2048×2048 pixels are resized to fit (configurable via maxWidth and maxHeight)
Compression: Images larger than 20 MB (configurable via maxFileSizeBytes) are JPEG-compressed at decreasing quality levels

Preprocessing options can be customized in the config file:

{
  "preprocessing": {
    "maxWidth": 2048,
    "maxHeight": 2048,
    "maxFileSizeBytes": 20971520
  }
}
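
For intuition, the resize step can be thought of as the following calculation. That the server preserves the aspect ratio and never upscales is an assumption, and fitWithin is a hypothetical name; the defaults mirror maxWidth and maxHeight above.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scales a size down to fit inside the limits, keeping the aspect ratio.
function fitWithin(size: Size, maxWidth = 2048, maxHeight = 2048): Size {
  // A scale of 1 means the image already fits and is left unchanged.
  const scale = Math.min(1, maxWidth / size.width, maxHeight / size.height);
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

// A 4096x2048 image is scaled by 0.5 to 2048x1024.
const resized = fitWithin({ width: 4096, height: 2048 });
```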

Development

npm test              # Run tests
npm run test:watch    # Watch mode
npm run build         # Compile TypeScript
npm run dev           # Watch mode compilation

License

ISC
