Vision MCP Server
Provides free and unlimited vision capabilities for AI coding assistants using the Groq API and Meta Llama 4 Vision model. It enables users to perform image analysis, OCR, UI layout description, and error diagnosis directly from screenshots and documents.
README
Vision MCP Server
Free, unlimited vision capabilities for your AI coding assistant using Groq API and Meta Llama 4 Vision model.
Features
- Image Analysis - Understand and describe images
- Text Extraction (OCR) - Extract text from screenshots, documents, photos
- UI Analysis - Describe UI components, layouts, and design
- Error Diagnosis - Analyze error screenshots and suggest fixes
- Diagram Understanding - Interpret flowcharts, UML, architecture diagrams
- Chart Analysis - Read charts and dashboards for insights
- Image Comparison - Compare two images for differences
- Code Extraction - Extract code from IDE screenshots
Installation
Prerequisites
- Python 3.10 or higher
- Free Groq API key
Get Groq API Key (Free)
- Visit https://console.groq.com/keys
- Sign up (free)
- Create a new API key
Install Dependencies
cd vision-mcp-server
# Option 1: Using install script (recommended)
./install.sh
# Option 2: Manual installation
pip3 install mcp groq pillow aiofiles
Configuration
Claude Desktop
Add to ~/.claude/config.json:
{
"mcpServers": {
"vision-mcp-server": {
"command": "python",
"args": ["-m", "vision_mcp_server.server"],
"env": {
"GROQ_API_KEY": "your-groq-api-key-here"
}
}
}
}
OpenCode
Add to OpenCode settings:
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"vision-mcp-server": {
"type": "local",
"command": ["python", "-m", "vision_mcp_server.server"],
"environment": {
"GROQ_API_KEY": "your-groq-api-key-here"
}
}
}
}
Cline (VS Code)
Add to Cline settings:
{
"mcpServers": {
"vision-mcp-server": {
"command": "python",
"args": ["-m", "vision_mcp_server.server"],
"env": {
"GROQ_API_KEY": "your-groq-api-key-here"
}
}
}
}
Usage
Analyze Image
Describe this image: screenshot.png
Extract Text
Extract text from this document: scan.jpg
Diagnose Error
What's wrong with this error screenshot: error.png
Understand Diagram
Explain this architecture diagram: system-diagram.png
Compare Images
Compare these two UI screenshots: old-ui.png vs new-ui.png
Available Tools
analyze_image- General image analysisextract_text- OCR text extractiondescribe_ui- UI component analysisdiagnose_error- Error screenshot analysisunderstand_diagram- Diagram interpretationanalyze_chart- Chart and dashboard analysiscompare_images- Image comparisoncode_from_screenshot- Code extraction from screenshots
Models Used
- meta-llama/llama-4-scout-17b-16e-instruct - Latest Meta Llama 4 vision model
- Available for free via Groq API
- No quotas, no limits
- Superior vision capabilities and multimodal performance
Testing
Run locally:
export GROQ_API_KEY=your-api-key
python -m vision_mcp_server.server
License
MIT
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.