pixel-surgeon-mcp
AI image and video generation, editing, and region repair via Gemini, OpenAI, and Grok
<p align="center"> <img src="assets/architecture.png" alt="pixel-surgeon-mcp architecture" width="800" /> </p>
<h1 align="center">pixel-surgeon-mcp</h1>
<p align="center"> <strong>MCP server for AI image & video generation, editing, and transplant-grade region repair</strong><br/> Powered by Gemini 3.1 Flash Image, OpenAI GPT Image 2, Grok Imagine, and Veo 3 </p>
<p align="center"> <img src="https://img.shields.io/badge/MCP-stdio-blue" alt="MCP stdio" /> <img src="https://img.shields.io/badge/Gemini_3.1-Flash_Image-4285F4?logo=google" alt="Gemini" /> <img src="https://img.shields.io/badge/GPT_Image_2-OpenAI-412991?logo=openai&logoColor=white" alt="OpenAI" /> <img src="https://img.shields.io/badge/Grok_Imagine-xAI-000000?logo=x&logoColor=white" alt="Grok" /> <img src="https://img.shields.io/badge/Veo_3-Video-34A853?logo=google" alt="Veo 3" /> <img src="https://img.shields.io/badge/TypeScript-5.9-3178C6?logo=typescript&logoColor=white" alt="TypeScript" /> </p>
An MCP server that gives Claude (or any MCP client) the ability to generate images, edit them, fix garbled text, and create videos — all through natural language.
How it works
pixel-surgeon-mcp is a multi-provider image generation server. You can configure any combination of providers and switch between them per request:
Gemini (Google) — balanced
Google's image generation pipeline uses a two-stage approach: Gemini 3.1 Pro reasons about your prompt, then Gemini 3.1 Flash Image renders the pixels. Supports 9 aspect ratios at 512/1K/2K/4K resolution. Best price/performance ratio, with a free tier available.
OpenAI GPT Image 2 — highest quality
OpenAI's latest image model, with dramatically improved text rendering and visual fidelity. It supports flexible resolutions: pixel-surgeon maps your chosen size and aspect ratio to the optimal pixel dimensions automatically. Quality levels are medium (fast) and high (print-ready). It is excellent for infographics, diagrams, and text-heavy images where other models struggle, but it is slower and more expensive than the other providers.
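To make that size mapping concrete, here is a sketch that turns a size tier and a `W:H` aspect ratio into pixel dimensions. The long-edge value for each tier and the multiple-of-16 rounding are assumptions for illustration only. They are not OpenAI's documented constraints or pixel-surgeon's real table, and `dimensionsFor` is a hypothetical name.

```typescript
// Hypothetical size tiers: the long edge (in pixels) targeted for each tier.
// These numbers are illustrative assumptions, not OpenAI's published limits.
const LONG_EDGE: Record<string, number> = { "1K": 1024, "2K": 2048, "4K": 3840 };

// Assumed constraint for this sketch: both dimensions are multiples of 16.
const STEP = 16;

function roundToStep(n: number): number {
  return Math.max(STEP, Math.round(n / STEP) * STEP);
}

// Map a size tier and a "W:H" aspect ratio to concrete pixel dimensions.
function dimensionsFor(size: string, aspectRatio: string): { width: number; height: number } {
  const longEdge = LONG_EDGE[size];
  if (longEdge === undefined) throw new Error(`unknown size tier: ${size}`);
  const [w, h] = aspectRatio.split(":").map(Number);
  if (!(w > 0 && h > 0)) throw new Error(`bad aspect ratio: ${aspectRatio}`);
  // Put the long edge on whichever side the ratio says is longer.
  return w >= h
    ? { width: roundToStep(longEdge), height: roundToStep((longEdge * h) / w) }
    : { width: roundToStep((longEdge * w) / h), height: roundToStep(longEdge) };
}
```

For example, `dimensionsFor("1K", "16:9")` yields 1024×576 and `dimensionsFor("2K", "3:4")` yields 1536×2048.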
Grok Imagine (xAI) — fastest
xAI's Aurora-powered image model. Fastest generation speed and lowest cost. Supports 7 aspect ratios at fixed resolutions (~1K). Good for rapid prototyping and iteration.
Veo 3 (Video)
For video, the server submits the request to Veo 3 and polls asynchronously until the job finishes. Veo 3 generates both the video and ambient audio, in 16:9 or 9:16 at 5s or 8s duration.
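Video generation is long-running, so the initial request returns before the video exists. The generic loop below sketches that polling pattern. `checkStatus` is a hypothetical stand-in for the provider's "get operation status" call. The 10-second interval and six-minute timeout are assumed defaults, not values taken from pixel-surgeon.

```typescript
// Result of one status check: either still running, or finished with a value.
type PollResult<T> = { done: false } | { done: true; value: T };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `checkStatus` repeatedly until it reports completion or the timeout passes.
async function pollUntilDone<T>(
  checkStatus: () => Promise<PollResult<T>>,
  { intervalMs = 10_000, timeoutMs = 6 * 60_000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await checkStatus();
    if (result.done) return result.value;
    // Give up before sleeping past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`operation did not finish within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

A fixed interval keeps the sketch simple. Exponential backoff is a reasonable alternative when the provider enforces rate limits on status calls.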
Region repair
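AI image models often garble text in text-heavy images. The fix tools work around this by sending smaller regions of the image to the provider, so each request has less to render. They then stitch the repaired regions back into the original with histogram-matched compositing, so colors and tones blend seamlessly.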
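Below is a minimal sketch of classic per-channel histogram matching (CDF mapping) on raw 8-bit RGBA buffers. It is not pixel-surgeon's source. The buffer layout and the names `cdf` and `matchHistograms` are assumptions, and a real pipeline would decode and encode images with an imaging library first. The project's own step is described later as per-channel RGB normalization, so it may instead match simpler statistics such as mean and spread.

```typescript
// Cumulative distribution of one channel (0=R, 1=G, 2=B) in an RGBA buffer.
function cdf(pixels: Uint8Array, channel: number): Float64Array {
  const hist = new Float64Array(256);
  for (let i = channel; i < pixels.length; i += 4) hist[pixels[i]]++;
  const total = pixels.length / 4;
  let running = 0;
  for (let v = 0; v < 256; v++) {
    running += hist[v];
    hist[v] = running / total; // reuse the array to hold the CDF
  }
  return hist;
}

// Remap `source` (the repaired region) so each RGB channel's histogram
// matches `reference` (the same area of the original image).
function matchHistograms(source: Uint8Array, reference: Uint8Array): Uint8Array {
  const out = new Uint8Array(source); // copy; alpha is carried over unchanged
  for (let c = 0; c < 3; c++) {
    const srcCdf = cdf(source, c);
    const refCdf = cdf(reference, c);
    // For each source level, pick the lowest reference level whose CDF reaches it.
    // Both CDFs are non-decreasing, so the pointer `r` only moves forward.
    const lut = new Uint8Array(256);
    let r = 0;
    for (let v = 0; v < 256; v++) {
      while (r < 255 && refCdf[r] < srcCdf[v]) r++;
      lut[v] = r;
    }
    for (let i = c; i < out.length; i += 4) out[i] = lut[source[i]];
  }
  return out;
}
```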
Tools
| Tool | Description |
|---|---|
| `generate_image` | Text-to-image generation (single image) |
| `generate_images` | Parallel batch generation (1-8 images) |
| `generate_video` | Text-to-video via Veo 3 with audio (5s or 8s) |
| `edit_image` | Edit an existing image with natural language instructions |
| `fix_image` | Grid-based tile repair for garbled text (2x2, 3x3, etc.) |
| `fix_region` | Targeted region repair with automatic aspect ratio snapping |
| `interactive_fix` | Browser-based crop UI with multi-shot selection |
| `list_images` | List generated images and videos |
| `save_image` | Import an external image into the workspace |
| `remove_background` | Remove image background (alpha channel transparency) |
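The grid mode of `fix_image` starts by cutting the image into tiles (a 2x2 grid means `rows = cols = 2`). Here is a sketch of that step under simple assumptions: tiles do not overlap, and integer boundaries come from fractional positions so that every pixel lands in exactly one tile. pixel-surgeon's actual tiling (overlap, padding, ratio snapping per tile) may differ, and `gridTiles` is a hypothetical name.

```typescript
interface Tile {
  row: number;
  col: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Split a width x height image into a rows x cols grid of non-overlapping tiles.
// Tile sizes differ by at most one pixel, and together they cover the image exactly.
function gridTiles(width: number, height: number, rows: number, cols: number): Tile[] {
  const tiles: Tile[] = [];
  for (let row = 0; row < rows; row++) {
    const y0 = Math.floor((row * height) / rows);
    const y1 = Math.floor(((row + 1) * height) / rows);
    for (let col = 0; col < cols; col++) {
      const x0 = Math.floor((col * width) / cols);
      const x1 = Math.floor(((col + 1) * width) / cols);
      tiles.push({ row, col, x: x0, y: y0, width: x1 - x0, height: y1 - y0 });
    }
  }
  return tiles;
}
```

Each tile would then be sent to the provider on its own and composited back at `(x, y)`.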
Models
| Model | Provider | Resolution | Best for |
|---|---|---|---|
| `gemini-3.1-flash-image` | Google | 512 / 1K / 2K / 4K | General image generation, photorealistic scenes |
| `gemini-2.5-flash-image` | Google | 1K max (free tier) | Quick drafts, prototyping |
| `gpt-image-2` | OpenAI | Flexible (up to 4K) | Text-heavy images, infographics, diagrams, typography |
| `gpt-image-1` | OpenAI | 3 fixed sizes | Legacy support |
| `grok-imagine` | xAI | Fixed (~1K per ratio) | Fast iteration, lowest cost |
Force a specific model per call with the `model` tool parameter, or set a default with the `DEFAULT_IMAGE_MODEL` environment variable.
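The precedence implied here (per-call parameter first, then the environment default) can be sketched as below. `resolveModel` and `FALLBACK_MODEL` are hypothetical. The final built-in fallback is an assumption, and the real server may pick its default based on which API keys are configured.

```typescript
const KNOWN_MODELS = new Set([
  "gemini-3.1-flash-image",
  "gemini-2.5-flash-image",
  "gpt-image-2",
  "gpt-image-1",
  "grok-imagine",
]);

// Hypothetical built-in default used when neither the call nor the env picks a model.
const FALLBACK_MODEL = "gemini-3.1-flash-image";

// Per-call `model` wins, then DEFAULT_IMAGE_MODEL, then the built-in fallback.
// `||` (not `??`) so an empty string counts as "not set".
function resolveModel(requested: string | undefined, env: Record<string, string | undefined>): string {
  const model = requested || env.DEFAULT_IMAGE_MODEL || FALLBACK_MODEL;
  if (!KNOWN_MODELS.has(model)) throw new Error(`unsupported model: ${model}`);
  return model;
}
```

In a Node server this would typically be called as `resolveModel(args.model, process.env)`.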
Gemini automatic fallback
If a Gemini generation call fails with a billing or prepayment error, the server automatically retries on the free-tier gemini-2.5-flash-image model, and the viewer shows a yellow banner. Free-tier limits: 1K max resolution, 10 requests per minute (RPM), and 500 requests per day (RPD).
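The retry behavior can be sketched as a wrapper that only falls back on billing failures. How a billing error is recognized here (a message mentioning billing or prepayment) is an assumption, not the Gemini API's documented error contract. `generate`, `isBillingError`, and `generateWithFallback` are hypothetical names.

```typescript
// Hypothetical check: treats any error whose message mentions billing or
// prepayment as a billing failure. The real Gemini error shape may differ.
function isBillingError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /billing|prepay/i.test(message);
}

interface GenerationResult {
  model: string;
  usedFallback: boolean; // lets a UI show a banner when the fallback was used
  data: Uint8Array;
}

// `generate` is a hypothetical callback that renders an image with the given model.
async function generateWithFallback(
  generate: (model: string) => Promise<Uint8Array>,
  primary = "gemini-3.1-flash-image",
  fallback = "gemini-2.5-flash-image",
): Promise<GenerationResult> {
  try {
    return { model: primary, usedFallback: false, data: await generate(primary) };
  } catch (err) {
    if (!isBillingError(err)) throw err; // other failures propagate unchanged
    return { model: fallback, usedFallback: true, data: await generate(fallback) };
  }
}
```

A fuller version would also clamp the requested resolution to 1K before retrying, since the free tier is capped there.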
Style presets
All generation and edit tools support an optional style parameter:
neo-brutalist
Magazine editorial, bold typography, halftone textures. Cream, black, and terracotta palette.
<img src="assets/style-neo-brutalist.png" alt="neo-brutalist style example" width="400" />
duval-software-infographic
Duval Software's signature retro-futurist infographic style. 1960s Space Age meets 1980s arcade. Cathode blue, amber, and salmon palette. Great for diagrams and system overviews.
<img src="assets/style-neo-retro-futurism.png" alt="duval-software-infographic style example" width="400" />
fractal-arcade
Dithered fractals, Sierpinski patterns, low-poly. CRT retro, Amiga/EGA palette.
<img src="assets/style-fractal-arcade.png" alt="fractal-arcade style example" width="400" />
clean-tech-infographic
Technical diagrams, system flows, data pipelines. Dark navy, cyan, and electric blue.
<img src="assets/style-clean-tech-infographic.png" alt="clean-tech-infographic style example" width="600" />
Setup
Get your API key(s)
You need at least one provider API key. You can use any combination for maximum flexibility.
Google (Gemini + Veo 3)
- Go to Google AI Studio
- Sign in with your Google account
- Click Create API Key and copy it
Prepayment required. Gemini 3.1 Flash Image and Veo 3 require billing and prepaid credits. The free-tier fallback (gemini-2.5-flash-image) is capped at 1K resolution and has tighter rate limits. See Google AI pricing.
OpenAI (GPT Image 2)
- Go to OpenAI API
- Sign in or create an account
- Click Create new secret key and copy it
- Ensure you have API credits — image generation is billed per request
GPT Image 2 excels at text rendering, infographics, and diagrams. If you primarily need text-heavy images, this is the provider to use.
xAI (Grok Imagine)
- Go to xAI Console
- Sign in or create an account
- Create an API key and copy it
Grok Imagine is the fastest and cheapest provider. Great for rapid iteration and prototyping. Fixed output resolutions (~1K) with no size control.
Quick start (npx)
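No install needed: run it directly with npx, passing whichever API keys you have as environment variables:
```bash
GOOGLE_API_KEY=your-google-key OPENAI_API_KEY=your-openai-key XAI_API_KEY=your-xai-key npx pixel-surgeon-mcp
```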
Claude Code CLI
```bash
claude mcp add pixel-surgeon \
  -e GOOGLE_API_KEY=your-google-key \
  -e OPENAI_API_KEY=your-openai-key \
  -e XAI_API_KEY=your-xai-key \
  -- npx pixel-surgeon-mcp
```
Claude Desktop / MCP client config
```json
{
  "mcpServers": {
    "pixel-surgeon": {
      "command": "npx",
      "args": ["pixel-surgeon-mcp"],
      "env": {
        "GOOGLE_API_KEY": "your-google-api-key",
        "OPENAI_API_KEY": "your-openai-api-key",
        "XAI_API_KEY": "your-xai-api-key"
      }
    }
  }
}
```
Install from source
If you prefer a local clone:
```bash
git clone https://github.com/j-east/pixel-surgeon-mcp.git
cd pixel-surgeon-mcp
npm install
npm run build
```
Image output
Generated images are saved to ~/Pictures/pixel-surgeon/. A local browser viewer auto-launches on first use for full-resolution previews with model selection, respin controls, and search.
Development
```bash
npm run dev    # tsx watch mode
npm run build  # compile TypeScript
npm run start  # run compiled server
```
Key implementation details
- Aspect ratio snapping: crops are adjusted to the nearest Gemini-supported ratio while preserving the center point
- Histogram matching: per-channel RGB normalization ensures composited regions blend seamlessly
- Human-in-the-loop: `interactive_fix` opens a browser crop UI, blocks on a Promise until the user submits, fires parallel Gemini calls, and lets the user pick the best result
- MCP size limits: full-resolution images are saved to disk; downsampled versions (under 950 KB) are returned in MCP responses
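The aspect ratio snapping step can be sketched as follows. The ratio list is an illustrative subset and an assumption, not Gemini's authoritative set. Ratios are compared on a log scale, so 2:1 and 1:2 are equally far from 1:1. When growing the short side would overflow the image, this sketch shrinks the long side instead. That is one reasonable choice, not necessarily pixel-surgeon's, and the function names are hypothetical.

```typescript
// Illustrative subset only: an assumption, not the authoritative list Gemini accepts.
const SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "9:16", "16:9"];

interface Rect { x: number; y: number; width: number; height: number }

function parseRatio(ratio: string): [number, number] {
  const [w, h] = ratio.split(":").map(Number);
  return [w, h];
}

// Pick the supported ratio closest to width/height, measured on a log scale.
function nearestRatio(width: number, height: number): string {
  const target = Math.log(width / height);
  let best = SUPPORTED_RATIOS[0];
  let bestDist = Infinity;
  for (const ratio of SUPPORTED_RATIOS) {
    const [w, h] = parseRatio(ratio);
    const dist = Math.abs(Math.log(w / h) - target);
    if (dist < bestDist) {
      best = ratio;
      bestDist = dist;
    }
  }
  return best;
}

const clamp = (v: number, lo: number, hi: number) => Math.min(Math.max(v, lo), hi);

// Resize the crop to the nearest supported ratio around its original center,
// then shift it back inside the image if it crosses an edge.
function snapCrop(crop: Rect, imageWidth: number, imageHeight: number): { rect: Rect; ratio: string } {
  const ratio = nearestRatio(crop.width, crop.height);
  const [rw, rh] = parseRatio(ratio);
  let width = crop.width;
  let height = crop.height;
  if (width / height < rw / rh) {
    width = Math.round((height * rw) / rh); // too narrow: widen
    if (width > imageWidth) {
      width = imageWidth; // cannot widen enough: shorten instead
      height = Math.round((width * rh) / rw);
    }
  } else {
    height = Math.round((width * rh) / rw); // too wide (or exact): make taller
    if (height > imageHeight) {
      height = imageHeight; // cannot grow enough: narrow instead
      width = Math.round((height * rw) / rh);
    }
  }
  const cx = crop.x + crop.width / 2;
  const cy = crop.y + crop.height / 2;
  const x = clamp(Math.round(cx - width / 2), 0, imageWidth - width);
  const y = clamp(Math.round(cy - height / 2), 0, imageHeight - height);
  return { rect: { x, y, width, height }, ratio };
}
```

For example, a 100×60 crop at (400, 400) in a 1000×1000 image snaps to 16:9 as a 107×60 rect at (397, 400).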
Contributing
PRs are welcome! We're especially looking for:
New style presets
Add entries to the STYLE_PRESETS object in src/index.ts. Your PR should include:
- The preset definition (name, prompt prefix, default aspect ratio)
- 2-3 example images generated with the preset (drop them in your PR description)
- A short description of the visual style for the README table
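To make those fields concrete, here is a hypothetical preset entry and one way a tool might apply it: prepend the prompt prefix and use the preset's aspect ratio unless the caller passed one. The field names, the prefix wording, and the 16:9 default are placeholders, and the real `STYLE_PRESETS` entries in `src/index.ts` may be shaped differently.

```typescript
// Hypothetical preset shape based on the three fields listed above.
interface StylePreset {
  name: string;
  promptPrefix: string;
  defaultAspectRatio: string;
}

// Placeholder entry; the prefix text and default ratio are illustrative only.
const EXAMPLE_PRESETS: Record<string, StylePreset> = {
  "clean-tech-infographic": {
    name: "clean-tech-infographic",
    promptPrefix: "Technical diagram style, dark navy background, cyan and electric blue accents.",
    defaultAspectRatio: "16:9",
  },
};

// Prepend the preset's prompt prefix; an explicit aspect ratio overrides the preset default.
function applyStyle(prompt: string, styleKey: string | undefined, aspectRatio?: string) {
  if (!styleKey) return { prompt, aspectRatio };
  const preset = EXAMPLE_PRESETS[styleKey];
  if (!preset) throw new Error(`unknown style preset: ${styleKey}`);
  return {
    prompt: `${preset.promptPrefix} ${prompt}`,
    aspectRatio: aspectRatio ?? preset.defaultAspectRatio,
  };
}
```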
Model adapters
The server currently supports Gemini, OpenAI, Grok Imagine, and Veo 3. We'd love adapters for other image/video generation APIs — Stable Diffusion, Flux, etc. If you're interested in adding one, open an issue first so we can align on the interface.
Built by Duval Software
pixel-surgeon-mcp is maintained by John Evans, part of the engineering team at Duval Software — a software engineering firm in Jacksonville Beach, FL building AI-powered tools and custom integrations. If you need MCP servers, AI pipelines, or production tooling built, get in touch.
License
MIT