Gemini Image MCP Server
Provides image generation, modification, and analysis capabilities using Google's Gemini API, enabling AI-powered image operations through natural language.
A Model Context Protocol (MCP) server that provides image generation and manipulation capabilities using Google's Gemini API. This server integrates with Claude Desktop and other MCP-compatible clients to enable AI-powered image operations.
Features
- Image Generation: Create images from text prompts using Gemini 2.5 Flash Image Preview
- Image Modification: Modify existing images with natural language instructions
- Image Analysis: Analyze images for objects, text, colors, emotions, and comprehensive insights
- Batch Generation: Generate multiple images from different prompts in one operation
- Style Transfer: Apply artistic styles to existing images
- Rate Limiting: Built-in rate limiting to respect API quotas
- Safety Settings: Configurable content safety levels
Available Tools
1. generateImage
Generate images from text prompts with customizable options.
Parameters:
- prompt (required): Text description of the image to generate
- width (optional): Image width in pixels
- height (optional): Image height in pixels
- aspectRatio (optional): One of 1:1, 16:9, 9:16, 4:3, 3:4
- style (optional): One of realistic, artistic, cartoon, sketch, watercolor, oil-painting
- quality (optional): One of standard, high, ultra
- numberOfImages (optional): Number of images to generate (default: 1)
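The parameter list above can be checked on the client side before a call is sent. The following is a minimal sketch, assuming arguments arrive as untrusted JSON; the function name checkGenerateImageArgs is hypothetical and the server's own validation may differ.

```typescript
// Allowed values copied from the generateImage parameter list.
const ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4"];
const STYLES = ["realistic", "artistic", "cartoon", "sketch", "watercolor", "oil-painting"];
const QUALITIES = ["standard", "high", "ultra"];

// Hypothetical validator: returns a list of problems (empty means the
// arguments match the documented parameters).
function checkGenerateImageArgs(input: Record<string, unknown>): string[] {
  const errors: string[] = [];

  const prompt = input["prompt"];
  if (typeof prompt !== "string" || prompt.trim() === "") {
    errors.push("prompt is required and must be a non-empty string");
  }

  // Numeric options must be positive integers when present.
  for (const key of ["width", "height", "numberOfImages"]) {
    const v = input[key];
    if (v !== undefined && !(typeof v === "number" && Math.floor(v) === v && v > 0)) {
      errors.push(`${key} must be a positive integer`);
    }
  }

  // Enumerated options must use one of the documented values.
  const enums: Array<[string, string[]]> = [
    ["aspectRatio", ASPECT_RATIOS],
    ["style", STYLES],
    ["quality", QUALITIES],
  ];
  for (const [key, allowed] of enums) {
    const v = input[key];
    if (v !== undefined && (typeof v !== "string" || allowed.indexOf(v) === -1)) {
      errors.push(`${key} must be one of: ${allowed.join(", ")}`);
    }
  }
  return errors;
}
```

For example, checkGenerateImageArgs({ prompt: "a sunset over mountains", style: "watercolor" }) returns an empty list, while an empty prompt or an unlisted style produces one message per problem.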
2. modifyImage
Modify existing images using natural language instructions.
Parameters:
- imageBase64 (required): Base64-encoded image data
- instructions (required): Text instructions for the modification
- preserveStyle (optional): Whether to preserve the original artistic style
- strength (optional): Modification strength from 0 to 1
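Images often come as data URLs (data:image/png;base64,...) rather than bare base64. The sketch below builds modifyImage arguments from either form and checks the documented 0 to 1 strength range. It assumes the server wants bare base64 without the data-URL header; buildModifyImageArgs is a hypothetical helper, not part of this server.

```typescript
interface ModifyImageArgs {
  imageBase64: string;
  instructions: string;
  preserveStyle?: boolean;
  strength?: number;
}

// Hypothetical helper. Assumption: the server expects bare base64, so any
// "data:<mime>;base64," prefix is stripped first.
function buildModifyImageArgs(
  image: string,
  instructions: string,
  options: { preserveStyle?: boolean; strength?: number } = {},
): ModifyImageArgs {
  // Keep only the payload of a data URL; leave plain base64 untouched.
  const match = /^data:[^;,]+;base64,([\s\S]*)$/.exec(image);
  const imageBase64 = (match?.[1] ?? image).replace(/\s+/g, "");

  // Base64 uses A-Z, a-z, 0-9, + and /, padded with "=" to a multiple of 4.
  if (imageBase64 === "" || !/^[A-Za-z0-9+\/]*={0,2}$/.test(imageBase64) || imageBase64.length % 4 !== 0) {
    throw new Error("image is not valid base64");
  }
  if (instructions.trim() === "") {
    throw new Error("instructions must not be empty");
  }
  // The negated range check also rejects NaN.
  if (options.strength !== undefined && !(options.strength >= 0 && options.strength <= 1)) {
    throw new Error("strength must be between 0 and 1");
  }
  return { imageBase64, instructions, ...options };
}
```

Rejecting bad input on the client keeps a malformed image from costing a rate-limited API request.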
3. analyzeImage
Analyze images and extract various types of information.
Parameters:
- imageBase64 (required): Base64-encoded image data
- analysisType (optional): One of description, objects, text, colors, emotions, comprehensive
- detail (optional): Analysis detail level, one of low, medium, high
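MCP clients invoke these tools with a JSON-RPC 2.0 "tools/call" request whose params carry the tool name and its arguments, as defined in the MCP specification. The sketch below shows that message for analyzeImage. The argument values are illustrative, and the base64 string is a placeholder rather than a real image.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0 request, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function makeToolCall(id: number, toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

const request = makeToolCall(1, "analyzeImage", {
  imageBase64: "iVBORw0KGgo=", // placeholder; a real call carries the full image data
  analysisType: "objects",
  detail: "high",
});

// On the stdio transport each message is written as one line of JSON.
console.log(JSON.stringify(request));
```

Claude Desktop builds this request for you. The sketch is mainly useful when driving the server from your own client or when reading its logs.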
4. batchGenerate
Generate multiple images from different prompts efficiently.
Parameters:
- prompts (required): Array of text prompts
- baseOptions (optional): Shared options applied to every generation
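One way to read batchGenerate is that each prompt becomes its own generateImage-style request, with baseOptions applied to all of them. The sketch below models only those documented semantics. It is not the server's implementation, and expandBatch is a hypothetical name.

```typescript
interface BatchGenerateArgs {
  prompts: string[];
  baseOptions?: Record<string, unknown>;
}

// Hypothetical expansion of one batchGenerate call into per-prompt requests.
function expandBatch(args: BatchGenerateArgs): Array<Record<string, unknown>> {
  if (args.prompts.length === 0) {
    throw new Error("prompts must contain at least one entry");
  }
  // Spread the shared options first so each entry's own prompt always wins,
  // even if baseOptions happens to contain a "prompt" key.
  return args.prompts.map((prompt) => ({ ...args.baseOptions, prompt }));
}
```

With prompts ["a cyberpunk cityscape", "a minimalist cityscape"] and baseOptions { aspectRatio: "16:9" }, this yields two requests that share the 16:9 aspect ratio.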
5. applyStyleTransfer
Apply artistic styles to existing images.
Parameters:
- imageBase64 (required): Base64-encoded image data
- style (required): One of anime, renaissance, impressionist, cyberpunk, minimalist, vintage, futuristic
- intensity (optional): Style intensity from 0 to 100
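Note that applyStyleTransfer uses a separate style list from generateImage, and its intensity runs from 0 to 100 rather than 0 to 1. The sketch below uses a TypeScript type guard so that only documented styles get through. buildStyleTransferArgs and isTransferStyle are hypothetical client-side helpers.

```typescript
// Style names copied from the applyStyleTransfer parameter list.
const TRANSFER_STYLES = ["anime", "renaissance", "impressionist", "cyberpunk", "minimalist", "vintage", "futuristic"] as const;
type TransferStyle = (typeof TRANSFER_STYLES)[number];

// Type guard: narrows an arbitrary string to one of the documented styles.
function isTransferStyle(value: string): value is TransferStyle {
  return (TRANSFER_STYLES as readonly string[]).indexOf(value) !== -1;
}

interface StyleTransferArgs {
  imageBase64: string;
  style: TransferStyle;
  intensity?: number;
}

function buildStyleTransferArgs(imageBase64: string, style: string, intensity?: number): StyleTransferArgs {
  if (!isTransferStyle(style)) {
    throw new Error(`unsupported style "${style}"; expected one of ${TRANSFER_STYLES.join(", ")}`);
  }
  // Intensity is documented as 0 to 100, unlike modifyImage's 0 to 1 strength.
  if (intensity !== undefined && !(intensity >= 0 && intensity <= 100)) {
    throw new Error("intensity must be between 0 and 100");
  }
  return intensity === undefined ? { imageBase64, style } : { imageBase64, style, intensity };
}
```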
Installation
Prerequisites
- Node.js 18 or higher
- Google Gemini API key (available from Google AI Studio)
Setup
- Clone or download the project:
git clone <repository-url>
cd gemini-image-mcp
- Install dependencies:
npm install
- Create environment configuration:
cp .env.example .env
- Edit .env and add your Gemini API key:
GEMINI_API_KEY=your-gemini-api-key-here
- Build the project:
npm run build
Usage
With Claude Desktop
Add the server to your Claude Desktop configuration file:
On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "gemini-image": {
      "command": "node",
      "args": ["/path/to/gemini-image-mcp/dist/index.js"],
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key-here"
      }
    }
  }
}
Standalone Usage
You can also run the server directly for testing:
npm start
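When run this way, the server presumably talks MCP over stdin/stdout, as the command-based Claude Desktop configuration implies. That makes it possible to exercise it by hand. The sketch below prints the opening messages of an MCP session: an "initialize" request, the "notifications/initialized" notification, and a "tools/list" request. Each message is a single line of JSON. "2024-11-05" is one published MCP protocol revision, and the server may answer with a different version it supports. The client name is hypothetical.

```typescript
// Opening messages of an MCP session over stdio, one JSON object per line.
function handshakeMessages(clientName: string, clientVersion: string): string[] {
  const initialize = {
    jsonrpc: "2.0",
    id: 0,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: clientName, version: clientVersion },
    },
  };
  // Notifications carry no id and expect no response.
  const initialized = { jsonrpc: "2.0", method: "notifications/initialized" };
  // Asks the server to list its tools (generateImage, modifyImage, ...).
  const listTools = { jsonrpc: "2.0", id: 1, method: "tools/list" };
  return [initialize, initialized, listTools].map((msg) => JSON.stringify(msg));
}

for (const line of handshakeMessages("readme-test-client", "0.1.0")) {
  console.log(line);
}
```

Pasting the printed lines into the terminal running the server should produce JSON responses on stdout for the two requests. Errors go to stderr, as described under Debug Logging.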
Configuration Options
Environment variables you can set:
- GEMINI_API_KEY (required): Your Google Gemini API key
- GEMINI_MODEL (optional): Model to use (default: gemini-2.5-flash-image-preview)
- SAFETY_LEVEL (optional): Content safety level, one of LOW, MEDIUM, HIGH, BLOCK_NONE (default: MEDIUM)
- MAX_REQUESTS_PER_MINUTE (optional): Maximum number of requests per minute (default: 10)
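The variables and defaults above fit into a small typed config object. The sketch below is a hypothetical loader, not the server's code. It takes the environment as a plain object, so in Node you would pass process.env. Rejecting unknown SAFETY_LEVEL values is a choice made for this sketch.

```typescript
type SafetyLevel = "LOW" | "MEDIUM" | "HIGH" | "BLOCK_NONE";
const SAFETY_LEVELS: string[] = ["LOW", "MEDIUM", "HIGH", "BLOCK_NONE"];

interface ServerConfig {
  apiKey: string;
  model: string;
  safetyLevel: SafetyLevel;
  maxRequestsPerMinute: number;
}

// Hypothetical loader applying the documented defaults.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env["GEMINI_API_KEY"];
  if (!apiKey) {
    // Same message as the first entry under Troubleshooting.
    throw new Error("GEMINI_API_KEY environment variable is required");
  }

  // Empty strings are treated as "unset" so the defaults apply.
  const model = env["GEMINI_MODEL"] || "gemini-2.5-flash-image-preview";

  const safety = env["SAFETY_LEVEL"] || "MEDIUM";
  if (SAFETY_LEVELS.indexOf(safety) === -1) {
    throw new Error(`SAFETY_LEVEL must be one of ${SAFETY_LEVELS.join(", ")}`);
  }

  const rawLimit = env["MAX_REQUESTS_PER_MINUTE"];
  const maxRequestsPerMinute = rawLimit === undefined ? 10 : Number(rawLimit);
  // Rejects NaN, zero, negatives, and fractional values.
  if (!(maxRequestsPerMinute > 0) || Math.floor(maxRequestsPerMinute) !== maxRequestsPerMinute) {
    throw new Error("MAX_REQUESTS_PER_MINUTE must be a positive integer");
  }

  return { apiKey, model, safetyLevel: safety as SafetyLevel, maxRequestsPerMinute };
}
```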
Examples
Once integrated with Claude Desktop, you can use natural language to interact with the tools:
Image Generation
"Generate an image of a sunset over mountains in watercolor style"
Image Modification
"Take this image and add a rainbow in the sky while preserving the original style"
Image Analysis
"Analyze this image and tell me what objects you can detect with confidence scores"
Batch Generation
"Generate 3 different versions of a futuristic cityscape: one cyberpunk style, one minimalist, and one realistic"
Style Transfer
"Apply an impressionist style to this photograph with high intensity"
Development
Running in Development Mode
npm run dev
Building
npm run build
Type Checking
npm run typecheck
Linting
npm run lint
API Limitations
- Rate limiting is enforced based on your configuration
- Image generation may take 10-30 seconds depending on complexity
- Maximum image size depends on Gemini API limits
- Content safety filters are applied based on your safety level setting
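The README does not say how the per-minute limit is enforced. A common approach is a sliding window that allows at most N calls in any rolling 60-second period. The class below is a minimal sketch of that technique under that assumption, with an injectable clock so it can be tested. SlidingWindowLimiter is a hypothetical name.

```typescript
// Sliding-window limiter: at most maxRequests calls in any rolling window.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxRequests: number, windowMs = 60000, now: () => number = () => Date.now()) {
    if (!(maxRequests > 0)) throw new Error("maxRequests must be positive");
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Timestamps still inside the window ending at time t.
  private active(t: number): number[] {
    return this.timestamps.filter((ts) => t - ts < this.windowMs);
  }

  // Records the call and returns true if it fits; otherwise returns false
  // without recording anything.
  tryAcquire(): boolean {
    const t = this.now();
    this.timestamps = this.active(t);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(t);
    return true;
  }

  // Milliseconds until the oldest call leaves the window (0 if a call is allowed now).
  msUntilAvailable(): number {
    const t = this.now();
    const active = this.active(t);
    const oldest = active[0];
    if (active.length < this.maxRequests || oldest === undefined) return 0;
    return this.windowMs - (t - oldest);
  }
}
```

With a limit of 2, calls at 0 s and 1 s succeed. A third call at 2 s is rejected and must wait 58 s, until the first call leaves the window. A fixed window that resets on the minute would be simpler, but it allows bursts of up to 2N calls across a window boundary.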
Troubleshooting
Common Issues
- "GEMINI_API_KEY environment variable is required"
  - Ensure you've set the API key in your environment or in the Claude Desktop config
- "Rate limit exceeded"
  - Wait for the rate limit window to reset, or adjust MAX_REQUESTS_PER_MINUTE
- "Image generation failed"
  - Check your prompt for potentially unsafe content
  - Verify that your API key has the required permissions
  - Try adjusting the safety level settings
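Adjusting the safety level most likely changes the safetySettings sent to the Gemini API. Each setting pairs a harm category with a blocking threshold, and the category and threshold names below are the Gemini API's. The mapping from this server's SAFETY_LEVEL values to thresholds is an assumption (LOW meaning the least filtering, HIGH the strictest), so check the server source before relying on it.

```typescript
type SafetyLevel = "LOW" | "MEDIUM" | "HIGH" | "BLOCK_NONE";

// Assumed mapping; only the threshold names themselves come from the Gemini API.
const THRESHOLDS: Record<SafetyLevel, string> = {
  BLOCK_NONE: "BLOCK_NONE",
  LOW: "BLOCK_ONLY_HIGH",
  MEDIUM: "BLOCK_MEDIUM_AND_ABOVE",
  HIGH: "BLOCK_LOW_AND_ABOVE",
};

// Gemini API harm categories that accept a configurable threshold.
const HARM_CATEGORIES = [
  "HARM_CATEGORY_HARASSMENT",
  "HARM_CATEGORY_HATE_SPEECH",
  "HARM_CATEGORY_SEXUALLY_EXPLICIT",
  "HARM_CATEGORY_DANGEROUS_CONTENT",
];

// Builds one { category, threshold } entry per category for the chosen level.
function safetySettingsFor(level: SafetyLevel): Array<{ category: string; threshold: string }> {
  const threshold = THRESHOLDS[level];
  return HARM_CATEGORIES.map((category) => ({ category, threshold }));
}
```

Under this mapping, a prompt blocked at MEDIUM might succeed at LOW, because only high-probability harms would then be blocked.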
Debug Logging
The server logs errors to stderr. Check the Claude Desktop console or your terminal for detailed error messages.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
License
MIT License - see LICENSE file for details
Support
For issues and questions:
- Check the troubleshooting section above
- Review the Claude Desktop MCP documentation
- Create an issue in the repository