Gemini Image Generation MCP Server
Enables image generation, editing, and refinement using Google's Gemini 2.5 Flash Image model with support for multi-image composition and style transfer.
A Model Context Protocol (MCP) Server that interfaces with Google's Gemini API for image generation using the Gemini 2.5 Flash Image model.
Features
- Generate images using the Gemini 2.5 Flash Image model with various customization options
- Edit and refine images using multimodal capabilities
- Multi-image composition and style transfer
- Conversational image refinement
- Save generated images to local storage in specified folders
- Support for multiple image formats (PNG, JPEG, WebP)
- Include SynthID watermarking for AI-generated images
Installation
Install uv
On macOS, you can install it using Homebrew:
brew install uv
Getting a Gemini API Key
- Navigate to Google AI Studio
- Sign in with your Google Account
- Look for "Get API key" or navigate to the API key management section
- Follow the prompts to create a new key
- Google AI Studio will generate a unique string of characters – this is your API key
Usage with Claude Code
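Configure Claude Code (or another MCP client) to use this server by adding the following to its MCP configuration file, for example .cursor/mcp.json for Cursor. Set the --directory argument to the path where you cloned this server: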
{
  "mcpServers": {
    "gemini-image": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/marabian/mcp-servers/gemini-imagegen-mcp",
        "run",
        "mcp",
        "run",
        "main.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here",
        "DEFAULT_SAVE_DIR": "/path/to/default/save/directory"
      }
    }
  }
}
Important Configuration Notes:
- Replace your_api_key_here with your actual Gemini API key.
- For the DEFAULT_SAVE_DIR:
  - Set this to a directory where you want to save all generated images
  - You can use a relative path like ./images within your project
  - For projects, consider using a path like ${PROJECT_ROOT}/generated-images
  - Defaults to ./generated-images in the MCP server directory if not specified
- When working in different projects:
  - The agent will save images to DEFAULT_SAVE_DIR by default
  - You can override this within each tool call using the save_dir parameter
  - Images will be organized with unique timestamps to prevent conflicts
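The save-location rules above can be sketched in a few lines of Python. This only illustrates the described precedence (per-call save_dir, then DEFAULT_SAVE_DIR, then ./generated-images) and is not the server's actual code: resolve_output_path and its parameters are hypothetical names, and the timestamp format shown is an assumption.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_output_path(
    save_dir: Optional[str] = None,
    filename: Optional[str] = None,
    extension: str = ".png",
    now: Optional[datetime] = None,
) -> Path:
    """Pick an output path: save_dir, then DEFAULT_SAVE_DIR, then ./generated-images."""
    # Hypothetical helper mirroring the precedence described in the notes.
    base = save_dir or os.environ.get("DEFAULT_SAVE_DIR") or "./generated-images"
    # Assumed naming scheme: custom name if given, otherwise timestamp-based.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    stem = filename or f"image_{stamp}"
    return Path(base) / f"{stem}{extension}"
```

Passing save_dir in a tool call wins over the environment variable, which matches the override behavior described for each tool.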
Available Tools
The Gemini Image Generation MCP Server provides the following tools:
Image Generation
- generate_image - Generate an image based on a text prompt
- edit_image - Edit existing images using multimodal prompts
- refine_image - Refine an existing image with conversational instructions
- get_available_models - List available Gemini image generation models
Image Management
- list_saved_images - List images saved in the specified directory
- set_save_directory - Set the directory where generated images will be saved
Tool Parameters
generate_image
- prompt (required): The text prompt describing the image to generate
- model: Model to use (default: "gemini-2.5-flash-image-preview")
- temperature: Controls randomness (0.0-1.0, default: 0.7)
- top_p: Controls nucleus sampling (0.0-1.0, default: 0.95)
- top_k: Controls top-k sampling (default: 40)
- max_output_tokens: Maximum tokens to generate (optional)
- save_dir: Directory to save images (optional, uses default if not specified)
- filename: Custom filename (optional, timestamp-based if not specified)
- include_text_response: Include text alongside images (default: true)
edit_image
- prompt (required): Instructions for editing the image
- image_paths (required): List of image file paths to edit (up to 3 recommended)
- model: Model to use (default: "gemini-2.5-flash-image-preview")
- temperature: Controls randomness (0.0-1.0, default: 0.7)
- top_p: Controls nucleus sampling (0.0-1.0, default: 0.95)
- top_k: Controls top-k sampling (default: 40)
- max_output_tokens: Maximum tokens to generate (optional)
- save_dir: Directory to save images (optional, uses default if not specified)
- filename: Custom filename (optional, timestamp-based if not specified)
- include_text_response: Include text alongside images (default: true)
refine_image
- prompt (required): Original prompt used to generate the image
- previous_image_path (required): Path to the image to refine
- refinement_instruction (required): Instructions for refinement
- model: Model to use (default: "gemini-2.5-flash-image-preview")
- temperature: Controls randomness (0.0-1.0, default: 0.7)
- save_dir: Directory to save images (optional, uses default if not specified)
- filename: Custom filename (optional, timestamp-based if not specified)
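As a rough illustration of the documented defaults and ranges, the sampling parameters could be checked before a request is sent. The SamplingParams class below is a hypothetical helper and not part of this server; only the defaults and the 0.0-1.0 ranges come from the parameter lists above, and the positivity checks on top_k and max_output_tokens are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplingParams:
    """Hypothetical container for the documented sampling parameters."""

    temperature: float = 0.7
    top_p: float = 0.95
    top_k: int = 40
    max_output_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        # Ranges taken from the parameter descriptions.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be between 0.0 and 1.0")
        # Assumption: top_k and max_output_tokens must be positive when given.
        if self.top_k < 1:
            raise ValueError("top_k must be a positive integer")
        if self.max_output_tokens is not None and self.max_output_tokens < 1:
            raise ValueError("max_output_tokens must be a positive integer")
```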
Examples
Generating a Basic Image
# Generate a landscape image
generate_image(
    prompt="A serene mountain landscape at sunset with a lake reflection",
    temperature=0.8,
    save_dir="/path/to/project/images",
    filename="mountain_sunset"
)
Editing an Existing Image
# Edit an image to add elements
edit_image(
    prompt="Add a small wooden boat on the lake",
    image_paths=["/path/to/mountain_sunset.png"],
    temperature=0.7,
    save_dir="/path/to/project/edits",
    filename="mountain_with_boat"
)
Multi-Image Composition
# Combine multiple images into one scene
edit_image(
    prompt="Create a cohesive fantasy scene combining these elements",
    image_paths=[
        "/path/to/dragon.png",
        "/path/to/castle.png",
        "/path/to/forest.png"
    ],
    temperature=0.6,
    filename="fantasy_scene"
)
Conversational Refinement
# Refine an existing image with specific instructions
refine_image(
    prompt="A modern office workspace",
    previous_image_path="/path/to/office.png",
    refinement_instruction="Make the lighting warmer and add some plants",
    filename="office_warmer"
)
Advanced Generation with Style Control
# Generate with specific artistic style
generate_image(
    prompt="A portrait of a wise old wizard in the style of Renaissance paintings, oil on canvas, dramatic lighting, detailed brushwork",
    temperature=0.5,
    top_p=0.9,
    max_output_tokens=2000,
    filename="renaissance_wizard"
)
Model Features
Gemini 2.5 Flash Image
- Text-to-Image Generation: Create images from detailed text descriptions
- Image Editing: Modify existing images with natural language instructions
- Multi-Image Composition: Combine multiple input images into new scenes
- Style Transfer: Apply artistic styles from reference images
- Conversational Refinement: Iteratively improve images through dialogue
- World Knowledge: Leverages Gemini's understanding for contextually accurate images
- Character Consistency: Maintain consistent appearance across multiple generations
- SynthID Watermarking: All generated images include invisible AI identification
Pricing
- Image Generation: $30 per 1 million output tokens
- Tokens per Image: 1,290 tokens
- Cost per Image: ~$0.039
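The per-image figure follows directly from the two numbers above (1,290 tokens at $30 per million). A small sketch of that arithmetic, using hypothetical constant and function names, and ignoring any input-token or text-output charges that may also apply:

```python
# Figures taken from the pricing list above.
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.0
TOKENS_PER_IMAGE = 1290


def estimate_cost_usd(num_images: int) -> float:
    """Estimated output-token cost for a number of generated images."""
    return num_images * TOKENS_PER_IMAGE * PRICE_PER_MILLION_OUTPUT_TOKENS_USD / 1_000_000


print(f"{estimate_cost_usd(1):.4f}")    # 0.0387, i.e. about $0.039 per image
print(f"{estimate_cost_usd(100):.2f}")  # 3.87
```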
Best Practices
Prompt Writing
- Be Descriptive: Use detailed, narrative descriptions rather than keyword lists
- Include Photography Terms: For realistic images, mention camera angles, lens types, lighting
- Specify Style: Clearly indicate artistic style, medium, or technique desired
- Provide Context: Include setting, mood, and atmospheric details
Image Editing
- Start with Quality Input: Use high-resolution, clear input images
- Limit Input Images: Use up to 3 input images for optimal results
- Be Specific: Provide clear instructions about what to change or add
- Iterate Gradually: Make incremental changes for better control
Performance Optimization
- Use Appropriate Temperature: Lower values (0.3-0.5) for precise results, higher (0.7-1.0) for creativity
- Control Token Usage: Set max_output_tokens when appropriate
- Batch Related Tasks: Generate multiple variations in sequence for consistency
Troubleshooting
Common Issues
- API Key Errors: Ensure your Gemini API key is valid and has image generation permissions
- File Not Found: Check that image paths exist and are accessible
- Large File Sizes: Generated images may be large; ensure sufficient disk space
- Rate Limits: Gemini API has usage limits; implement appropriate delays between requests
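One common way to add "appropriate delays" for rate limits is exponential backoff with jitter. The sketch below is a generic client-side pattern, not something this server is documented to do: call_with_backoff is a hypothetical helper, and it treats every exception as retryable only to keep the example short.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff and jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Assumption: any exception is retryable. Real code should
            # retry only on rate-limit errors (for example HTTP 429).
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the delays can be observed or skipped in tests.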
Error Messages
- GEMINI_API_KEY environment variable not set: Set your API key in the environment
- Image file does not exist: Verify the provided image path is correct
- Invalid model: Use supported model names like "gemini-2.5-flash-image-preview"
Limitations
- Input Image Limit: Best results with up to 3 input images
- Supported Languages: Optimized for EN, es-MX, ja-JP, zh-CN, hi-IN
- File Formats: Supports PNG, JPEG, WebP for both input and output
- Content Policy: Subject to Google's AI content policies
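The input-related limits above can be checked before calling edit_image. This is a minimal sketch under stated assumptions: check_image_paths is a hypothetical helper, formats are inferred from file extensions only (with .jpg and .jpeg both treated as JPEG), and the 3-image figure is a recommendation, so exceeding it is reported rather than rejected.

```python
from pathlib import Path
from typing import List

# Assumed extension mapping for PNG, JPEG, and WebP.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
RECOMMENDED_MAX_INPUTS = 3


def check_image_paths(image_paths: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the inputs look fine."""
    problems = []
    if len(image_paths) > RECOMMENDED_MAX_INPUTS:
        problems.append(
            f"{len(image_paths)} input images given; up to {RECOMMENDED_MAX_INPUTS} recommended"
        )
    for raw in image_paths:
        path = Path(raw)
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"Unsupported file format: {raw}")
        elif not path.is_file():
            problems.append(f"Image file does not exist: {raw}")
    return problems
```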
Contributing
This MCP server is based on the Model Context Protocol. To contribute:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
License
This project follows the same license terms as the Model Context Protocol.