Gemini Image Generation MCP Server

Enables image generation, editing, and refinement using Google's Gemini 2.5 Flash Image model with support for multi-image composition and style transfer.

A Model Context Protocol (MCP) Server that interfaces with Google's Gemini API for image generation using the Gemini 2.5 Flash Image model.

Features

Generate images with the Gemini 2.5 Flash Image model, with various customization options
Edit and refine images using multimodal capabilities
Multi-image composition and style transfer
Conversational image refinement
Save generated images to local storage in specified folders
Support for multiple image formats (PNG, JPEG, WebP)
Include SynthID watermarking for AI-generated images

Installation

Install uv

On macOS, you can install it with Homebrew:

brew install uv

Getting a Gemini API Key

Navigate to Google AI Studio
Sign in with your Google Account
Look for "Get API key" or navigate to the API key management section
Follow the prompts to create a new key
Google AI Studio will generate a unique string of characters – this is your API key

Usage with Claude Code

Configure Claude Code to use this MCP server by adding it to your project's .mcp.json. For Cursor, use .cursor/mcp.json; for other agents, use their MCP configuration file:

{
  "mcpServers": {
    "gemini-image": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/marabian/mcp-servers/gemini-imagegen-mcp",
        "run",
        "mcp",
        "run",
        "main.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here",
        "DEFAULT_SAVE_DIR": "/path/to/default/save/directory"
      }
    }
  }
}

Important Configuration Notes:

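Replace your_api_key_here with your actual Gemini API key, and replace the --directory path (/Users/marabian/mcp-servers/gemini-imagegen-mcp) with the location of your local copy of this server.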
For the DEFAULT_SAVE_DIR:
- Set this to a directory where you want to save all generated images
- You can use a relative path like ./images within your project
- For projects, consider using a path like ${PROJECT_ROOT}/generated-images
- Defaults to ./generated-images in the MCP server directory if not specified
When working in different projects:
- The agent will save images to DEFAULT_SAVE_DIR by default
- You can override this within each tool call using the save_dir parameter
- Images will be organized with unique timestamps to prevent conflicts
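
The precedence described in these notes (per-call save_dir, then DEFAULT_SAVE_DIR, then ./generated-images) can be sketched as follows. This is a minimal illustration, not the server's actual code: the function names resolve_save_dir and timestamped_filename and the image_YYYYMMDD_HHMMSS naming scheme are assumptions made for the example.

```python
import os
from datetime import datetime
from pathlib import Path


def resolve_save_dir(save_dir=None, env=None):
    """Pick the output directory: per-call override, then env var, then default."""
    env = os.environ if env is None else env
    # Hypothetical precedence, mirroring the configuration notes above.
    chosen = save_dir or env.get("DEFAULT_SAVE_DIR") or "./generated-images"
    return Path(chosen)


def timestamped_filename(filename=None, ext="png", now=None):
    """Use the custom filename if given, otherwise a timestamp-based name."""
    now = now or datetime.now()
    # The "image_<timestamp>" pattern is an assumption for illustration only.
    stem = filename or f"image_{now:%Y%m%d_%H%M%S}"
    return f"{stem}.{ext}"


if __name__ == "__main__":
    target = resolve_save_dir() / timestamped_filename()
    print(target)
```

Passing the environment as a parameter keeps the sketch easy to test without touching the real process environment.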

Available Tools

The Gemini Image Generation MCP Server provides the following tools:

Image Generation

generate_image - Generate an image based on a text prompt
edit_image - Edit existing images using multimodal prompts
refine_image - Refine an existing image with conversational instructions
get_available_models - List available Gemini image generation models

Image Management

list_saved_images - List images saved in the specified directory
set_save_directory - Set the directory where generated images will be saved

Tool Parameters

generate_image

prompt (required): The text prompt describing the image to generate
model: Model to use (default: "gemini-2.5-flash-image-preview")
temperature: Controls randomness (0.0-1.0, default: 0.7)
top_p: Controls nucleus sampling (0.0-1.0, default: 0.95)
top_k: Controls top-k sampling (default: 40)
max_output_tokens: Maximum tokens to generate (optional)
save_dir: Directory to save images (optional, uses default if not specified)
filename: Custom filename (optional, timestamp-based if not specified)
include_text_response: Include text alongside images (default: true)
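
If you build tool arguments programmatically, it may help to check the sampling parameters against the documented ranges before sending them. The sketch below is a client-side helper under stated assumptions: validate_sampling is a hypothetical name, the requirement that top_k be a positive integer is an assumption, and the README does not say whether the server performs equivalent checks.

```python
def validate_sampling(temperature=0.7, top_p=0.95, top_k=40):
    """Check sampling parameters against the ranges documented above."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature must be in [0.0, 1.0], got {temperature}")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError(f"top_p must be in [0.0, 1.0], got {top_p}")
    # Assumption: top_k is a positive integer (the README only gives a default).
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError(f"top_k must be a positive integer, got {top_k}")
    return {"temperature": temperature, "top_p": top_p, "top_k": top_k}


if __name__ == "__main__":
    print(validate_sampling(temperature=0.5, top_p=0.9))
```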

edit_image

prompt (required): Instructions for editing the image
image_paths (required): List of image file paths to edit (up to 3 recommended)
model: Model to use (default: "gemini-2.5-flash-image-preview")
temperature: Controls randomness (0.0-1.0, default: 0.7)
top_p: Controls nucleus sampling (0.0-1.0, default: 0.95)
top_k: Controls top-k sampling (default: 40)
max_output_tokens: Maximum tokens to generate (optional)
save_dir: Directory to save images (optional, uses default if not specified)
filename: Custom filename (optional, timestamp-based if not specified)
include_text_response: Include text alongside images (default: true)
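
Before calling edit_image, a quick pre-flight check on image_paths can catch the most common problems: missing files, unsupported formats, and more than the recommended three inputs. The following is a rough sketch, not part of the server; check_image_paths and its message wording (apart from "Image file does not exist", which appears under Error Messages) are assumptions.

```python
from pathlib import Path

# Formats listed under Limitations; .jpg is treated as JPEG here.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
RECOMMENDED_MAX_IMAGES = 3


def check_image_paths(image_paths):
    """Return a list of problems; an empty list means the inputs look fine."""
    problems = []
    if not image_paths:
        problems.append("image_paths is required and must not be empty")
        return problems
    if len(image_paths) > RECOMMENDED_MAX_IMAGES:
        problems.append(
            f"{len(image_paths)} images given; up to "
            f"{RECOMMENDED_MAX_IMAGES} are recommended"
        )
    for raw in image_paths:
        path = Path(raw)
        if not path.is_file():
            problems.append(f"Image file does not exist: {raw}")
        elif path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"Unsupported image format: {raw}")
    return problems


if __name__ == "__main__":
    for problem in check_image_paths(["/path/to/mountain_sunset.png"]):
        print(problem)
```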

refine_image

prompt (required): Original prompt used to generate the image
previous_image_path (required): Path to the image to refine
refinement_instruction (required): Instructions for refinement
model: Model to use (default: "gemini-2.5-flash-image-preview")
temperature: Controls randomness (0.0-1.0, default: 0.7)
save_dir: Directory to save images (optional, uses default if not specified)
filename: Custom filename (optional, timestamp-based if not specified)

Examples

Generating a Basic Image

# Generate a landscape image
generate_image(
    prompt="A serene mountain landscape at sunset with a lake reflection",
    temperature=0.8,
    save_dir="/path/to/project/images",
    filename="mountain_sunset"
)
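
The call above is shorthand for a tool invocation that the agent makes on your behalf. At the protocol level, an MCP client sends it as a JSON-RPC 2.0 request with the method tools/call. The sketch below only builds and prints such a request to show its shape; build_tool_call is a hypothetical helper, and in practice your MCP client constructs and sends this message for you.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a plain dictionary."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "generate_image",
    {
        "prompt": "A serene mountain landscape at sunset with a lake reflection",
        "temperature": 0.8,
        "save_dir": "/path/to/project/images",
        "filename": "mountain_sunset",
    },
)

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```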

Editing an Existing Image

# Edit an image to add elements
edit_image(
    prompt="Add a small wooden boat on the lake",
    image_paths=["/path/to/mountain_sunset.png"],
    temperature=0.7,
    save_dir="/path/to/project/edits",
    filename="mountain_with_boat"
)

Multi-Image Composition

# Combine multiple images into one scene
edit_image(
    prompt="Create a cohesive fantasy scene combining these elements",
    image_paths=[
        "/path/to/dragon.png", 
        "/path/to/castle.png", 
        "/path/to/forest.png"
    ],
    temperature=0.6,
    filename="fantasy_scene"
)

Conversational Refinement

# Refine an existing image with specific instructions
refine_image(
    prompt="A modern office workspace",
    previous_image_path="/path/to/office.png",
    refinement_instruction="Make the lighting warmer and add some plants",
    filename="office_warmer"
)

Advanced Generation with Style Control

# Generate with specific artistic style
generate_image(
    prompt="A portrait of a wise old wizard in the style of Renaissance paintings, oil on canvas, dramatic lighting, detailed brushwork",
    temperature=0.5,
    top_p=0.9,
    max_output_tokens=2000,
    filename="renaissance_wizard"
)

Model Features

Gemini 2.5 Flash Image

Text-to-Image Generation: Create images from detailed text descriptions
Image Editing: Modify existing images with natural language instructions
Multi-Image Composition: Combine multiple input images into new scenes
Style Transfer: Apply artistic styles from reference images
Conversational Refinement: Iteratively improve images through dialogue
World Knowledge: Leverages Gemini's understanding for contextually accurate images
Character Consistency: Maintain consistent appearance across multiple generations
SynthID Watermarking: All generated images include invisible AI identification

Pricing

Image Generation: $30 per 1 million output tokens
Tokens per Image: 1,290 tokens
Cost per Image: ~$0.039
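
The per-image figure follows directly from the two numbers above: 1,290 tokens at $30 per million tokens. A small sketch of the arithmetic, using the prices quoted in this README (check Google's current pricing before relying on them):

```python
PRICE_PER_MILLION_OUTPUT_TOKENS = 30.00  # USD, as quoted above
TOKENS_PER_IMAGE = 1290


def image_cost(num_images=1):
    """Estimated cost in USD for generating num_images images."""
    return num_images * TOKENS_PER_IMAGE * PRICE_PER_MILLION_OUTPUT_TOKENS / 1_000_000


if __name__ == "__main__":
    print(f"1 image:    ${image_cost():.4f}")
    print(f"100 images: ${image_cost(100):.2f}")
```

This estimate covers image output tokens only; any text returned alongside the image would add to the token count.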

Best Practices

Prompt Writing

Be Descriptive: Use detailed, narrative descriptions rather than keyword lists
Include Photography Terms: For realistic images, mention camera angles, lens types, lighting
Specify Style: Clearly indicate artistic style, medium, or technique desired
Provide Context: Include setting, mood, and atmospheric details

Image Editing

Start with Quality Input: Use high-resolution, clear input images
Limit Input Images: Use up to 3 input images for optimal results
Be Specific: Provide clear instructions about what to change or add
Iterate Gradually: Make incremental changes for better control

Performance Optimization

Use Appropriate Temperature: Lower values (0.3-0.5) for precise results, higher (0.7-1.0) for creativity
Control Token Usage: Set max_output_tokens when appropriate
Batch Related Tasks: Generate multiple variations in sequence for consistency

Troubleshooting

Common Issues

API Key Errors: Ensure your Gemini API key is valid and has image generation permissions
File Not Found: Check that image paths exist and are accessible
Large File Sizes: Generated images may be large; ensure sufficient disk space
Rate Limits: The Gemini API enforces usage limits; add delays between requests and retry with backoff when a request is rejected
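
One common way to space out requests after a rate-limit failure is exponential backoff with a little jitter. The sketch below is generic and not tied to this server: call_with_backoff is a hypothetical helper, and it catches every Exception for brevity, whereas real code should catch only the rate-limit error raised by whichever client library you use.

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


if __name__ == "__main__":
    # Stand-in for a real request; replace with your own callable.
    print(call_with_backoff(lambda: "ok"))
```

The sleep function is injectable so the retry logic can be exercised in tests without actually waiting.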

Error Messages

GEMINI_API_KEY environment variable not set: Set your API key in the environment
Image file does not exist: Verify the provided image path is correct
Invalid model: Use supported model names like "gemini-2.5-flash-image-preview"

Limitations

Input Image Limit: Best results with up to 3 input images
Supported Languages: Optimized for EN, es-MX, ja-JP, zh-CN, hi-IN
File Formats: Supports PNG, JPEG, WebP for both input and output
Content Policy: Subject to Google's AI content policies

Contributing

This MCP server is based on the Model Context Protocol. To contribute:

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

License

This project follows the same license terms as the Model Context Protocol.
