Gemini Image Generation MCP Server

Enables image generation, editing, and refinement using Google's Gemini 2.5 Flash Image model with support for multi-image composition and style transfer.

A Model Context Protocol (MCP) Server that interfaces with Google's Gemini API for image generation using the Gemini 2.5 Flash Image model.

Features

  • Generate images using the Gemini 2.5 Flash Image model with various customization options
  • Edit and refine images using multimodal capabilities
  • Multi-image composition and style transfer
  • Conversational image refinement
  • Save generated images to local storage in specified folders
  • Support for multiple image formats (PNG, JPEG, WebP)
  • Include SynthID watermarking for AI-generated images

Installation

Install uv

On macOS, you can install it using Homebrew:

brew install uv

Getting a Gemini API Key

  1. Navigate to Google AI Studio
  2. Sign in with your Google Account
  3. Look for "Get API key" or navigate to the API key management section
  4. Follow the prompts to create a new key
  5. Google AI Studio will generate a unique string of characters – this is your API key

Usage with Claude Code

Configure Claude Code, or another MCP client, to use this server by adding the following entry to the client's MCP configuration file (for example, .mcp.json for Claude Code or .cursor/mcp.json for Cursor):

{
  "mcpServers": {
    "gemini-image": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/marabian/mcp-servers/gemini-imagegen-mcp",
        "run",
        "mcp",
        "run",
        "main.py"
      ],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here",
        "DEFAULT_SAVE_DIR": "/path/to/default/save/directory"
      }
    }
  }
}

Important Configuration Notes:

  1. Replace your_api_key_here with your actual Gemini API key, and replace the --directory path with the location of this server on your machine.

  2. For the DEFAULT_SAVE_DIR:

    • Set this to a directory where you want to save all generated images
    • You can use a relative path like ./images within your project
    • For projects, consider using a path like ${PROJECT_ROOT}/generated-images
    • Defaults to ./generated-images in the MCP server directory if not specified

  3. When working in different projects:

    • The agent will save images to DEFAULT_SAVE_DIR by default
    • You can override this within each tool call using the save_dir parameter
    • Unless you pass a custom filename, images are named with a unique timestamp to prevent conflicts
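
The precedence described above (a per-call save_dir, then DEFAULT_SAVE_DIR, then ./generated-images) can be sketched as follows. This is an illustration only, not the server's actual code: the function name resolve_output_path, the image_ filename prefix, and the png default extension are assumptions made for the example.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_output_path(
    save_dir: Optional[str] = None,
    filename: Optional[str] = None,
    extension: str = "png",
) -> Path:
    """Pick the output path for a generated image (hypothetical helper)."""
    # A per-call save_dir wins; otherwise fall back to DEFAULT_SAVE_DIR,
    # and finally to ./generated-images.
    base = save_dir or os.environ.get("DEFAULT_SAVE_DIR") or "./generated-images"
    directory = Path(base).expanduser()
    directory.mkdir(parents=True, exist_ok=True)

    # Without a custom filename, use a timestamp (down to microseconds) so
    # successive images are unlikely to overwrite each other.
    stem = filename or datetime.now().strftime("image_%Y%m%d_%H%M%S_%f")
    return directory / f"{stem}.{extension}"
```

Calling resolve_output_path(save_dir="/path/to/project/images", filename="mountain_sunset") would yield /path/to/project/images/mountain_sunset.png under these assumptions.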

Available Tools

The Gemini Image Generation MCP Server provides the following tools:

Image Generation

  • generate_image - Generate an image based on a text prompt
  • edit_image - Edit existing images using multimodal prompts
  • refine_image - Refine an existing image with conversational instructions
  • get_available_models - List available Gemini image generation models

Image Management

  • list_saved_images - List images saved in the specified directory
  • set_save_directory - Set the directory where generated images will be saved

Tool Parameters

generate_image

  • prompt (required): The text prompt describing the image to generate
  • model: Model to use (default: "gemini-2.5-flash-image-preview")
  • temperature: Controls randomness (0.0-1.0, default: 0.7)
  • top_p: Controls nucleus sampling (0.0-1.0, default: 0.95)
  • top_k: Controls top-k sampling (default: 40)
  • max_output_tokens: Maximum tokens to generate (optional)
  • save_dir: Directory to save images (optional, uses default if not specified)
  • filename: Custom filename (optional, timestamp-based if not specified)
  • include_text_response: Include text alongside images (default: true)
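
If you assemble tool arguments programmatically, a small client-side check can catch out-of-range values before the request is sent. The sketch below mirrors the defaults and ranges listed above. The helper name build_generate_image_args is hypothetical, and the server may validate these values differently.

```python
from typing import Any, Dict, Optional


def build_generate_image_args(
    prompt: str,
    model: str = "gemini-2.5-flash-image-preview",
    temperature: float = 0.7,
    top_p: float = 0.95,
    top_k: int = 40,
    max_output_tokens: Optional[int] = None,
    save_dir: Optional[str] = None,
    filename: Optional[str] = None,
    include_text_response: bool = True,
) -> Dict[str, Any]:
    """Build an arguments dict for generate_image (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")

    args: Dict[str, Any] = {
        "prompt": prompt,
        "model": model,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "include_text_response": include_text_response,
    }
    # Only send optional parameters that were actually supplied, so the
    # server's own defaults apply to the rest.
    optional = {
        "max_output_tokens": max_output_tokens,
        "save_dir": save_dir,
        "filename": filename,
    }
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```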

edit_image

  • prompt (required): Instructions for editing the image
  • image_paths (required): List of image file paths to edit (up to 3 recommended)
  • model: Model to use (default: "gemini-2.5-flash-image-preview")
  • temperature: Controls randomness (0.0-1.0, default: 0.7)
  • top_p: Controls nucleus sampling (0.0-1.0, default: 0.95)
  • top_k: Controls top-k sampling (default: 40)
  • max_output_tokens: Maximum tokens to generate (optional)
  • save_dir: Directory to save images (optional, uses default if not specified)
  • filename: Custom filename (optional, timestamp-based if not specified)
  • include_text_response: Include text alongside images (default: true)

refine_image

  • prompt (required): Original prompt used to generate the image
  • previous_image_path (required): Path to the image to refine
  • refinement_instruction (required): Instructions for refinement
  • model: Model to use (default: "gemini-2.5-flash-image-preview")
  • temperature: Controls randomness (0.0-1.0, default: 0.7)
  • save_dir: Directory to save images (optional, uses default if not specified)
  • filename: Custom filename (optional, timestamp-based if not specified)

Examples

Generating a Basic Image

# Generate a landscape image
generate_image(
    prompt="A serene mountain landscape at sunset with a lake reflection",
    temperature=0.8,
    save_dir="/path/to/project/images",
    filename="mountain_sunset"
)

Editing an Existing Image

# Edit an image to add elements
edit_image(
    prompt="Add a small wooden boat on the lake",
    image_paths=["/path/to/mountain_sunset.png"],
    temperature=0.7,
    save_dir="/path/to/project/edits",
    filename="mountain_with_boat"
)

Multi-Image Composition

# Combine multiple images into one scene
edit_image(
    prompt="Create a cohesive fantasy scene combining these elements",
    image_paths=[
        "/path/to/dragon.png",
        "/path/to/castle.png",
        "/path/to/forest.png"
    ],
    temperature=0.6,
    filename="fantasy_scene"
)

Conversational Refinement

# Refine an existing image with specific instructions
refine_image(
    prompt="A modern office workspace",
    previous_image_path="/path/to/office.png",
    refinement_instruction="Make the lighting warmer and add some plants",
    filename="office_warmer"
)

Advanced Generation with Style Control

# Generate with specific artistic style
generate_image(
    prompt="A portrait of a wise old wizard in the style of Renaissance paintings, oil on canvas, dramatic lighting, detailed brushwork",
    temperature=0.5,
    top_p=0.9,
    max_output_tokens=2000,
    filename="renaissance_wizard"
)

Model Features

Gemini 2.5 Flash Image

  • Text-to-Image Generation: Create images from detailed text descriptions
  • Image Editing: Modify existing images with natural language instructions
  • Multi-Image Composition: Combine multiple input images into new scenes
  • Style Transfer: Apply artistic styles from reference images
  • Conversational Refinement: Iteratively improve images through dialogue
  • World Knowledge: Leverages Gemini's understanding for contextually accurate images
  • Character Consistency: Maintain consistent appearance across multiple generations
  • SynthID Watermarking: All generated images include invisible AI identification

Pricing

  • Image Generation: $30 per 1 million output tokens
  • Tokens per Image: 1,290 tokens
  • Cost per Image: ~$0.039
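
The per-image figure follows directly from the two numbers above: 1,290 tokens × $30 / 1,000,000 tokens ≈ $0.0387. A minimal sketch for estimating the cost of a batch, assuming the listed prices still apply and every image uses exactly 1,290 output tokens (the function name is illustrative):

```python
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.0
TOKENS_PER_IMAGE = 1290


def estimate_cost_usd(num_images: int) -> float:
    """Estimate image generation cost from the listed token pricing."""
    total_tokens = num_images * TOKENS_PER_IMAGE
    return total_tokens * PRICE_PER_MILLION_OUTPUT_TOKENS_USD / 1_000_000


print(f"{estimate_cost_usd(1):.4f}")    # 0.0387
print(f"{estimate_cost_usd(100):.2f}")  # 3.87
```

Any text returned alongside an image also consumes output tokens, so treat the result as a lower bound.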

Best Practices

Prompt Writing

  1. Be Descriptive: Use detailed, narrative descriptions rather than keyword lists
  2. Include Photography Terms: For realistic images, mention camera angles, lens types, lighting
  3. Specify Style: Clearly indicate artistic style, medium, or technique desired
  4. Provide Context: Include setting, mood, and atmospheric details

Image Editing

  1. Start with Quality Input: Use high-resolution, clear input images
  2. Limit Input Images: Use up to 3 input images for optimal results
  3. Be Specific: Provide clear instructions about what to change or add
  4. Iterate Gradually: Make incremental changes for better control

Performance Optimization

  1. Use Appropriate Temperature: Lower values (0.3-0.5) for precise results, higher (0.7-1.0) for creativity
  2. Control Token Usage: Set max_output_tokens when appropriate
  3. Batch Related Tasks: Generate multiple variations in sequence for consistency

Troubleshooting

Common Issues

  1. API Key Errors: Ensure your Gemini API key is valid and has image generation permissions
  2. File Not Found: Check that image paths exist and are accessible
  3. Large File Sizes: Generated images may be large; ensure sufficient disk space
  4. Rate Limits: Gemini API has usage limits; implement appropriate delays between requests
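
For rate limits, one common way to "implement appropriate delays" is exponential backoff with jitter. The sketch below is a generic pattern, not something this server provides. The name call_with_backoff and its defaults are assumptions, and it retries on any exception for brevity; in real code, narrow the except clause to the rate-limit error your client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponentially growing delays on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Double the delay each time (1s, 2s, 4s, ...), capped at max_delay,
            # plus up to 50% random jitter to avoid synchronized retries.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

Wrap whatever function issues the request, for example call_with_backoff(lambda: send_request(args)), where send_request stands in for your own client call.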

Error Messages

  • GEMINI_API_KEY environment variable not set: Set your API key in the environment
  • Image file does not exist: Verify the provided image path is correct
  • Invalid model: Use supported model names like "gemini-2.5-flash-image-preview"
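
The first two errors can be caught before calling a tool. The following pre-flight check is a sketch under the assumption that you run it in the same environment as the server; the function name preflight_errors is hypothetical, and the messages simply echo the ones listed above.

```python
import os
from pathlib import Path
from typing import Iterable, List


def preflight_errors(image_paths: Iterable[str] = ()) -> List[str]:
    """Return a list of problems likely to make a tool call fail."""
    errors = []
    # The server reads the API key from the environment.
    if not os.environ.get("GEMINI_API_KEY"):
        errors.append("GEMINI_API_KEY environment variable not set")
    # Every input image must exist and be a regular file.
    for path in image_paths:
        if not Path(path).expanduser().is_file():
            errors.append(f"Image file does not exist: {path}")
    return errors
```

An empty list means both checks passed; it does not guarantee that the key itself is valid.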

Limitations

  1. Input Image Limit: Best results with up to 3 input images
  2. Supported Languages: Optimized for EN, es-MX, ja-JP, zh-CN, hi-IN
  3. File Formats: Supports PNG, JPEG, WebP for both input and output
  4. Content Policy: Subject to Google's AI content policies
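
The input limit and file format constraints above can be checked on the client side before calling edit_image. This is a rough sketch: it judges the format by file extension only, which is an assumption, and the helper name input_image_warnings is hypothetical.

```python
from pathlib import Path
from typing import List, Sequence

SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
RECOMMENDED_MAX_INPUTS = 3


def input_image_warnings(image_paths: Sequence[str]) -> List[str]:
    """Flag inputs that exceed the recommended count or use other formats."""
    warnings = []
    if len(image_paths) > RECOMMENDED_MAX_INPUTS:
        warnings.append(
            f"{len(image_paths)} input images given; "
            f"best results with up to {RECOMMENDED_MAX_INPUTS}"
        )
    for path in image_paths:
        # Compare extensions case-insensitively (.PNG and .png are the same).
        if Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
            warnings.append(f"Unsupported file format: {path}")
    return warnings
```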

Contributing

This MCP server is based on the Model Context Protocol. To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

This project follows the same license terms as the Model Context Protocol.
