Gemini Image MCP Server

A Model Context Protocol (MCP) server that provides image generation, modification, and analysis capabilities using Google's Gemini API. It integrates with Claude Desktop and other MCP-compatible clients, so these image operations can be driven through natural language.

Features

  • Image Generation: Create images from text prompts using Gemini 2.5 Flash Image Preview
  • Image Modification: Modify existing images with natural language instructions
  • Image Analysis: Analyze images for objects, text, colors, emotions, and comprehensive insights
  • Batch Generation: Generate multiple images from different prompts in one operation
  • Style Transfer: Apply artistic styles to existing images
  • Rate Limiting: Built-in rate limiting to respect API quotas
  • Safety Settings: Configurable content safety levels

Available Tools

1. generateImage

Generate images from text prompts with customizable options.

Parameters:

  • prompt (required): Text description of the image to generate
  • width (optional): Image width in pixels
  • height (optional): Image height in pixels
  • aspectRatio (optional): One of 1:1, 16:9, 9:16, 4:3, 3:4
  • style (optional): One of realistic, artistic, cartoon, sketch, watercolor, oil-painting
  • quality (optional): One of standard, high, ultra
  • numberOfImages (optional): Number of images to generate (default: 1)
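
For clients that call the tool directly rather than through a chat interface, the parameters above map onto the arguments of an MCP tools/call request. The following is a minimal sketch: the argument types mirror the parameter list, the JSON-RPC envelope follows the MCP specification, and buildToolCall is a hypothetical helper that is not part of this server.

```typescript
// Argument shape taken from the generateImage parameter list above.
type AspectRatio = "1:1" | "16:9" | "9:16" | "4:3" | "3:4";
type ImageStyle =
  | "realistic" | "artistic" | "cartoon"
  | "sketch" | "watercolor" | "oil-painting";
type Quality = "standard" | "high" | "ultra";

interface GenerateImageArgs {
  prompt: string;
  width?: number;
  height?: number;
  aspectRatio?: AspectRatio;
  style?: ImageStyle;
  quality?: Quality;
  numberOfImages?: number;
}

// JSON-RPC envelope used by MCP to invoke a tool.
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Hypothetical helper: wraps tool arguments in a tools/call request.
function buildToolCall<A>(id: number, name: string, args: A): ToolCallRequest<A> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall<GenerateImageArgs>(1, "generateImage", {
  prompt: "A sunset over mountains",
  aspectRatio: "16:9",
  style: "watercolor",
  quality: "high",
});
```

Passing request to JSON.stringify gives the message body a client would send over the MCP transport. Typing the enumerated options as string unions lets the compiler reject values outside the documented lists.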

2. modifyImage

Modify existing images using natural language instructions.

Parameters:

  • imageBase64 (required): Base64 encoded image data
  • instructions (required): Text instructions for modification
  • preserveStyle (optional): Whether to preserve original artistic style
  • strength (optional): Modification strength from 0 to 1
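
A client may want to check its inputs before sending a modifyImage call. The sketch below assumes the server expects raw Base64 without a "data:<mime>;base64," prefix, which the parameter list does not state. Both stripDataUrlPrefix and buildModifyArgs are hypothetical helper names.

```typescript
interface ModifyImageArgs {
  imageBase64: string;
  instructions: string;
  preserveStyle?: boolean;
  strength?: number;
}

// Hypothetical helper: drops a leading "data:<mime>;base64," prefix if present.
function stripDataUrlPrefix(data: string): string {
  const marker = ";base64,";
  const idx = data.indexOf(marker);
  if (data.indexOf("data:") === 0 && idx !== -1) {
    return data.slice(idx + marker.length);
  }
  return data;
}

// Hypothetical helper: validates inputs and builds modifyImage arguments.
function buildModifyArgs(
  image: string,
  instructions: string,
  strength?: number,
  preserveStyle?: boolean
): ModifyImageArgs {
  if (instructions.trim() === "") {
    throw new Error("instructions must not be empty");
  }
  // The negated range check also rejects NaN.
  if (strength !== undefined && !(strength >= 0 && strength <= 1)) {
    throw new RangeError("strength must be between 0 and 1");
  }
  const args: ModifyImageArgs = {
    imageBase64: stripDataUrlPrefix(image),
    instructions,
  };
  if (strength !== undefined) args.strength = strength;
  if (preserveStyle !== undefined) args.preserveStyle = preserveStyle;
  return args;
}
```

Optional fields are left out entirely when not supplied, so the server's own defaults apply.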

3. analyzeImage

Analyze images and extract various types of information.

Parameters:

  • imageBase64 (required): Base64 encoded image data
  • analysisType (optional): One of description, objects, text, colors, emotions, comprehensive
  • detail (optional): Analysis detail level - low, medium, high

4. batchGenerate

Generate multiple images from different prompts efficiently.

Parameters:

  • prompts (required): Array of text prompts
  • baseOptions (optional): Shared options to apply to all generations
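
One way to picture how baseOptions relates to the individual prompts is to expand a batch into one set of generation arguments per prompt. This sketch assumes every prompt inherits the shared options unchanged. The server's actual merging behavior is not documented here, and expandBatch is a hypothetical name.

```typescript
// Subset of the generateImage options, used here as shared batch options.
interface GenerateOptions {
  aspectRatio?: string;
  style?: string;
  quality?: string;
}

interface BatchGenerateArgs {
  prompts: string[];
  baseOptions?: GenerateOptions;
}

interface SingleGeneration extends GenerateOptions {
  prompt: string;
}

// Assumed semantics: each prompt becomes one generation that inherits baseOptions.
function expandBatch(args: BatchGenerateArgs): SingleGeneration[] {
  if (args.prompts.length === 0) {
    throw new Error("prompts must contain at least one entry");
  }
  return args.prompts.map((prompt) => ({ ...args.baseOptions, prompt }));
}

const jobs = expandBatch({
  prompts: ["A futuristic cityscape at dawn", "A futuristic cityscape at night"],
  baseOptions: { aspectRatio: "16:9", quality: "high" },
});
```

Here jobs holds two entries that differ only in prompt. Note that each generated image counts against the configured rate limit if the server issues one API request per prompt, which is also an assumption.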

5. applyStyleTransfer

Apply artistic styles to existing images.

Parameters:

  • imageBase64 (required): Base64 encoded image data
  • style (required): One of anime, renaissance, impressionist, cyberpunk, minimalist, vintage, futuristic
  • intensity (optional): Style intensity from 0 to 100
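
Because style is required and limited to a fixed list, a client can narrow free-form input to the allowed values before calling the tool. The sketch below uses a TypeScript type guard for that. The names isTransferStyle and buildStyleTransferArgs are hypothetical, and the server may perform its own validation.

```typescript
// Allowed values copied from the applyStyleTransfer parameter list above.
const TRANSFER_STYLES = [
  "anime", "renaissance", "impressionist", "cyberpunk",
  "minimalist", "vintage", "futuristic",
] as const;

type TransferStyle = (typeof TRANSFER_STYLES)[number];

// Type guard: narrows an arbitrary string to one of the supported styles.
function isTransferStyle(value: string): value is TransferStyle {
  return (TRANSFER_STYLES as readonly string[]).indexOf(value) !== -1;
}

interface StyleTransferArgs {
  imageBase64: string;
  style: TransferStyle;
  intensity?: number;
}

// Hypothetical helper: validates style and intensity, then builds the arguments.
function buildStyleTransferArgs(
  imageBase64: string,
  style: string,
  intensity?: number
): StyleTransferArgs {
  if (!isTransferStyle(style)) {
    throw new Error(
      `unsupported style "${style}"; expected one of ${TRANSFER_STYLES.join(", ")}`
    );
  }
  // The negated range check also rejects NaN.
  if (intensity !== undefined && !(intensity >= 0 && intensity <= 100)) {
    throw new RangeError("intensity must be between 0 and 100");
  }
  return intensity === undefined
    ? { imageBase64, style }
    : { imageBase64, style, intensity };
}
```

Note the difference in scale between the two tools: intensity here runs from 0 to 100, while modifyImage's strength runs from 0 to 1.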

Installation

Prerequisites

  • Node.js and npm
  • A Google Gemini API key

Setup

  1. Clone or download the project:
     git clone <repository-url>
     cd gemini-image-mcp
  2. Install dependencies:
     npm install
  3. Create environment configuration:
     cp .env.example .env
  4. Edit .env and add your Gemini API key:
     GEMINI_API_KEY=your-gemini-api-key-here
  5. Build the project:
     npm run build

Usage

With Claude Desktop

Add the server to your Claude Desktop configuration file:

  • On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • On Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "gemini-image": {
      "command": "node",
      "args": ["/path/to/gemini-image-mcp/dist/index.js"],
      "env": {
        "GEMINI_API_KEY": "your-gemini-api-key-here"
      }
    }
  }
}
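
If the configuration file already lists other servers, the gemini-image entry has to be merged in rather than pasted over them. The following sketch shows such a merge as a pure function on the parsed JSON. The addServer helper and the "other" server entry are hypothetical and only illustrate the shape of the file.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // any other settings are carried over untouched
}

// Hypothetical helper: returns a copy of the config with one server added or replaced.
function addServer(
  config: ClaudeDesktopConfig,
  name: string,
  entry: McpServerEntry
): ClaudeDesktopConfig {
  return { ...config, mcpServers: { ...config.mcpServers, [name]: entry } };
}

// Example input: a config that already contains another (made-up) server.
const existing: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "npx", args: ["other-server"] } },
};

const updated = addServer(existing, "gemini-image", {
  command: "node",
  args: ["/path/to/gemini-image-mcp/dist/index.js"],
  env: { GEMINI_API_KEY: "your-gemini-api-key-here" },
});
```

The function does not mutate its input, so the original object can be kept as a backup until the new file has been written.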

Standalone Usage

You can also run the server directly for testing:

npm start

Configuration Options

Environment variables you can set:

  • GEMINI_API_KEY (required): Your Google Gemini API key
  • GEMINI_MODEL (optional): Model to use (default: gemini-2.5-flash-image-preview)
  • SAFETY_LEVEL (optional): Content safety level - LOW, MEDIUM, HIGH, BLOCK_NONE (default: MEDIUM)
  • MAX_REQUESTS_PER_MINUTE (optional): Rate limit (default: 10)
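
The variables and defaults listed above could be resolved into a typed configuration object along the following lines. This is a sketch of one possible approach and not the server's actual startup code. The loadConfig function and the ServerConfig shape are assumptions, and only the variable names, defaults, and the missing-key message come from this README.

```typescript
type SafetyLevel = "LOW" | "MEDIUM" | "HIGH" | "BLOCK_NONE";

interface ServerConfig {
  apiKey: string;
  model: string;
  safetyLevel: SafetyLevel;
  maxRequestsPerMinute: number;
}

// Same shape as process.env in Node.js, passed in explicitly to keep the function pure.
type Env = Record<string, string | undefined>;

const SAFETY_LEVELS: readonly string[] = ["LOW", "MEDIUM", "HIGH", "BLOCK_NONE"];

// Hypothetical loader: applies the documented defaults and rejects invalid values.
function loadConfig(env: Env): ServerConfig {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) {
    throw new Error("GEMINI_API_KEY environment variable is required");
  }

  const safety = env.SAFETY_LEVEL ?? "MEDIUM";
  if (SAFETY_LEVELS.indexOf(safety) === -1) {
    throw new Error(`invalid SAFETY_LEVEL: ${safety}`);
  }

  const limit = Number(env.MAX_REQUESTS_PER_MINUTE ?? "10");
  if (!isFinite(limit) || limit < 1 || Math.floor(limit) !== limit) {
    throw new Error("MAX_REQUESTS_PER_MINUTE must be a positive integer");
  }

  return {
    apiKey,
    model: env.GEMINI_MODEL ?? "gemini-2.5-flash-image-preview",
    safetyLevel: safety as SafetyLevel,
    maxRequestsPerMinute: limit,
  };
}
```

In a Node.js process this would be called as loadConfig(process.env). Failing fast on an invalid value surfaces configuration mistakes at startup instead of on the first tool call.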

Examples

Once integrated with Claude Desktop, you can use natural language to interact with the tools:

Image Generation

"Generate an image of a sunset over mountains in watercolor style"

Image Modification

"Take this image and add a rainbow in the sky while preserving the original style"

Image Analysis

"Analyze this image and tell me what objects you can detect with confidence scores"

Batch Generation

"Generate 3 different versions of a futuristic cityscape: one cyberpunk style, one minimalist, and one realistic"

Style Transfer

"Apply an impressionist style to this photograph with high intensity"

Development

Running in Development Mode

npm run dev

Building

npm run build

Type Checking

npm run typecheck

Linting

npm run lint

API Limitations

  • Requests are rate limited according to MAX_REQUESTS_PER_MINUTE (default: 10)
  • Image generation may take 10-30 seconds depending on complexity
  • Maximum image size depends on Gemini API limits
  • Content safety filters are applied according to SAFETY_LEVEL (default: MEDIUM)
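
To illustrate what a requests-per-minute limit means in practice, here is a minimal sliding-window limiter. It is one common way to enforce such a limit and is not taken from this server's source. The class name and its methods are hypothetical.

```typescript
// Sliding-window limiter: allows at most maxRequests within any windowMs interval.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number = 60000) {
    if (!(maxRequests >= 1)) throw new RangeError("maxRequests must be at least 1");
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Records the request and returns true if it fits in the current window.
  tryAcquire(now: number = Date.now()): boolean {
    this.prune(now);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(now);
    return true;
  }

  // Milliseconds until the next request would be accepted (0 if one fits now).
  retryAfterMs(now: number = Date.now()): number {
    this.prune(now);
    if (this.timestamps.length < this.maxRequests) return 0;
    return this.timestamps[0] + this.windowMs - now;
  }

  // Drops timestamps that have left the window.
  private prune(now: number): void {
    while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
  }
}

// With the default MAX_REQUESTS_PER_MINUTE of 10:
const limiter = new SlidingWindowLimiter(10);
```

A caller would check limiter.tryAcquire() before each API request and, on false, either wait for limiter.retryAfterMs() milliseconds or report "Rate limit exceeded". The now parameter exists so the timing logic can be exercised without real delays.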

Troubleshooting

Common Issues

  1. "GEMINI_API_KEY environment variable is required"
    • Ensure you've set the API key in your environment or Claude Desktop config
  2. "Rate limit exceeded"
    • Wait for the rate limit window to reset or adjust MAX_REQUESTS_PER_MINUTE
  3. "Image generation failed"
    • Check your prompt for potentially unsafe content
    • Verify your API key has proper permissions
    • Try adjusting the safety level settings

Debug Logging

The server logs errors to stderr. Check the Claude Desktop console or your terminal for detailed error messages.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and linting
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Support

For issues and questions:

  • Check the troubleshooting section above
  • Review Claude Desktop MCP documentation
  • Create an issue in the repository
