gemini-mcp-server

gemini-mcp-server

Enables AI assistants to query Google AI (Gemini models) via Vertex AI or Google AI Studio with automatic tool selection, multi-turn reasoning, and multimodal input support.

Category
Visit Server

README

gemini-mcp-server

An intelligent MCP (Model Context Protocol) server that enables AI assistants to query Google AI (Gemini models) via Vertex AI or Google AI Studio with agentic capabilities - automatic tool selection, multi-turn reasoning, MCP-to-MCP delegation, and multimodal input support.

Purpose

This server provides:

  • Agentic Loop: Turn-based execution with automatic tool selection and reasoning
  • Query Gemini: Access Gemini models via Vertex AI or Google AI Studio
  • Multimodal Support: Send images, audio, video, and code files alongside text prompts
  • Image Generation: Generate images using Gemini image models (gemini-3-pro-image, gemini-3.1-flash-image, gemini-2.5-flash-image)
  • Speech & Music Generation: Generate TTS audio with Gemini TTS and music with Lyria
  • Tool Execution: Built-in WebFetch + integration with external MCP servers
  • Multi-turn Conversations: Maintain context across queries with session management
  • Reasoning Traces: File-based logging of AI thinking processes
  • Gemini 3 Support: Full support for Gemini 3 models including thinkingLevel parameter

Key Features

šŸŽ­ System Prompt Customization

Customize the AI assistant's behavior and persona:

  • Domain-Specific Roles: Configure as financial analyst, code reviewer, research assistant, etc.
  • Environment-Based: Set via GEMINI_SYSTEM_PROMPT environment variable
  • Multi-Persona Support: Run multiple servers with different personas
  • 100% Backward Compatible: Optional feature - works normally without customization
  • See PROMPT_CUSTOMIZATION.md for detailed guide and examples/custom-prompts.md for templates

šŸŽØ Multimodal Input Support

Send images, audio, video, and code files to Gemini:

  • Images: JPEG, PNG, WebP, HEIC, HEIF
  • Videos: MP4, MOV, AVI, WebM, and more
  • Audio: MP3, WAV, AAC, FLAC, and more
  • Documents/Code: PDF, text files, code files (Python, JavaScript, etc.)
  • Support for both base64-encoded inline data and Cloud Storage URIs
  • See MULTIMODAL.md for detailed documentation

šŸ¤– Intelligent Agentic Loop

Inspired by OpenAI Agents SDK, the server operates as an autonomous agent:

  • Turn-based execution (up to 10 turns per query)
  • Automatic tool selection based on LLM decisions
  • Parallel tool execution with retry logic
  • Smart fallback to Gemini knowledge when tools fail

šŸ”® Gemini 3 Model Support

Full support for Gemini 3 generation models:

  • gemini-3.5-flash: Default model — fast and capable
  • gemini-3.1-pro-preview: High-capability reasoning model
  • gemini-3.1-flash-lite: Cost-efficient multimodal model for high-volume workloads
  • gemini-3.1-pro-preview-customtools: Agentic endpoint optimized for custom tools
  • thinkingLevel: Per-query thinking budget control for Gemini 3 models
  • GEMINI_MEDIA_RESOLUTION: Control media quality for multimodal inputs

šŸ› ļø Built-in Tools

  • WebFetch: Secure HTTPS-only web content fetching with private IP blocking
  • MCP Integration: Dynamic discovery and execution of external MCP server tools

šŸ–¼ļø Image Generation

Generate images directly from text prompts using Gemini image models:

  • gemini-3-pro-image: Professional asset production with 4K resolution support (default)
  • gemini-3.1-flash-image: High-efficiency generation with 0.5K-4K resolution and reference images
  • gemini-2.5-flash-image: Fast 1K image generation and editing (retiring 2026-10-02; prefer gemini-3.1-flash-image)
  • Configurable aspect ratios: 1:1, 16:9, 9:16, 4:3, and more
  • Images automatically saved to configurable output directory

šŸŽ§ Audio Generation

Generate file-based audio outputs:

  • generate_speech: Gemini TTS single-speaker or two-speaker speech, saved as WAV
  • generate_music: Lyria 3 music generation, saved as MP3; Gemini API/AI Studio mode can request WAV for lyria-3-pro-preview
  • Speech defaults to ~/Music/gemini-generated/speech; music defaults to ~/Music/gemini-generated/music
  • Generation failures return structured MCP error content with status, tool, errorType, message, and validation issues when available
  • See GENERATION.md, AUDIO_GENERATION.md, examples/audio-generation.md, and examples/video-generation.md

šŸ” Security First

Multi-Layer Defense:

  • SSRF Protection: HTTPS-only URL fetching, private IP blocking (10.x, 172.16.x, 192.168.x, 127.x, 169.254.x), cloud metadata endpoint blocking (AWS, GCP, Azure)
  • Prompt Injection Guardrails: External content tagging, trust boundaries, system prompt hardening
  • File Security: MIME type validation, executable file rejection, path traversal prevention, directory whitelist
  • Redirect Validation: Manual redirect handling with security checks, maximum 5 redirects, cross-domain blocking
  • Content Boundaries: 50KB size limits, external content wrapping with security tags

Comprehensive Testing: 69 security-focused tests covering SSRF, path traversal, MIME validation, and prompt injection.

See SECURITY.md for detailed security documentation and best practices.

šŸ“ Observability

  • File-based logging (logs/general.log, logs/reasoning.log)
  • Configurable log directory or disable logging for npx/containerized environments
  • Detailed execution traces for debugging
  • Turn and tool usage statistics

Prerequisites

  • Node.js 18 or higher
  • Google Cloud Platform account with Vertex AI enabled, or a Google AI Studio API key
  • Google Cloud credentials configured for Vertex AI mode

Quick Start

Installation

Option 1: npx (Recommended)

npx -y github:mnthe/gemini-mcp-server

Option 2: From Source

git clone https://github.com/mnthe/gemini-mcp-server.git
cd gemini-mcp-server
npm install
npm run build

Authentication

The server supports both Vertex AI and Google AI Studio / Gemini Developer API mode.

Vertex AI mode:

Application Default Credentials (Recommended):

gcloud auth application-default login

Or use Service Account:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

Google AI Studio mode:

export GEMINI_API_KEY="your-ai-studio-api-key"
export GOOGLE_GENAI_USE_VERTEXAI="false"

Configuration

Required Environment Variables:

# Vertex AI mode
export GOOGLE_CLOUD_PROJECT="your-gcp-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"

# Or Google AI Studio mode
export GEMINI_API_KEY="your-ai-studio-api-key"
export GOOGLE_GENAI_USE_VERTEXAI="false"

Optional Model Settings:

export GEMINI_MODEL="gemini-3.5-flash"  # Default model
export GEMINI_TEMPERATURE="1.0"
export GEMINI_MAX_TOKENS="8192"
export GEMINI_TOP_P="0.95"
export GEMINI_TOP_K="40"

Optional Agentic Features:

# System prompt customization
export GEMINI_SYSTEM_PROMPT="You are a specialized financial analyst AI assistant. You have access to the following tools:"

# Multi-turn conversations
export GEMINI_ENABLE_CONVERSATIONS="true"
export GEMINI_SESSION_TIMEOUT="3600"
export GEMINI_MAX_HISTORY="10"

# Logging configuration
# Default: Console logging to stderr (recommended for npx/MCP usage)
export GEMINI_LOG_TO_STDERR="true"         # Default: true (console logging)

# For file-based logging instead:
export GEMINI_LOG_TO_STDERR="false"        # Disable console, use file logging
export GEMINI_LOG_DIR="./logs"             # Log directory (default: ./logs)

# To disable logging completely:
export GEMINI_DISABLE_LOGGING="true"

# File URI support (for CLI environments only)
export GEMINI_ALLOW_FILE_URIS="true"       # Set to 'true' to allow file:// URIs (CLI tools only, NOT for desktop apps)

# Media resolution for Gemini 3 models (videoMetadata and image quality)
export GEMINI_MEDIA_RESOLUTION="medium"    # Options: low, medium, high (default: not set)

# Image generation output directory
export GEMINI_IMAGE_OUTPUT_DIR="/path/to/images"  # Default: ~/Pictures/gemini-generated
export GEMINI_VIDEO_OUTPUT_DIR="/path/to/videos"  # Default: ~/Movies/gemini-generated on macOS, ~/Videos/gemini-generated on Windows/Linux
export GEMINI_SPEECH_OUTPUT_DIR="/path/to/speech" # Default: ~/Music/gemini-generated/speech
export GEMINI_MUSIC_OUTPUT_DIR="/path/to/music"   # Default: ~/Music/gemini-generated/music

# External MCP servers (for tool delegation)
export GEMINI_MCP_SERVERS='[
  {
    "name": "filesystem",
    "transport": "stdio",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "./data"]
  },
  {
    "name": "web-search",
    "transport": "http",
    "url": "http://localhost:3000/mcp"
  }
]'

MCP Client Integration

Add to your MCP client configuration:

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-gcp-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GEMINI_MODEL": "gemini-3.5-flash",
        "GEMINI_ENABLE_CONVERSATIONS": "true"
      }
    }
  }
}

Claude Code (.claude.json in project root):

{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-gcp-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GEMINI_MODEL": "gemini-3.5-flash"
      }
    }
  }
}

Other MCP Clients (Generic stdio):

# Command to run
npx -y github:mnthe/gemini-mcp-server

# Or direct execution
node /path/to/gemini-mcp-server/build/index.js

Multi-Persona Setup

You can run multiple Gemini servers with different personas for specialized tasks:

{
  "mcpServers": {
    "gemini-code": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GEMINI_SYSTEM_PROMPT": "You are a code review specialist. Focus on code quality, security, and best practices. You have access to the following tools:"
      }
    },
    "gemini-research": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GEMINI_SYSTEM_PROMPT": "You are an academic research assistant. Cite sources and provide comprehensive analysis. You have access to the following tools:"
      }
    }
  }
}

See PROMPT_CUSTOMIZATION.md for comprehensive guide and examples/custom-prompts.md for ready-to-use templates.

Available Tools

The server exposes eight MCP tools: query, search, fetch, generate_image, generate_speech, generate_music, generate_video, and check_video.

query

Main agentic entrypoint that handles multi-turn execution with automatic tool selection and multimodal input support.

Parameters:

  • prompt (string, required): The text prompt to send
  • sessionId (string, optional): Conversation session ID
  • model (string, optional): Model override (e.g., gemini-3.5-flash, gemini-3.1-pro-preview, gemini-3.1-flash-lite, gemini-3.1-pro-preview-customtools)
  • thinkingLevel (string, optional): Gemini 3 thinking level. Options: minimal, low, medium, high
  • mediaResolution (string, optional): Global media resolution for multimodal inputs. Options: low, medium, high
  • parts (array, optional): Multimodal content parts (images, audio, video, documents)

How It Works:

  1. Analyzes the prompt and conversation history (including multimodal content)
  2. Decides whether to use tools or respond directly
  3. Executes tools in parallel if needed (WebFetch, MCP tools)
  4. Retries failed tools with exponential backoff
  5. Falls back to Gemini knowledge if tools fail
  6. Continues for up to 10 turns until final answer

Examples:

# Simple text query
query: "What is the capital of France?"

# Complex query with tool usage
query: "Fetch the latest news from https://example.com/news and summarize"
→ Automatically uses WebFetch tool
→ Synthesizes content into answer

# Image analysis (multimodal)
query: "What's in this image?"
parts: [{ inlineData: { mimeType: "image/jpeg", data: "<base64>" } }]

# Multi-turn conversation
query: "What is machine learning?" (sessionId auto-created)
query: "Give me an example" (uses sessionId from previous response)

Multimodal Support: See MULTIMODAL.md for detailed documentation on:

  • Parts array structure and field requirements (for agent developers)
  • Supported file types (images, audio, video, documents)
  • Base64 inline data vs Cloud Storage URIs
  • Complete schema and validation rules
  • Usage examples and code samples
  • Best practices and limitations
  • Common mistakes to avoid

Response Includes:

  • Final answer
  • Session ID (if conversations enabled)
  • Statistics: turns used, tool calls, reasoning steps

search

Search for information using Gemini (OpenAI MCP spec).

Parameters:

  • query (string, required): Search query

Returns:

  • results: Array of {id, title, url}

fetch

Fetch full content of a search result (OpenAI MCP spec).

Parameters:

  • id (string, required): Document ID from search results

Returns:

  • id, title, text, url, metadata

generate_image

Generate images from text prompts using Gemini image models.

Parameters:

  • prompt (string, required): Image generation prompt describing what to generate
  • model (string, optional): Image model to use. Options:
    • gemini-3-pro-image (default) — professional quality, supports up to 4K resolution
    • gemini-3.1-flash-image — high-efficiency with 0.5K-4K and reference image support
    • gemini-2.5-flash-image — fast 1K image generation and editing (retiring 2026-10-02; prefer gemini-3.1-flash-image)
  • aspectRatio (string, optional): Image aspect ratio. Default: 1:1. Options: 1:1, 1:4, 1:8, 2:3, 3:2, 3:4, 4:1, 4:3, 4:5, 5:4, 8:1, 9:16, 16:9, 21:9 (1:4, 1:8, 4:1, 8:1 require gemini-3.1-flash-image)
  • imageSize (string, optional): Output resolution. Default: 1K. Options: 0.5K, 1K, 2K, 4K (0.5K requires gemini-3.1-flash-image; omit for gemini-2.5-flash-image)
  • imagePaths (array, optional): Local reference images for editing or style transfer (max 14; gemini-2.5-flash-image supports at most 3). Supported file types: PNG (.png), JPEG (.jpg, .jpeg), WEBP (.webp), HEIC (.heic), HEIF (.heif)
  • systemInstruction (string, optional): System instruction for Gemini 3 image models
  • thinkingLevel (string, optional): Gemini 3.1 Flash Image thinking level: minimal or high
  • mediaResolution (string, optional): Media resolution for reference image inputs: low, medium, high

Behavior:

  • Generated images are saved to GEMINI_IMAGE_OUTPUT_DIR (defaults to ~/Pictures/gemini-generated on macOS, Windows, and Linux)
  • Returns image data (base64) along with file paths of saved images

Examples:

# Generate a square image with default model
generate_image: "A serene mountain landscape at sunset"

# Generate a wide-format image with Nano Banana 2 at 4K
generate_image: "Futuristic cityscape at night"
model: "gemini-3.1-flash-image"
aspectRatio: "16:9"
imageSize: "4K"

generate_speech

Generate speech from text using Gemini TTS models.

Parameters:

  • prompt (string, required): Text or transcript to synthesize
  • model (string, optional): Speech model. Options: gemini-3.1-flash-tts-preview (default), gemini-2.5-flash-preview-tts, gemini-2.5-pro-preview-tts
  • voiceName (string, optional): Prebuilt voice for single-speaker TTS. Default: Kore
  • languageCode (string, optional): BCP-47 language code
  • speakers (array, optional): Exactly two { speaker, voiceName } entries for multi-speaker TTS

Behavior:

  • Generated speech is saved to GEMINI_SPEECH_OUTPUT_DIR (defaults to ~/Music/gemini-generated/speech)
  • Returns MCP audio content and saved file paths
  • Gemini TTS is text-only input; audio, image, and video reference files are not supported by generate_speech

generate_music

Generate music using Lyria 3 models.

Parameters:

  • prompt (string, required): Music generation prompt
  • model (string, optional): Music model. Options: lyria-3-clip-preview (default), lyria-3-pro-preview
  • outputMimeType (string, optional): Vertex AI mode supports audio/mp3 only. Gemini API/AI Studio mode supports audio/mp3, or audio/wav with lyria-3-pro-preview
  • imagePaths (array, optional): Local image paths for multimodal music generation inputs (max 10). Supported file types: PNG (.png), JPEG (.jpg, .jpeg), WEBP (.webp), HEIC (.heic), HEIF (.heif)
  • lyrics (string, optional): User-provided lyrics
  • instrumental (boolean, optional): Request instrumental-only output; cannot be combined with lyrics or vocalStyle
  • vocalStyle (string, optional): Vocal generation direction
  • language (string, optional): Output language direction. Options: English, German, Spanish, French, Hindi, Japanese, Korean, Portuguese
  • durationSeconds (number, optional): Target duration in seconds; requires lyria-3-pro-preview; max 184 seconds
  • bpm (number, optional): Tempo direction in beats per minute
  • intensity (string, optional): low, medium, or high

Behavior:

  • Generated music is saved to GEMINI_MUSIC_OUTPUT_DIR (defaults to ~/Music/gemini-generated/music)
  • Returns MCP audio content, saved file paths, and any lyrics/song-structure text returned by Lyria
  • Lyria 3 Clip is fixed at 30 seconds; Lyria 3 Pro supports longer structured songs up to 184 seconds
  • Lyria 3 output is 44.1 kHz, one clip per prompt; Vertex AI mode supports 192 kbps MP3 only, while Gemini API/AI Studio Pro can also request WAV
  • Lyria 3 accepts text prompts and optional image references only; audio and video reference files are not supported by generate_music; negative prompting is not supported

generate_video

Generate videos from text prompts using Veo video generation models.

Parameters:

  • prompt (string, required): Video generation prompt describing what to generate
  • model (string, optional): Video model to use. Default: veo-3.1-fast-generate-001. Options:
    • veo-3.1-fast-generate-001 (default) — fast video generation
    • veo-3.1-generate-001 — standard quality generation
    • veo-3.1-lite-generate-001 — cost-efficient generation
  • aspectRatio (string, optional): Video aspect ratio. Default: 16:9. Options: 16:9, 9:16
  • durationSeconds (string, optional): Video duration. Default: 8. Options: 4, 6, 8 (1080p/4k require 8)
  • resolution (string, optional): Video resolution. Default: 720p. Options: 720p, 1080p, 4k (1080p/4k require 8 second duration)
  • generateAudio (boolean, optional): Generate audio for the video. Default: true
  • enhancePrompt (boolean, optional): Use Veo prompt rewriting/enhancement
  • personGeneration (string, optional): Person generation control: allow_adult, dont_allow
  • negativePrompt (string, optional): Description of what to exclude from the video
  • seed (number, optional): Random seed for reproducibility
  • numberOfVideos (number, optional): Number of videos to generate. Default: 1
  • imagePath (string, optional): Local file path of input image for image-to-video generation. Supported file types: PNG (.png), JPEG (.jpg, .jpeg), WEBP (.webp)
  • lastFramePath (string, optional): Local file path of last frame for interpolation (requires imagePath). Same supported image file types as imagePath
  • referenceImagePaths (array, optional): Local file paths of reference images for style guidance (max 3, Veo 3.1 only). Same supported image file types as imagePath
  • videoPath (string, optional): Local file path of a Veo-generated 720p MP4 (.mp4) video to extend

Behavior:

  • Generated videos are saved to GEMINI_VIDEO_OUTPUT_DIR (defaults to ~/Movies/gemini-generated on macOS, ~/Videos/gemini-generated on Windows/Linux)
  • generate_video returns an operation ID; check_video polls the operation and returns saved file paths when complete
  • Supports text-to-video, image-to-video, interpolation, reference image, and Veo video extension modes
  • Veo 3.1 Lite does not support 4k or reference asset images; model availability can differ between Vertex AI and Google AI Studio
  • Audio file references are not supported by generate_video; describe dialogue, sound effects, and ambience in prompt

Examples:

# Simple text-to-video
generate_video: "A dancing robot in a cyberpunk city"

# Text-to-video with custom settings
generate_video: "Ocean waves crashing on a beach"
model: "veo-3.1-generate-001"
aspectRatio: "16:9"
durationSeconds: "8"
resolution: "1080p"

# Image-to-video (animation)
generate_video: "Animate this image"
imagePath: "/path/to/image.jpg"

# Interpolation (morph between two frames)
generate_video: "Smooth transition"
imagePath: "/path/to/start_frame.jpg"
lastFramePath: "/path/to/end_frame.jpg"

# Video with reference images for style
generate_video: "Generate a video with cyberpunk aesthetic"
referenceImagePaths: ["/path/to/style1.jpg", "/path/to/style2.jpg"]

# Extend a previous Veo-generated video
generate_video: "Follow the subject as the scene continues into the hallway"
videoPath: "/path/to/previous-veo-output.mp4"
resolution: "720p"

Security

The gemini-mcp-server implements comprehensive security measures to protect against common vulnerabilities. See SECURITY.md for complete documentation.

Defense Layers

1. SSRF (Server-Side Request Forgery) Protection

  • HTTPS-only: HTTP requests are blocked; only HTTPS is allowed for web resources
  • Private IP blocking: Blocks access to internal networks (10.x, 172.16.x, 192.168.x, 127.x, 169.254.x)
  • Cloud metadata blocking: Prevents access to AWS, GCP, Azure, and Alibaba Cloud metadata endpoints
  • Redirect validation: All redirects are manually validated; cross-domain redirects are blocked

2. Prompt Injection Guardrails

  • Trust boundaries: Clear separation between user input (trusted) and external content (untrusted)
  • Content tagging: All fetched web content is wrapped in <external_content> tags with security warnings
  • System prompt hardening: Built-in instructions to ignore malicious commands in external content
  • Information disclosure protection: Guidelines prevent revealing system prompts or internal details

3. File Security (Multimodal Content)

  • MIME type validation: Only known safe types (images, video, audio, PDF, code) are allowed
  • Executable rejection: Blocks .exe, .sh, .dll, and other executable file types
  • Path traversal prevention: All paths are normalized and validated against a whitelist
  • Directory whitelist: Local files only allowed in safe directories (cwd, Documents, Downloads, Desktop)
  • URI scheme validation: Only gs://, https://, and conditionally file:// URIs are allowed

4. Content Boundaries

  • Size limits: Web content limited to 50KB to prevent resource exhaustion
  • Content type validation: Basic validation of response content types
  • Encoding validation: Proper handling of character encodings

Configuration

File Security (Multimodal)

# Default: false (secure) - file:// URIs are disabled
export GEMINI_ALLOW_FILE_URIS="false"

# For CLI environments only - enables local file:// URIs with whitelist validation
export GEMINI_ALLOW_FILE_URIS="true"

Security Note: Never enable GEMINI_ALLOW_FILE_URIS in production or web-facing applications. It's designed for trusted CLI environments only.

Security Monitoring

# Enable logging to monitor security events
export GEMINI_DISABLE_LOGGING="false"
export GEMINI_LOG_DIR="/var/log/gemini-mcp"

# Log to stderr for real-time monitoring
export GEMINI_LOG_TO_STDERR="true"

Best Practices

For Desktop Applications (Recommended)

{
  "mcpServers": {
    "gemini": {
      "env": {
        "GEMINI_ALLOW_FILE_URIS": "false"
      }
    }
  }
}

For CLI Tools (Use with Caution)

export GEMINI_ALLOW_FILE_URIS="true"
export GEMINI_LOG_TO_STDERR="true"

Security Testing

Run comprehensive security test suite:

# All security tests
npx tsx test/url-security-test.ts        # 21 tests - SSRF protection
npx tsx test/file-security-test.ts       # 34 tests - File validation
npx tsx test/webfetch-security-test.ts   # 5 tests - Content tagging
npx tsx test/security-guidelines-test.ts # 3 tests - Prompt injection
npx tsx test/multimodal-security-test.ts # 6 tests - Multimodal files

Total: 69 security-focused tests covering SSRF, path traversal, MIME validation, and prompt injection.

For detailed security information, threat models, and vulnerability reporting, see SECURITY.md.

Architecture

Agentic Loop

User Query
  ↓
ā”Œā”€ā”€ā”€ Turn 1..10 Loop ───┐
│                        │
│  1. Build Prompt       │
│     + Tool Definitions │
│     + History          │
│                        │
│  2. Gemini Generation  │
│     (with thinking)    │
│                        │
│  3. Parse Response     │
│     - Reasoning?       │
│     - Tool Calls?      │
│     - Final Output?    │
│                        │
│  4. Execute Tools      │
│     (parallel + retry) │
│                        │
│  5. Check MaxTurns     │
│     Continue or Exit?  │
│                        │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
  ↓
Final Result + Stats

Project Structure

src/
ā”œā”€ā”€ agentic/           # Core agentic loop
│   ā”œā”€ā”€ AgenticLoop.ts       # Main orchestrator
│   ā”œā”€ā”€ RunState.ts          # Turn-based state management
│   ā”œā”€ā”€ ResponseProcessor.ts # Parse Gemini responses
│   └── Tool.ts              # Tool interface (MCP standard)
│
ā”œā”€ā”€ mcp/               # MCP client implementation
│   ā”œā”€ā”€ EnhancedMCPClient.ts # Unified stdio + HTTP client
│   ā”œā”€ā”€ StdioMCPConnection.ts
│   └── HttpMCPConnection.ts
│
ā”œā”€ā”€ tools/             # Tool implementations
│   ā”œā”€ā”€ WebFetchTool.ts      # Secure web fetching
│   └── ToolRegistry.ts      # Tool management + parallel execution
│
ā”œā”€ā”€ services/          # External services
│   └── GeminiAIService.ts   # Gemini API (with thinkingConfig, image generation)
│
ā”œā”€ā”€ handlers/          # MCP tool handlers
│   ā”œā”€ā”€ QueryHandler.ts
│   ā”œā”€ā”€ SearchHandler.ts
│   ā”œā”€ā”€ FetchHandler.ts
│   └── ImageGenerationHandler.ts  # Image generation via Gemini image models
│
ā”œā”€ā”€ managers/          # Business logic
│   └── ConversationManager.ts
│
ā”œā”€ā”€ errors/            # Custom error types
ā”œā”€ā”€ types/             # TypeScript type definitions
ā”œā”€ā”€ schemas/           # Zod validation schemas (including ImageGenerationSchema)
ā”œā”€ā”€ config/            # Configuration loading
ā”œā”€ā”€ utils/             # Shared utilities (Logger, security, imageSaver)
│
└── server/            # MCP server bootstrap
    └── GeminiAIMCPServer.ts

See DIRECTORY_STRUCTURE.md and ARCHITECTURE.md for details.

Advanced Usage

External MCP Servers

Connect to external MCP servers for extended capabilities:

Stdio (subprocess):

export GEMINI_MCP_SERVERS='[
  {
    "name": "filesystem",
    "transport": "stdio",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "./workspace"]
  }
]'

HTTP:

export GEMINI_MCP_SERVERS='[
  {
    "name": "api-server",
    "transport": "http",
    "url": "https://api.example.com/mcp",
    "headers": {"Authorization": "Bearer token"}
  }
]'

Tools from external servers are automatically discovered and made available to the agent.

Reasoning Traces

Default: Console Logging

Logs are sent to stderr by default, making them visible in MCP client logs.

For File-Based Logging:

export GEMINI_LOG_TO_STDERR="false"        # Disable console, use files
export GEMINI_LOG_DIR="./logs"             # Log directory (default: ./logs)

Then check logs:

tail -f logs/general.log     # All logs
tail -f logs/reasoning.log   # Gemini thinking process only

To Disable All Logging:

export GEMINI_DISABLE_LOGGING="true"

Custom Tool Development

Tools follow MCP standard:

import { BaseTool, ToolResult, RunContext } from './agentic/Tool.js';

export class MyTool extends BaseTool {
  name = 'my_tool';
  description = 'Description for LLM';
  parameters = {
    type: 'object',
    properties: {
      arg: { type: 'string', description: 'Argument' }
    },
    required: ['arg']
  };

  async execute(args: any, context: RunContext): Promise<ToolResult> {
    // Your implementation
    return {
      status: 'success',
      content: 'Result'
    };
  }
}

Development

Build

npm run build

Watch Mode

npm run watch

Development Mode

npm run dev

Troubleshooting

MCP Server Connection Issues

If the MCP server appears to be "dead" or disconnects unexpectedly:

Check MCP client logs (logs are sent to stderr by default):

  • macOS: ~/Library/Logs/Claude/mcp*.log
  • Windows: %APPDATA%\Claude\Logs\mcp*.log

Server logs will appear in these files automatically.

Log Directory Errors

If you encounter errors like ENOENT: no such file or directory, mkdir './logs':

This should not happen with default settings (console logging is default).

If you enabled file logging (GEMINI_LOG_TO_STDERR="false"):

Solution: Use a writable log directory:

{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-project-id",
        "GEMINI_LOG_TO_STDERR": "false",
        "GEMINI_LOG_DIR": "/tmp/gemini-logs"
      }
    }
  }
}

Authentication Errors

  1. Verify credentials: gcloud auth application-default login
  2. Check project ID: echo $GOOGLE_CLOUD_PROJECT
  3. Enable Vertex AI API: gcloud services enable aiplatform.googleapis.com

Tool Execution Failures

  • Check logs in logs/general.log (if logging is enabled)
  • Verify MCP server configurations in GEMINI_MCP_SERVERS
  • Ensure external servers are running (for HTTP transport)

MaxTurns Exceeded

  • Agent returns best-effort response after 10 turns
  • Check if tools are repeatedly failing
  • Review reasoning logs to understand loop behavior (if logging is enabled)

Documentation

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured