gemini-mcp-server
Enables AI assistants to query Google AI (Gemini models) via Vertex AI or Google AI Studio with automatic tool selection, multi-turn reasoning, and multimodal input support.
README
gemini-mcp-server
An intelligent MCP (Model Context Protocol) server that enables AI assistants to query Google AI (Gemini models) via Vertex AI or Google AI Studio with agentic capabilities - automatic tool selection, multi-turn reasoning, MCP-to-MCP delegation, and multimodal input support.
Purpose
This server provides:
- Agentic Loop: Turn-based execution with automatic tool selection and reasoning
- Query Gemini: Access Gemini models via Vertex AI or Google AI Studio
- Multimodal Support: Send images, audio, video, and code files alongside text prompts
- Image Generation: Generate images using Gemini image models (gemini-3-pro-image, gemini-3.1-flash-image, gemini-2.5-flash-image)
- Speech & Music Generation: Generate TTS audio with Gemini TTS and music with Lyria
- Tool Execution: Built-in WebFetch + integration with external MCP servers
- Multi-turn Conversations: Maintain context across queries with session management
- Reasoning Traces: File-based logging of AI thinking processes
- Gemini 3 Support: Full support for Gemini 3 models including thinkingLevel parameter
Key Features
š System Prompt Customization
Customize the AI assistant's behavior and persona:
- Domain-Specific Roles: Configure as financial analyst, code reviewer, research assistant, etc.
- Environment-Based: Set via
GEMINI_SYSTEM_PROMPTenvironment variable - Multi-Persona Support: Run multiple servers with different personas
- 100% Backward Compatible: Optional feature - works normally without customization
- See PROMPT_CUSTOMIZATION.md for detailed guide and examples/custom-prompts.md for templates
šØ Multimodal Input Support
Send images, audio, video, and code files to Gemini:
- Images: JPEG, PNG, WebP, HEIC, HEIF
- Videos: MP4, MOV, AVI, WebM, and more
- Audio: MP3, WAV, AAC, FLAC, and more
- Documents/Code: PDF, text files, code files (Python, JavaScript, etc.)
- Support for both base64-encoded inline data and Cloud Storage URIs
- See MULTIMODAL.md for detailed documentation
š¤ Intelligent Agentic Loop
Inspired by OpenAI Agents SDK, the server operates as an autonomous agent:
- Turn-based execution (up to 10 turns per query)
- Automatic tool selection based on LLM decisions
- Parallel tool execution with retry logic
- Smart fallback to Gemini knowledge when tools fail
š® Gemini 3 Model Support
Full support for Gemini 3 generation models:
- gemini-3.5-flash: Default model ā fast and capable
- gemini-3.1-pro-preview: High-capability reasoning model
- gemini-3.1-flash-lite: Cost-efficient multimodal model for high-volume workloads
- gemini-3.1-pro-preview-customtools: Agentic endpoint optimized for custom tools
- thinkingLevel: Per-query thinking budget control for Gemini 3 models
- GEMINI_MEDIA_RESOLUTION: Control media quality for multimodal inputs
š ļø Built-in Tools
- WebFetch: Secure HTTPS-only web content fetching with private IP blocking
- MCP Integration: Dynamic discovery and execution of external MCP server tools
š¼ļø Image Generation
Generate images directly from text prompts using Gemini image models:
- gemini-3-pro-image: Professional asset production with 4K resolution support (default)
- gemini-3.1-flash-image: High-efficiency generation with 0.5K-4K resolution and reference images
- gemini-2.5-flash-image: Fast 1K image generation and editing (retiring 2026-10-02; prefer gemini-3.1-flash-image)
- Configurable aspect ratios: 1:1, 16:9, 9:16, 4:3, and more
- Images automatically saved to configurable output directory
š§ Audio Generation
Generate file-based audio outputs:
- generate_speech: Gemini TTS single-speaker or two-speaker speech, saved as WAV
- generate_music: Lyria 3 music generation, saved as MP3; Gemini API/AI Studio mode can request WAV for
lyria-3-pro-preview - Speech defaults to
~/Music/gemini-generated/speech; music defaults to~/Music/gemini-generated/music - Generation failures return structured MCP error content with
status,tool,errorType,message, and validationissueswhen available - See GENERATION.md, AUDIO_GENERATION.md, examples/audio-generation.md, and examples/video-generation.md
š Security First
Multi-Layer Defense:
- SSRF Protection: HTTPS-only URL fetching, private IP blocking (10.x, 172.16.x, 192.168.x, 127.x, 169.254.x), cloud metadata endpoint blocking (AWS, GCP, Azure)
- Prompt Injection Guardrails: External content tagging, trust boundaries, system prompt hardening
- File Security: MIME type validation, executable file rejection, path traversal prevention, directory whitelist
- Redirect Validation: Manual redirect handling with security checks, maximum 5 redirects, cross-domain blocking
- Content Boundaries: 50KB size limits, external content wrapping with security tags
Comprehensive Testing: 69 security-focused tests covering SSRF, path traversal, MIME validation, and prompt injection.
See SECURITY.md for detailed security documentation and best practices.
š Observability
- File-based logging (
logs/general.log,logs/reasoning.log) - Configurable log directory or disable logging for npx/containerized environments
- Detailed execution traces for debugging
- Turn and tool usage statistics
Prerequisites
- Node.js 18 or higher
- Google Cloud Platform account with Vertex AI enabled, or a Google AI Studio API key
- Google Cloud credentials configured for Vertex AI mode
Quick Start
Installation
Option 1: npx (Recommended)
npx -y github:mnthe/gemini-mcp-server
Option 2: From Source
git clone https://github.com/mnthe/gemini-mcp-server.git
cd gemini-mcp-server
npm install
npm run build
Authentication
The server supports both Vertex AI and Google AI Studio / Gemini Developer API mode.
Vertex AI mode:
Application Default Credentials (Recommended):
gcloud auth application-default login
Or use Service Account:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
Google AI Studio mode:
export GEMINI_API_KEY="your-ai-studio-api-key"
export GOOGLE_GENAI_USE_VERTEXAI="false"
Configuration
Required Environment Variables:
# Vertex AI mode
export GOOGLE_CLOUD_PROJECT="your-gcp-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"
# Or Google AI Studio mode
export GEMINI_API_KEY="your-ai-studio-api-key"
export GOOGLE_GENAI_USE_VERTEXAI="false"
Optional Model Settings:
export GEMINI_MODEL="gemini-3.5-flash" # Default model
export GEMINI_TEMPERATURE="1.0"
export GEMINI_MAX_TOKENS="8192"
export GEMINI_TOP_P="0.95"
export GEMINI_TOP_K="40"
Optional Agentic Features:
# System prompt customization
export GEMINI_SYSTEM_PROMPT="You are a specialized financial analyst AI assistant. You have access to the following tools:"
# Multi-turn conversations
export GEMINI_ENABLE_CONVERSATIONS="true"
export GEMINI_SESSION_TIMEOUT="3600"
export GEMINI_MAX_HISTORY="10"
# Logging configuration
# Default: Console logging to stderr (recommended for npx/MCP usage)
export GEMINI_LOG_TO_STDERR="true" # Default: true (console logging)
# For file-based logging instead:
export GEMINI_LOG_TO_STDERR="false" # Disable console, use file logging
export GEMINI_LOG_DIR="./logs" # Log directory (default: ./logs)
# To disable logging completely:
export GEMINI_DISABLE_LOGGING="true"
# File URI support (for CLI environments only)
export GEMINI_ALLOW_FILE_URIS="true" # Set to 'true' to allow file:// URIs (CLI tools only, NOT for desktop apps)
# Media resolution for Gemini 3 models (videoMetadata and image quality)
export GEMINI_MEDIA_RESOLUTION="medium" # Options: low, medium, high (default: not set)
# Image generation output directory
export GEMINI_IMAGE_OUTPUT_DIR="/path/to/images" # Default: ~/Pictures/gemini-generated
export GEMINI_VIDEO_OUTPUT_DIR="/path/to/videos" # Default: ~/Movies/gemini-generated on macOS, ~/Videos/gemini-generated on Windows/Linux
export GEMINI_SPEECH_OUTPUT_DIR="/path/to/speech" # Default: ~/Music/gemini-generated/speech
export GEMINI_MUSIC_OUTPUT_DIR="/path/to/music" # Default: ~/Music/gemini-generated/music
# External MCP servers (for tool delegation)
export GEMINI_MCP_SERVERS='[
{
"name": "filesystem",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "./data"]
},
{
"name": "web-search",
"transport": "http",
"url": "http://localhost:3000/mcp"
}
]'
MCP Client Integration
Add to your MCP client configuration:
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"gemini": {
"command": "npx",
"args": ["-y", "github:mnthe/gemini-mcp-server"],
"env": {
"GOOGLE_CLOUD_PROJECT": "your-gcp-project-id",
"GOOGLE_CLOUD_LOCATION": "us-central1",
"GEMINI_MODEL": "gemini-3.5-flash",
"GEMINI_ENABLE_CONVERSATIONS": "true"
}
}
}
}
Claude Code (.claude.json in project root):
{
"mcpServers": {
"gemini": {
"command": "npx",
"args": ["-y", "github:mnthe/gemini-mcp-server"],
"env": {
"GOOGLE_CLOUD_PROJECT": "your-gcp-project-id",
"GOOGLE_CLOUD_LOCATION": "us-central1",
"GEMINI_MODEL": "gemini-3.5-flash"
}
}
}
}
Other MCP Clients (Generic stdio):
# Command to run
npx -y github:mnthe/gemini-mcp-server
# Or direct execution
node /path/to/gemini-mcp-server/build/index.js
Multi-Persona Setup
You can run multiple Gemini servers with different personas for specialized tasks:
{
"mcpServers": {
"gemini-code": {
"command": "npx",
"args": ["-y", "github:mnthe/gemini-mcp-server"],
"env": {
"GOOGLE_CLOUD_PROJECT": "your-project-id",
"GOOGLE_CLOUD_LOCATION": "us-central1",
"GEMINI_SYSTEM_PROMPT": "You are a code review specialist. Focus on code quality, security, and best practices. You have access to the following tools:"
}
},
"gemini-research": {
"command": "npx",
"args": ["-y", "github:mnthe/gemini-mcp-server"],
"env": {
"GOOGLE_CLOUD_PROJECT": "your-project-id",
"GOOGLE_CLOUD_LOCATION": "us-central1",
"GEMINI_SYSTEM_PROMPT": "You are an academic research assistant. Cite sources and provide comprehensive analysis. You have access to the following tools:"
}
}
}
}
See PROMPT_CUSTOMIZATION.md for comprehensive guide and examples/custom-prompts.md for ready-to-use templates.
Available Tools
The server exposes eight MCP tools: query, search, fetch, generate_image, generate_speech, generate_music, generate_video, and check_video.
query
Main agentic entrypoint that handles multi-turn execution with automatic tool selection and multimodal input support.
Parameters:
prompt(string, required): The text prompt to sendsessionId(string, optional): Conversation session IDmodel(string, optional): Model override (e.g.,gemini-3.5-flash,gemini-3.1-pro-preview,gemini-3.1-flash-lite,gemini-3.1-pro-preview-customtools)thinkingLevel(string, optional): Gemini 3 thinking level. Options:minimal,low,medium,highmediaResolution(string, optional): Global media resolution for multimodal inputs. Options:low,medium,highparts(array, optional): Multimodal content parts (images, audio, video, documents)
How It Works:
- Analyzes the prompt and conversation history (including multimodal content)
- Decides whether to use tools or respond directly
- Executes tools in parallel if needed (WebFetch, MCP tools)
- Retries failed tools with exponential backoff
- Falls back to Gemini knowledge if tools fail
- Continues for up to 10 turns until final answer
Examples:
# Simple text query
query: "What is the capital of France?"
# Complex query with tool usage
query: "Fetch the latest news from https://example.com/news and summarize"
ā Automatically uses WebFetch tool
ā Synthesizes content into answer
# Image analysis (multimodal)
query: "What's in this image?"
parts: [{ inlineData: { mimeType: "image/jpeg", data: "<base64>" } }]
# Multi-turn conversation
query: "What is machine learning?" (sessionId auto-created)
query: "Give me an example" (uses sessionId from previous response)
Multimodal Support: See MULTIMODAL.md for detailed documentation on:
- Parts array structure and field requirements (for agent developers)
- Supported file types (images, audio, video, documents)
- Base64 inline data vs Cloud Storage URIs
- Complete schema and validation rules
- Usage examples and code samples
- Best practices and limitations
- Common mistakes to avoid
Response Includes:
- Final answer
- Session ID (if conversations enabled)
- Statistics: turns used, tool calls, reasoning steps
search
Search for information using Gemini (OpenAI MCP spec).
Parameters:
query(string, required): Search query
Returns:
results: Array of{id, title, url}
fetch
Fetch full content of a search result (OpenAI MCP spec).
Parameters:
id(string, required): Document ID from search results
Returns:
id,title,text,url,metadata
generate_image
Generate images from text prompts using Gemini image models.
Parameters:
prompt(string, required): Image generation prompt describing what to generatemodel(string, optional): Image model to use. Options:gemini-3-pro-image(default) ā professional quality, supports up to 4K resolutiongemini-3.1-flash-imageā high-efficiency with 0.5K-4K and reference image supportgemini-2.5-flash-imageā fast 1K image generation and editing (retiring 2026-10-02; prefer gemini-3.1-flash-image)
aspectRatio(string, optional): Image aspect ratio. Default:1:1. Options:1:1,1:4,1:8,2:3,3:2,3:4,4:1,4:3,4:5,5:4,8:1,9:16,16:9,21:9(1:4,1:8,4:1,8:1requiregemini-3.1-flash-image)imageSize(string, optional): Output resolution. Default:1K. Options:0.5K,1K,2K,4K(0.5Krequiresgemini-3.1-flash-image; omit forgemini-2.5-flash-image)imagePaths(array, optional): Local reference images for editing or style transfer (max 14;gemini-2.5-flash-imagesupports at most 3). Supported file types: PNG (.png), JPEG (.jpg,.jpeg), WEBP (.webp), HEIC (.heic), HEIF (.heif)systemInstruction(string, optional): System instruction for Gemini 3 image modelsthinkingLevel(string, optional): Gemini 3.1 Flash Image thinking level:minimalorhighmediaResolution(string, optional): Media resolution for reference image inputs:low,medium,high
Behavior:
- Generated images are saved to
GEMINI_IMAGE_OUTPUT_DIR(defaults to~/Pictures/gemini-generatedon macOS, Windows, and Linux) - Returns image data (base64) along with file paths of saved images
Examples:
# Generate a square image with default model
generate_image: "A serene mountain landscape at sunset"
# Generate a wide-format image with Nano Banana 2 at 4K
generate_image: "Futuristic cityscape at night"
model: "gemini-3.1-flash-image"
aspectRatio: "16:9"
imageSize: "4K"
generate_speech
Generate speech from text using Gemini TTS models.
Parameters:
prompt(string, required): Text or transcript to synthesizemodel(string, optional): Speech model. Options:gemini-3.1-flash-tts-preview(default),gemini-2.5-flash-preview-tts,gemini-2.5-pro-preview-ttsvoiceName(string, optional): Prebuilt voice for single-speaker TTS. Default:KorelanguageCode(string, optional): BCP-47 language codespeakers(array, optional): Exactly two{ speaker, voiceName }entries for multi-speaker TTS
Behavior:
- Generated speech is saved to
GEMINI_SPEECH_OUTPUT_DIR(defaults to~/Music/gemini-generated/speech) - Returns MCP
audiocontent and saved file paths - Gemini TTS is text-only input; audio, image, and video reference files are not supported by
generate_speech
generate_music
Generate music using Lyria 3 models.
Parameters:
prompt(string, required): Music generation promptmodel(string, optional): Music model. Options:lyria-3-clip-preview(default),lyria-3-pro-previewoutputMimeType(string, optional): Vertex AI mode supportsaudio/mp3only. Gemini API/AI Studio mode supportsaudio/mp3, oraudio/wavwithlyria-3-pro-previewimagePaths(array, optional): Local image paths for multimodal music generation inputs (max 10). Supported file types: PNG (.png), JPEG (.jpg,.jpeg), WEBP (.webp), HEIC (.heic), HEIF (.heif)lyrics(string, optional): User-provided lyricsinstrumental(boolean, optional): Request instrumental-only output; cannot be combined withlyricsorvocalStylevocalStyle(string, optional): Vocal generation directionlanguage(string, optional): Output language direction. Options: English, German, Spanish, French, Hindi, Japanese, Korean, PortuguesedurationSeconds(number, optional): Target duration in seconds; requireslyria-3-pro-preview; max 184 secondsbpm(number, optional): Tempo direction in beats per minuteintensity(string, optional):low,medium, orhigh
Behavior:
- Generated music is saved to
GEMINI_MUSIC_OUTPUT_DIR(defaults to~/Music/gemini-generated/music) - Returns MCP
audiocontent, saved file paths, and any lyrics/song-structure text returned by Lyria - Lyria 3 Clip is fixed at 30 seconds; Lyria 3 Pro supports longer structured songs up to 184 seconds
- Lyria 3 output is 44.1 kHz, one clip per prompt; Vertex AI mode supports 192 kbps MP3 only, while Gemini API/AI Studio Pro can also request WAV
- Lyria 3 accepts text prompts and optional image references only; audio and video reference files are not supported by
generate_music; negative prompting is not supported
generate_video
Generate videos from text prompts using Veo video generation models.
Parameters:
prompt(string, required): Video generation prompt describing what to generatemodel(string, optional): Video model to use. Default:veo-3.1-fast-generate-001. Options:veo-3.1-fast-generate-001(default) ā fast video generationveo-3.1-generate-001ā standard quality generationveo-3.1-lite-generate-001ā cost-efficient generation
aspectRatio(string, optional): Video aspect ratio. Default:16:9. Options:16:9,9:16durationSeconds(string, optional): Video duration. Default:8. Options:4,6,8(1080p/4k require 8)resolution(string, optional): Video resolution. Default:720p. Options:720p,1080p,4k(1080p/4k require 8 second duration)generateAudio(boolean, optional): Generate audio for the video. Default:trueenhancePrompt(boolean, optional): Use Veo prompt rewriting/enhancementpersonGeneration(string, optional): Person generation control:allow_adult,dont_allownegativePrompt(string, optional): Description of what to exclude from the videoseed(number, optional): Random seed for reproducibilitynumberOfVideos(number, optional): Number of videos to generate. Default:1imagePath(string, optional): Local file path of input image for image-to-video generation. Supported file types: PNG (.png), JPEG (.jpg,.jpeg), WEBP (.webp)lastFramePath(string, optional): Local file path of last frame for interpolation (requiresimagePath). Same supported image file types asimagePathreferenceImagePaths(array, optional): Local file paths of reference images for style guidance (max 3, Veo 3.1 only). Same supported image file types asimagePathvideoPath(string, optional): Local file path of a Veo-generated 720p MP4 (.mp4) video to extend
Behavior:
- Generated videos are saved to
GEMINI_VIDEO_OUTPUT_DIR(defaults to~/Movies/gemini-generatedon macOS,~/Videos/gemini-generatedon Windows/Linux) generate_videoreturns an operation ID;check_videopolls the operation and returns saved file paths when complete- Supports text-to-video, image-to-video, interpolation, reference image, and Veo video extension modes
- Veo 3.1 Lite does not support
4kor reference asset images; model availability can differ between Vertex AI and Google AI Studio - Audio file references are not supported by
generate_video; describe dialogue, sound effects, and ambience inprompt
Examples:
# Simple text-to-video
generate_video: "A dancing robot in a cyberpunk city"
# Text-to-video with custom settings
generate_video: "Ocean waves crashing on a beach"
model: "veo-3.1-generate-001"
aspectRatio: "16:9"
durationSeconds: "8"
resolution: "1080p"
# Image-to-video (animation)
generate_video: "Animate this image"
imagePath: "/path/to/image.jpg"
# Interpolation (morph between two frames)
generate_video: "Smooth transition"
imagePath: "/path/to/start_frame.jpg"
lastFramePath: "/path/to/end_frame.jpg"
# Video with reference images for style
generate_video: "Generate a video with cyberpunk aesthetic"
referenceImagePaths: ["/path/to/style1.jpg", "/path/to/style2.jpg"]
# Extend a previous Veo-generated video
generate_video: "Follow the subject as the scene continues into the hallway"
videoPath: "/path/to/previous-veo-output.mp4"
resolution: "720p"
Security
The gemini-mcp-server implements comprehensive security measures to protect against common vulnerabilities. See SECURITY.md for complete documentation.
Defense Layers
1. SSRF (Server-Side Request Forgery) Protection
- HTTPS-only: HTTP requests are blocked; only HTTPS is allowed for web resources
- Private IP blocking: Blocks access to internal networks (10.x, 172.16.x, 192.168.x, 127.x, 169.254.x)
- Cloud metadata blocking: Prevents access to AWS, GCP, Azure, and Alibaba Cloud metadata endpoints
- Redirect validation: All redirects are manually validated; cross-domain redirects are blocked
2. Prompt Injection Guardrails
- Trust boundaries: Clear separation between user input (trusted) and external content (untrusted)
- Content tagging: All fetched web content is wrapped in
<external_content>tags with security warnings - System prompt hardening: Built-in instructions to ignore malicious commands in external content
- Information disclosure protection: Guidelines prevent revealing system prompts or internal details
3. File Security (Multimodal Content)
- MIME type validation: Only known safe types (images, video, audio, PDF, code) are allowed
- Executable rejection: Blocks
.exe,.sh,.dll, and other executable file types - Path traversal prevention: All paths are normalized and validated against a whitelist
- Directory whitelist: Local files only allowed in safe directories (cwd, Documents, Downloads, Desktop)
- URI scheme validation: Only
gs://,https://, and conditionallyfile://URIs are allowed
4. Content Boundaries
- Size limits: Web content limited to 50KB to prevent resource exhaustion
- Content type validation: Basic validation of response content types
- Encoding validation: Proper handling of character encodings
Configuration
File Security (Multimodal)
# Default: false (secure) - file:// URIs are disabled
export GEMINI_ALLOW_FILE_URIS="false"
# For CLI environments only - enables local file:// URIs with whitelist validation
export GEMINI_ALLOW_FILE_URIS="true"
Security Note: Never enable GEMINI_ALLOW_FILE_URIS in production or web-facing applications. It's designed for trusted CLI environments only.
Security Monitoring
# Enable logging to monitor security events
export GEMINI_DISABLE_LOGGING="false"
export GEMINI_LOG_DIR="/var/log/gemini-mcp"
# Log to stderr for real-time monitoring
export GEMINI_LOG_TO_STDERR="true"
Best Practices
For Desktop Applications (Recommended)
{
"mcpServers": {
"gemini": {
"env": {
"GEMINI_ALLOW_FILE_URIS": "false"
}
}
}
}
For CLI Tools (Use with Caution)
export GEMINI_ALLOW_FILE_URIS="true"
export GEMINI_LOG_TO_STDERR="true"
Security Testing
Run comprehensive security test suite:
# All security tests
npx tsx test/url-security-test.ts # 21 tests - SSRF protection
npx tsx test/file-security-test.ts # 34 tests - File validation
npx tsx test/webfetch-security-test.ts # 5 tests - Content tagging
npx tsx test/security-guidelines-test.ts # 3 tests - Prompt injection
npx tsx test/multimodal-security-test.ts # 6 tests - Multimodal files
Total: 69 security-focused tests covering SSRF, path traversal, MIME validation, and prompt injection.
For detailed security information, threat models, and vulnerability reporting, see SECURITY.md.
Architecture
Agentic Loop
User Query
ā
āāāā Turn 1..10 Loop āāāā
ā ā
ā 1. Build Prompt ā
ā + Tool Definitions ā
ā + History ā
ā ā
ā 2. Gemini Generation ā
ā (with thinking) ā
ā ā
ā 3. Parse Response ā
ā - Reasoning? ā
ā - Tool Calls? ā
ā - Final Output? ā
ā ā
ā 4. Execute Tools ā
ā (parallel + retry) ā
ā ā
ā 5. Check MaxTurns ā
ā Continue or Exit? ā
ā ā
āāāāāāāāāāāāāāāāāāāāāāāāāā
ā
Final Result + Stats
Project Structure
src/
āāā agentic/ # Core agentic loop
ā āāā AgenticLoop.ts # Main orchestrator
ā āāā RunState.ts # Turn-based state management
ā āāā ResponseProcessor.ts # Parse Gemini responses
ā āāā Tool.ts # Tool interface (MCP standard)
ā
āāā mcp/ # MCP client implementation
ā āāā EnhancedMCPClient.ts # Unified stdio + HTTP client
ā āāā StdioMCPConnection.ts
ā āāā HttpMCPConnection.ts
ā
āāā tools/ # Tool implementations
ā āāā WebFetchTool.ts # Secure web fetching
ā āāā ToolRegistry.ts # Tool management + parallel execution
ā
āāā services/ # External services
ā āāā GeminiAIService.ts # Gemini API (with thinkingConfig, image generation)
ā
āāā handlers/ # MCP tool handlers
ā āāā QueryHandler.ts
ā āāā SearchHandler.ts
ā āāā FetchHandler.ts
ā āāā ImageGenerationHandler.ts # Image generation via Gemini image models
ā
āāā managers/ # Business logic
ā āāā ConversationManager.ts
ā
āāā errors/ # Custom error types
āāā types/ # TypeScript type definitions
āāā schemas/ # Zod validation schemas (including ImageGenerationSchema)
āāā config/ # Configuration loading
āāā utils/ # Shared utilities (Logger, security, imageSaver)
ā
āāā server/ # MCP server bootstrap
āāā GeminiAIMCPServer.ts
See DIRECTORY_STRUCTURE.md and ARCHITECTURE.md for details.
Advanced Usage
External MCP Servers
Connect to external MCP servers for extended capabilities:
Stdio (subprocess):
export GEMINI_MCP_SERVERS='[
{
"name": "filesystem",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "./workspace"]
}
]'
HTTP:
export GEMINI_MCP_SERVERS='[
{
"name": "api-server",
"transport": "http",
"url": "https://api.example.com/mcp",
"headers": {"Authorization": "Bearer token"}
}
]'
Tools from external servers are automatically discovered and made available to the agent.
Reasoning Traces
Default: Console Logging
Logs are sent to stderr by default, making them visible in MCP client logs.
For File-Based Logging:
export GEMINI_LOG_TO_STDERR="false" # Disable console, use files
export GEMINI_LOG_DIR="./logs" # Log directory (default: ./logs)
Then check logs:
tail -f logs/general.log # All logs
tail -f logs/reasoning.log # Gemini thinking process only
To Disable All Logging:
export GEMINI_DISABLE_LOGGING="true"
Custom Tool Development
Tools follow MCP standard:
import { BaseTool, ToolResult, RunContext } from './agentic/Tool.js';
export class MyTool extends BaseTool {
name = 'my_tool';
description = 'Description for LLM';
parameters = {
type: 'object',
properties: {
arg: { type: 'string', description: 'Argument' }
},
required: ['arg']
};
async execute(args: any, context: RunContext): Promise<ToolResult> {
// Your implementation
return {
status: 'success',
content: 'Result'
};
}
}
Development
Build
npm run build
Watch Mode
npm run watch
Development Mode
npm run dev
Troubleshooting
MCP Server Connection Issues
If the MCP server appears to be "dead" or disconnects unexpectedly:
Check MCP client logs (logs are sent to stderr by default):
- macOS:
~/Library/Logs/Claude/mcp*.log - Windows:
%APPDATA%\Claude\Logs\mcp*.log
Server logs will appear in these files automatically.
Log Directory Errors
If you encounter errors like ENOENT: no such file or directory, mkdir './logs':
This should not happen with default settings (console logging is default).
If you enabled file logging (GEMINI_LOG_TO_STDERR="false"):
Solution: Use a writable log directory:
{
"mcpServers": {
"gemini": {
"command": "npx",
"args": ["-y", "github:mnthe/gemini-mcp-server"],
"env": {
"GOOGLE_CLOUD_PROJECT": "your-project-id",
"GEMINI_LOG_TO_STDERR": "false",
"GEMINI_LOG_DIR": "/tmp/gemini-logs"
}
}
}
}
Authentication Errors
- Verify credentials:
gcloud auth application-default login - Check project ID:
echo $GOOGLE_CLOUD_PROJECT - Enable Vertex AI API:
gcloud services enable aiplatform.googleapis.com
Tool Execution Failures
- Check logs in
logs/general.log(if logging is enabled) - Verify MCP server configurations in
GEMINI_MCP_SERVERS - Ensure external servers are running (for HTTP transport)
MaxTurns Exceeded
- Agent returns best-effort response after 10 turns
- Check if tools are repeatedly failing
- Review reasoning logs to understand loop behavior (if logging is enabled)
Documentation
- SECURITY.md - Security documentation and best practices
- ARCHITECTURE.md - System architecture and agentic loop design
- DIRECTORY_STRUCTURE.md - Code organization
- IMPLEMENTATION.md - Implementation details
- BUILD.md - Build and release process
- MULTIMODAL.md - Multimodal content guide
- PROMPT_CUSTOMIZATION.md - System prompt customization
- CONTRIBUTING.md - Contribution guidelines
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.