Moondream MCP Server


A powerful server that integrates the Moondream vision model to enable advanced image analysis, including captioning, object detection, and visual question answering, through the Model Context Protocol, compatible with AI assistants like Claude and Cline.

NightTrek


🌙 Moondream MCP Server

A powerful Model Context Protocol (MCP) server that brings advanced image analysis capabilities to your applications using the Moondream vision model. This server seamlessly integrates with Claude and Cline, providing a bridge between AI assistants and sophisticated computer vision tasks.

This is NOT an official Moondream package. All credit to moondream.ai for making the best open source vision model that you can run on consumer hardware.

<div align="center" style="height: 150px; overflow: hidden; display: flex; align-items: center; margin: 20px 0;"> <img src="https://github.com/user-attachments/assets/e999ada0-9dfa-4f3d-a489-e4ce58434ecb" alt="Moondream MCP Banner" style="width: 100%; object-fit: cover;"> </div>

✨ Features

  • 🖼️ Image Captioning: Generate natural language descriptions of images
  • 🔍 Object Detection: Identify and locate specific objects within images
  • 💭 Visual Question Answering: Ask questions about image content and receive intelligent responses
  • 🚀 High Performance: Uses quantized 8-bit models for efficient inference
  • 🔄 Automatic Setup: Handles model downloading and environment setup
  • 🛠️ MCP Integration: Standardized protocol for seamless tool usage

🎯 Use Cases

  • Content Analysis: Automatically generate descriptions for image content
  • Accessibility: Create alt text for visually impaired users
  • Data Extraction: Extract specific information from images through targeted questions
  • Object Verification: Confirm the presence of specific objects in images
  • Scene Understanding: Analyze complex scenes and their components

🚀 Quick Start

Prerequisites

  • Node.js v18 or higher
  • Python 3.8+
  • UV package manager (automatically installed if not present)

Installation

  1. Clone and Setup
git clone <repository-url>
cd moondream-server
pnpm install
  2. Build the Server
pnpm run build

The server handles the rest automatically:

  • Creates Python virtual environment
  • Installs UV if not present
  • Downloads and sets up the Moondream model
  • Manages the model server process
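The automatic setup above works like a short checklist: each step runs only if its result is missing, and starting the model server always runs. The sketch below is a hypothetical illustration, not the server's actual code. The `SetupState` fields, the step names, and the step order (uv before the virtual environment) are assumptions chosen to mirror the bullet list.

```typescript
// Hypothetical snapshot of what already exists on the machine.
interface SetupState {
  uvInstalled: boolean;
  venvExists: boolean;
  modelCached: boolean;
}

// Returns the setup steps that still need to run, in order.
// Assumption: uv is installed first, then the venv, then the model download.
// Starting the model server is always required.
function planSetup(state: SetupState): string[] {
  const steps: string[] = [];
  if (!state.uvInstalled) steps.push("install uv");
  if (!state.venvExists) steps.push("create virtual environment");
  if (!state.modelCached) steps.push("download model");
  steps.push("start model server");
  return steps;
}

// First run: everything is missing.
console.log(planSetup({ uvInstalled: false, venvExists: false, modelCached: false }));
// Later run: only the model server needs to be started.
console.log(planSetup({ uvInstalled: true, venvExists: true, modelCached: true }));
```

Because every check is skipped once its result exists, later launches only pay for starting the model server.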

Integration with Claude/Cline

Add to your MCP settings file (claude_desktop_config.json or cline_mcp_settings.json):

{
  "mcpServers": {
    "moondream": {
      "command": "node",
      "args": ["/path/to/moondream-server/build/index.js"]
    }
  }
}
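If your settings file already lists other servers, add the `moondream` entry alongside them instead of replacing the file. The sketch below shows that merge. It assumes the settings object has the `mcpServers` shape shown above. `addMoondreamServer` is a hypothetical helper, and `/path/to/moondream-server` remains a placeholder for your clone location.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpSettings {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a new settings object with a "moondream" entry
// added (or replaced), leaving every other server and top-level key untouched.
function addMoondreamServer(settings: McpSettings, serverDir: string): McpSettings {
  return {
    ...settings,
    mcpServers: {
      ...(settings.mcpServers ?? {}),
      moondream: {
        command: "node",
        args: [`${serverDir}/build/index.js`],
      },
    },
  };
}

const existing: McpSettings = {
  mcpServers: { other: { command: "npx", args: ["some-other-server"] } },
};
const updated = addMoondreamServer(existing, "/path/to/moondream-server");
console.log(JSON.stringify(updated, null, 2));
```

The helper returns a copy instead of mutating its input, so the original settings can still be compared or restored if the edit needs to be undone.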

🛠️ Available Tools

analyze_image

Powerful image analysis tool with multiple modes:

{
  "name": "analyze_image",
  "arguments": {
    "image_path": "<path to image file>",
    "prompt": "<analysis command>"
  }
}
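Before forwarding a call, a server usually checks that both arguments are present and are strings. The guard below is a hypothetical sketch of that check for the two fields in the schema above. The function name, the rejection of blank strings, and the error messages are assumptions, not taken from the server's source.

```typescript
interface AnalyzeImageArgs {
  image_path: string;
  prompt: string;
}

// Hypothetical guard: narrows an unknown tool-call payload to AnalyzeImageArgs,
// throwing if a field is missing, not a string, or blank.
function parseAnalyzeImageArgs(raw: unknown): AnalyzeImageArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("arguments must be an object");
  }
  const obj = raw as Record<string, unknown>;
  for (const key of ["image_path", "prompt"] as const) {
    const value = obj[key];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`"${key}" must be a non-empty string`);
    }
  }
  return { image_path: obj["image_path"] as string, prompt: obj["prompt"] as string };
}

console.log(parseAnalyzeImageArgs({ image_path: "photo.jpg", prompt: "generate caption" }));
```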

Prompt Types:

  • "generate caption" - Creates natural language description
  • "detect: [object]" - Finds specific objects (e.g., "detect: car")
  • "[question]" - Answers questions about the image
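The routing between the three prompt types above can be sketched as a small parser that returns a tagged result. This is a minimal sketch, not the server's own parser. It assumes that matching is case-insensitive, that surrounding whitespace is ignored, and that `"detect:"` with no object name counts as an error.

```typescript
// The three analysis modes described above.
type ParsedPrompt =
  | { mode: "caption" }
  | { mode: "detect"; object: string }
  | { mode: "query"; question: string };

// Maps a free-form prompt to an analysis mode.
// Assumption: case-insensitive matching on trimmed input.
function parsePrompt(prompt: string): ParsedPrompt {
  const trimmed = prompt.trim();
  const lower = trimmed.toLowerCase();
  if (lower === "generate caption") {
    return { mode: "caption" };
  }
  if (lower.startsWith("detect:")) {
    const object = trimmed.slice("detect:".length).trim();
    if (object === "") {
      throw new Error('"detect:" needs an object name, e.g. "detect: car"');
    }
    return { mode: "detect", object };
  }
  // Anything else is treated as a question about the image.
  return { mode: "query", question: trimmed };
}

console.log(parsePrompt("detect: car"));
```

With a tagged union, code that handles the result must deal with each mode explicitly, and the compiler flags a missing case.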

Examples:

// Image Captioning
{
  "image_path": "photo.jpg",
  "prompt": "generate caption"
}

// Object Detection
{
  "image_path": "scene.jpg",
  "prompt": "detect: person"
}

// Visual Q&A
{
  "image_path": "painting.jpg",
  "prompt": "What colors are used in this painting?"
}
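An MCP client such as Claude or Cline does not send these argument objects on their own. It wraps them in a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. The sketch below builds that envelope for the object-detection example. The helper name and the numeric `id` are illustrative.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Wraps analyze_image arguments in the JSON-RPC 2.0 "tools/call" envelope used by MCP.
function makeAnalyzeImageCall(id: number, imagePath: string, prompt: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "analyze_image",
      arguments: { image_path: imagePath, prompt },
    },
  };
}

console.log(JSON.stringify(makeAnalyzeImageCall(1, "scene.jpg", "detect: person")));
```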

🔧 Technical Details

Architecture

The server operates as a dual-component system:

  1. MCP Interface Layer

    • Handles protocol communication
    • Manages tool interfaces
    • Processes requests/responses
  2. Moondream Model Server

    • Runs the vision model
    • Processes image analysis
    • Provides HTTP API endpoints
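The hand-off between the two components can be sketched as follows: the MCP layer decides which analysis to run, then sends the image and any detail to the local model server over HTTP. This is a hypothetical sketch. The endpoint paths (`/caption`, `/detect`, `/query`) and the body fields are assumptions. Only the default port 3475 comes from this README. The function builds the request without sending it, so the routing logic can be checked on its own.

```typescript
type Mode = "caption" | "detect" | "query";

interface ModelRequest {
  url: string;
  body: { image: string; object?: string; question?: string };
}

// Hypothetical mapping from analysis mode to a model-server endpoint.
// Endpoint paths and body fields are assumptions for illustration.
function buildModelRequest(
  mode: Mode,
  imageBase64: string,
  detail = "",
  port = 3475, // default port documented in the Debugging section
): ModelRequest {
  const base = `http://127.0.0.1:${port}`;
  switch (mode) {
    case "caption":
      return { url: `${base}/caption`, body: { image: imageBase64 } };
    case "detect":
      return { url: `${base}/detect`, body: { image: imageBase64, object: detail } };
    case "query":
      return { url: `${base}/query`, body: { image: imageBase64, question: detail } };
    default:
      throw new Error(`unknown mode: ${String(mode)}`);
  }
}

console.log(buildModelRequest("detect", "<base64 image data>", "person"));
```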

Model Information

Uses the Moondream quantized model:

  • Default: moondream-2b-int8.mf.gz
  • Efficient 8-bit quantization
  • Automatic download from Hugging Face
  • ~500MB model size
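Quantization keeps the model small by storing each weight in fewer bits. A back-of-the-envelope estimate of raw weight storage is parameters × bits per weight ÷ 8 bytes. This ignores gzip compression, file metadata, and any tensors kept at higher precision, so treat the result as a rough estimate rather than the exact file size. The parameter counts below are nominal values read from the model names (2B and 0.5B).

```typescript
// Rough raw storage for model weights: parameters × bits per weight ÷ 8 bits per byte.
// Ignores compression and metadata, so real files will differ.
function estimateWeightBytes(parameters: number, bitsPerWeight: number): number {
  return (parameters * bitsPerWeight) / 8;
}

// Formats a byte count in decimal megabytes (1 MB = 1,000,000 bytes).
function toMegabytes(bytes: number): string {
  return `${(bytes / 1e6).toFixed(0)} MB`;
}

console.log(toMegabytes(estimateWeightBytes(2e9, 8)));   // "2000 MB" (2B params, int8)
console.log(toMegabytes(estimateWeightBytes(0.5e9, 4))); // "250 MB" (0.5B params, int4)
```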

Performance

  • Fast startup with automatic caching
  • Efficient memory usage through quantization
  • Responsive API endpoints
  • Concurrent request handling

🔍 Debugging

Common issues and solutions:

  1. Model Download Issues

    # Manual model download (note: this URL fetches the 0.5B int4 variant, not the default moondream-2b-int8.mf.gz)
    wget https://huggingface.co/vikhyatk/moondream2/resolve/main/moondream-0_5b-int4.mf.gz
    
  2. Server Port Conflicts

    • Default port: 3475
    • Check for process using: lsof -i :3475
  3. Python Environment

    • UV manages dependencies
    • Check logs in temp directory
    • Virtual env in system temp folder
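A failed download can leave an HTML error page where the model file should be. Because `.mf.gz` files are gzip-compressed, a quick sanity check is that the file starts with the gzip magic bytes `0x1f 0x8b` (RFC 1952). The helper below is a hypothetical debugging aid, not part of the server. It only inspects the first bytes, so it cannot detect a file that was truncated partway through.

```typescript
// Gzip streams always begin with the two magic bytes 0x1f 0x8b (RFC 1952).
function looksLikeGzip(header: Uint8Array): boolean {
  return header.length >= 2 && header[0] === 0x1f && header[1] === 0x8b;
}

// An HTML error page saved under the model's filename starts with "<" (0x3c).
const htmlPage = Uint8Array.from("<!DOCTYPE html>", (c) => c.charCodeAt(0));
console.log(looksLikeGzip(htmlPage)); // false
console.log(looksLikeGzip(new Uint8Array([0x1f, 0x8b, 0x08, 0x00]))); // true
```

In practice you would pass in the first few bytes read from the downloaded model file. If the check fails, delete the file and download it again.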

🤝 Contributing

Contributions welcome! Areas of interest:

  • Additional model support
  • Performance optimizations
  • New analysis capabilities
  • Documentation improvements

📄 License

[Add your license information here]

🙏 Acknowledgments


<p align="center"> Made with ❤️ by Nighttrek </p>
