Where's Waldo Rick

An MCP server that provides agentic vision capabilities for visual regression testing by capturing and comparing screenshots using Gemini Flash. It enables users to detect UI changes and conduct conversational investigations to distinguish between intended and unintended visual modifications.

Where's Waldo Rick - Visual Regression MCP Server

A Model Context Protocol (MCP) server that brings agentic vision capabilities to Claude Code for visual regression testing using Gemini 3 Flash.

Overview

No more ambiguous conversations about visual changes. See exactly what changed, circled and annotated, with each change classified as intended or unintended.

Problem Solved

  • Developer works for hours on UI changes
  • Build passes, code is "clean"
  • You open the app... same exact layout
  • You ask: "What specifically changed?"
  • Dev says: "We added 2 pixels to the card"
  • You ask: "Where? Top? Bottom? Inside the box? Around it?"
  • 😤 Wasted time, unclear communication

Solution

Where's Waldo Rick provides:

  1. Screenshot capture from multiple platforms (macOS, iOS Simulator, Web)
  2. Pixel-perfect comparison with configurable thresholds
  3. Agentic vision analysis using Gemini 3 Flash (iterative zoom/crop/annotate)
  4. Expected vs unintended change detection
  5. Conversational investigation ("Not that box, the child item")

Installation

Requirements

  • Python 3.10+
  • Gemini API key (free tier: 15 requests/minute)
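The 15 requests/minute free-tier limit means a burst of comparisons can be rejected by the API. Below is a minimal client-side sketch of a sliding-window limiter for staying under that limit. The class and method names are hypothetical and not part of this project; only the 15-per-60-seconds figure comes from the requirement above.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `max_calls` calls per `period` seconds."""

    def __init__(
        self,
        max_calls: int = 15,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            return 0.0
        return self._period - (now - self._calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self._calls.append(self._clock())
```

A caller would sleep for `wait_time()` seconds, make the request, then call `record()`. The clock is injectable so the limiter can be tested without real delays.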

Install from GitHub

# Install via uvx
uvx --from git+https://github.com/bretbouchard/gemini-vision-mcp wheres_waldo.server

# Or install locally
pip install -e .

Configure Claude Code

Add to your Claude Code MCP configuration (~/.claude/mcp.json or project-specific):

{
  "mcpServers": {
    "wheres-waldo-rick": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/bretbouchard/gemini-vision-mcp", "wheres_waldo.server"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
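If the configuration file already lists other servers, editing it by hand risks overwriting them. The sketch below merges the entry shown above into an existing JSON file without touching other entries. The helper names are hypothetical, and the file path passed in is whichever configuration file your Claude Code setup actually reads.

```python
import json
from pathlib import Path
from typing import Any, Dict

REPO = "git+https://github.com/bretbouchard/gemini-vision-mcp"


def server_entry(api_key: str) -> Dict[str, Any]:
    """Build the server entry exactly as shown in the configuration above."""
    return {
        "command": "uvx",
        "args": ["--from", REPO, "wheres_waldo.server"],
        "env": {"GEMINI_API_KEY": api_key},
    }


def add_server(config_path: Path, api_key: str) -> Dict[str, Any]:
    """Hypothetical helper: add or update the entry, keeping other servers intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["wheres-waldo-rick"] = server_entry(api_key)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```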

Usage

Basic Workflow

# 1. Declare expected changes before work
/visual:prepare "Card padding increases by 2px, button moves to right"

# 2. Capture baseline screenshot
/visual:capture "Phase 3 - Before card update"

# 3. Development happens...

# 4. Capture current state
/visual:capture "Phase 4 - After card update"

# 5. Compare and see all changes
/visual:compare screenshots/phases/3-before.png screenshots/phases/4-after.png

MCP Tools

visual_capture

Capture a screenshot and store it for visual regression testing.

await visual_capture(
    name="Phase 3 - Before card update",
    platform="macos"  # auto, macos, ios, web
)
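To illustrate how the platform argument could map to an actual capture, here is a rough sketch that builds the command line for each target. It is not the server's implementation. It assumes macOS capture goes through the system `screencapture` tool and iOS Simulator capture through `xcrun simctl`, and the function names are hypothetical. Web capture needs a browser automation library, so it is left out.

```python
import platform as host_platform
from typing import List


def resolve_platform(requested: str = "auto") -> str:
    """Assumption: "auto" resolves to the host OS; only macOS is detected here."""
    if requested != "auto":
        return requested
    if host_platform.system() == "Darwin":
        return "macos"
    raise RuntimeError("cannot auto-detect a capture platform on this host")


def capture_command(target: str, output_path: str) -> List[str]:
    """Return the argv that would write a screenshot to output_path."""
    if target == "macos":
        # -x suppresses the shutter sound.
        return ["screencapture", "-x", output_path]
    if target == "ios":
        # Captures the currently booted simulator.
        return ["xcrun", "simctl", "io", "booted", "screenshot", output_path]
    raise ValueError(f"no command-line capture for platform {target!r}")
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the tool is installed.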

visual_prepare

Declare a baseline with expected changes before development.

await visual_prepare(
    phase="Phase 3 - Card Layout Update",
    expected_changes="Card padding increases by 2px, button moves to right"
)

visual_compare

Compare two screenshots with pixel-level precision and agentic vision.

await visual_compare(
    before_path="screenshots/phases/3-before.png",
    after_path="screenshots/phases/4-after.png",
    threshold=2  # 1px, 2px, or 3px
)
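The pixel-level half of the comparison can be sketched in pure Python. This is an illustration only: the roadmap mentions OpenCV for the real engine, and the reading of `threshold` used here (ignore changes that span fewer than N pixels in both width and height) is an assumption, not documented behavior. Images are represented as rows of RGB tuples to keep the example dependency-free.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]        # rows of RGB tuples
Box = Tuple[int, int, int, int]  # left, top, right, bottom (inclusive)


def changed_bounding_box(before: Image, after: Image) -> Optional[Box]:
    """Return the box enclosing every differing pixel, or None if identical."""
    if len(before) != len(after) or any(
        len(b) != len(a) for b, a in zip(before, after)
    ):
        raise ValueError("images must have the same dimensions")
    xs: List[int] = []
    ys: List[int] = []
    for y, (row_before, row_after) in enumerate(zip(before, after)):
        for x, (p_before, p_after) in enumerate(zip(row_before, row_after)):
            if p_before != p_after:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def exceeds_threshold(box: Optional[Box], threshold: int = 2) -> bool:
    """Assumed rule: report a change only if it spans >= threshold px on some axis."""
    if box is None:
        return False
    left, top, right, bottom = box
    return (right - left + 1) >= threshold or (bottom - top + 1) >= threshold
```

A box that passes the threshold is the kind of region that could then be cropped and handed to the vision model for annotation.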

visual_cleanup

Clean up old screenshots and cache.

await visual_cleanup(retention_days=7)
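A retention-based cleanup can be as simple as comparing file modification times against a cutoff. The sketch below assumes screenshots are PNG files in a single directory and adds a dry-run flag. Both are illustrative choices, and the function names are hypothetical rather than taken from the server's code.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_expired(
    directory: Path, retention_days: int = 7, now: Optional[float] = None
) -> List[Path]:
    """List PNG files last modified before the retention window began."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    return sorted(p for p in directory.glob("*.png") if p.stat().st_mtime < cutoff)


def cleanup(
    directory: Path, retention_days: int = 7, dry_run: bool = True
) -> List[Path]:
    """Delete expired screenshots; with dry_run=True only report them."""
    expired = find_expired(directory, retention_days)
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Defaulting to a dry run makes it easy to review what would be deleted before committing to it.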

Development

Setup

# Clone repository
git clone https://github.com/bretbouchard/gemini-vision-mcp
cd gemini-vision-mcp

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black src/
ruff check src/

Project Structure

src/wheres_waldo/
├── __init__.py
├── server.py          # MCP server with tool definitions
├── models/            # Pydantic domain models
├── services/          # Business logic (capture, compare, storage)
├── tools/             # MCP tool implementations
└── utils/             # Logging, hashing, path helpers

Roadmap

  • [x] Phase 1: Foundation (MCP server skeleton, types, storage)
  • [ ] Phase 2: Capture & Baselines (multi-platform screenshots)
  • [ ] Phase 3: Comparison Engine (OpenCV + Gemini integration) 🔥 HIGH RISK
  • [ ] Phase 4: Operations (caching, progressive resolution, reporting)
  • [ ] Phase 5: Polish (conversational investigation)

See ROADMAP.md for the complete execution plan.

Contributing

Contributions welcome! Please read REQUIREMENTS.md and ROADMAP.md before contributing.

License

MIT License - See LICENSE file for details

Acknowledgments

Generated with Claude Code via Happy
