UI Debugger MCP

Enables AI agents to autonomously debug UIs by delegating high-level stories to a small agent that drives browsers or desktop apps and reports structured pass/fail findings with evidence.

An MCP server that debugs UIs autonomously — so the AI that wrote your app can also test it, without a human clicking through every flow.

The problem

AI coding agents (Claude, etc.) are great at writing code. They're bad at knowing if the UI actually works. For backend code there are unit and integration tests. For UI, a human still has to open the app, log in, click around, and report what's broken. That human-in-the-loop is slow, boring, and the main bottleneck when an entire product is built by AI.

The idea

Eliminate the human from the UI-debug loop with an MCP server.

  • A smart agent (Claude Code, Cursor, …) finishes a PR and wants to verify the UI.
  • It hands a story to this server: "on web, log in and do X, Y, Z — tell me if it breaks."
  • A small fast agent runs inside this server (via the Vercel AI SDK). It drives the browser or desktop, watches console + network, takes screenshots.
  • It reports structured findings back: pass/fail, what broke, evidence.
  • The smart agent fixes the code and asks again. Loop until the UI works.

Unlike playwright-mcp — where the smart model issues every single click itself — here the smart model stays high-level and delegates the whole clicking loop to the small agent.

How it's different from playwright-mcp

|                  | playwright-mcp                   | UI Debugger MCP                  |
|------------------|----------------------------------|----------------------------------|
| Who clicks       | smart model, one action per call | small agent, on its own          |
| Tools exposed    | many (click, type, snapshot…)    | few (give a story, get findings) |
| Smart model cost | high (chatty)                    | low (high-level)                 |
| Output           | raw page state                   | structured findings + evidence   |

Architecture — the three actors

Picture a boss, a fast blind driver, and a describer with eyes:

   ┌─────────────┐   MCP conversation    ┌──────────────────────────────────────┐
   │ smart agent │  start_debug ───────▶ │        UI Debugger MCP server         │
   │  (Claude)   │  send_message (live)  │                                       │
   │             │ ◀─────── get_findings │   ┌────────────┐     ┌────────────┐   │
   │ sets goals  │                       │   │  fast guy  │ look│ vision guy │   │
   │ fixes code  │                       │   │  (driver)  │────▶│  (eyes)    │   │
   │ loops       │                       │   │ deepseek   │◀────│  glm 5v    │   │
   └─────────────┘                       │   │ text·blind │ desc│ image      │   │
          ▲                              │   └─────┬──────┘     └────────────┘   │
          │ "works + looks nice"         │     observe / act (SQL-like)          │
          │ findings + screenshots       │         │ shared adapter contract     │
          └──────────────────────────────│─────────┼─────────────────────────────│
                                          └─────────┼─────────────────────────────┘
                                                    ▼
                              ┌──────────────┬──────────────┬──────────────┐
                              │  web (CDP)   │ desktop      │ android      │
                              │  browser     │ X11/Wayland  │ ADB          │
                              └──────────────┴──────────────┴──────────────┘
  • smart agent — the boss (Claude/caller). Sends a goal, reads findings, fixes the code, loops. Stays high-level — never clicks.
  • fast guy — the driver. Fast, cheap, text-only and blind. Runs the click loop on structure (DOM / a11y tree / view hierarchy). Default: deepseek.
  • vision guy — the eyes. Multimodal. The driver calls look to ask "does this look right? is the button centred?" and gets a description back. Default: glm. Spent only when visual judgment is needed.

One goal: the UI works and looks nice. Full design in docs/idea/.

Every run keeps its screenshots and stitches them into a short captioned replay video — Claude attaches it to the PR so a reviewer sees the flow working in ~10 seconds (docs/idea/workspace.md).

Targets

One project can expose several debug targets. A large app can have all three:

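| Target  | Protocol / how it's driven                              | Reads                   |
|---------|---------------------------------------------------------|-------------------------|
| web     | CDP (Chrome DevTools Protocol), headless by default     | DOM                     |
| desktop | X11 / Wayland input + AT-SPI                            | a11y tree / vision      |
| mobile  | ADB (uiautomator + screencap), Android                  | view hierarchy / vision |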

Three adapters, one shared contract. Each runs managed (server launches the target) or attach (connect to a running one via cdpUrl / adbSerial). Linux first. iOS is out of scope on Linux (macOS-only tooling).
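The managed/attach choice described above can be sketched as a small resolver. This is only an illustration: the `TargetConfig` and `Mode` types and the `resolveMode` function are hypothetical names, not the server's actual API. Only `cdpUrl` and `adbSerial` come from the text; the real schema is the one in docs/idea/config.md.

```typescript
// Hypothetical shapes for illustration; the real config schema may differ.
type TargetKind = "web" | "desktop" | "mobile";

interface TargetConfig {
  kind: TargetKind;
  cdpUrl?: string;    // web: attach to an already-running browser
  adbSerial?: string; // mobile: attach to an already-running device or emulator
}

type Mode =
  | { mode: "managed" }                   // server launches the target itself
  | { mode: "attach"; endpoint: string }; // server connects to a running target

// Assumption: presence of an attach handle selects attach mode, otherwise managed.
function resolveMode(cfg: TargetConfig): Mode {
  if (cfg.kind === "web" && cfg.cdpUrl) {
    return { mode: "attach", endpoint: cfg.cdpUrl };
  }
  if (cfg.kind === "mobile" && cfg.adbSerial) {
    return { mode: "attach", endpoint: cfg.adbSerial };
  }
  return { mode: "managed" };
}
```

With this sketch, a web target without `cdpUrl` resolves to managed mode, and adding `cdpUrl` switches it to attach.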

Setup

Install like any local MCP server — one entry in your .mcp.json:

{
  "mcpServers": {
    "ui-debugger": {
      "command": "npx",
      "args": ["-y", "@developerz.ai/ui-debugger-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "OPENAI_BASE_URL": "https://openrouter.ai/api/v1"
      }
    }
  }
}

Then add a per-project .ui-debugger-mcp.json describing the app to debug (models, targets, urls). The fastest way is the init command:

npx @developerz.ai/ui-debugger-mcp init   # in your project root

ui-debugger-mcp init scaffolds a project for debugging (described in docs/idea/config.md):

  • creates the workspace dir ./tmp/ui-debugger-mcp/
  • writes a starter .ui-debugger-mcp.json (default deepseek/glm models, a web target stub) if one doesn't already exist
  • adds tmp/ to .gitignore
  • prints the .mcp.json snippet to paste (it never writes your API key)

Config files:

  • .mcp.json: how to launch the server (command + secret key). Gitignored.
  • .ui-debugger-mcp.json: how to debug this app (models, targets). Committed.

The server reads the current directory to pick the project session — open it in your repo and it debugs that repo.

Using it

It's a conversation, not a remote control — five fat tools, not one-per-click:

| Tool         | What it does                                                                                                    |
|--------------|-----------------------------------------------------------------------------------------------------------------|
| start_debug  | Open a run: { target, goal, criteria?, timeout? }. The small agent drives autonomously. Returns { session_id }. |
| get_findings | Poll status + structured findings (functional bugs + visual issues) + evidence. Long-poll with wait.            |
| send_message | Talk to the running agent mid-flight: add work, redirect, or answer a question.                                 |
| describe     | List the configured targets + models for this project.                                                          |
| end_session  | Close the run, free the browser/profile.                                                                        |

A run is always time-capped: start_debug's timeout (seconds) overrides the default 300s, so a session can never hang forever — it auto-ends and frees the profile lock when the cap fires.
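The time cap can be pictured as a timer armed when the run starts. The sketch below is not the server's code: `CappedRun`, `resolveTimeoutMs`, and the `onEnd` callback are hypothetical names, and the only facts taken from the text are the 300s default and that `timeout` is given in seconds.

```typescript
const DEFAULT_TIMEOUT_S = 300; // default cap stated above

// A caller-supplied timeout (seconds) overrides the default; result is in milliseconds.
function resolveTimeoutMs(timeoutSeconds?: number): number {
  return (timeoutSeconds ?? DEFAULT_TIMEOUT_S) * 1000;
}

// Hypothetical run wrapper: ends itself when the cap fires, or earlier on request.
class CappedRun {
  ended = false;
  private timer: ReturnType<typeof setTimeout> | undefined;
  private onEnd: () => void;

  constructor(timeoutSeconds: number | undefined, onEnd: () => void) {
    this.onEnd = onEnd; // e.g. release the profile lock (assumed cleanup hook)
    this.timer = setTimeout(() => this.end(), resolveTimeoutMs(timeoutSeconds));
  }

  // Idempotent: safe to call from both the timer and an explicit end_session.
  end(): void {
    if (this.ended) return;
    this.ended = true;
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.onEnd();
  }
}
```

Making `end()` idempotent means the cleanup hook runs exactly once, whether the cap fires first or the caller closes the run first.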

Typical loop from a smart agent:

start_debug { target: "web", goal: "log in and add item 3 to the cart" }
→ poll get_findings (wait) until status is passed | failed
→ read bugs[] + visual[] + summary, fix the code, start_debug again
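The loop above can be written against a thin client interface. This is a minimal sketch under stated assumptions: `DebuggerClient`, `Findings`, and `runStory` are hypothetical names, the `running` status and the exact `get_findings` arguments are assumed, and the fields `bugs`, `visual`, and `summary` are modelled as plain strings because their real shape is not specified here.

```typescript
// Assumed result shape; only status, bugs[], visual[] and summary are named in the text.
type Status = "running" | "passed" | "failed";

interface Findings {
  status: Status;
  bugs: string[];
  visual: string[];
  summary: string;
}

// Thin abstraction over the MCP tools so the loop does not depend on a client library.
interface DebuggerClient {
  startDebug(args: { target: string; goal: string; timeout?: number }): Promise<{ session_id: string }>;
  getFindings(args: { session_id: string; wait: boolean }): Promise<Findings>;
  endSession(args: { session_id: string }): Promise<void>;
}

// Start a run, long-poll until a verdict arrives, and always close the session.
async function runStory(
  client: DebuggerClient,
  target: string,
  goal: string,
  maxPolls = 100,
): Promise<Findings> {
  const { session_id } = await client.startDebug({ target, goal });
  try {
    for (let i = 0; i < maxPolls; i++) {
      const findings = await client.getFindings({ session_id, wait: true });
      if (findings.status !== "running") return findings; // passed | failed
    }
    throw new Error(`no verdict after ${maxPolls} polls`);
  } finally {
    await client.endSession({ session_id }); // frees the browser/profile
  }
}
```

The caller would read the returned findings, fix the code, and call `runStory` again until the status is `passed`.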

You can also drive it headless from a script with claude -p — see docs/claude/SKILL.md for the CLI recipe (MCP config, allowed tools, output formats).

CLI — check or stop a run

The ui-debugger-mcp binary doubles as a control CLI for the active run (reads state.json, no API key needed):

ui-debugger-mcp status   # which run is active, server pid, verdict, finding counts
ui-debugger-mcp stop     # gracefully end the run (frees the browser + profile)

Stack

  • Bun + TypeScript (ships as an npm package, runs via npx/bunx)
  • Vercel AI SDK — the agent loop (fast driver + vision describer)
  • Any OpenAI-compatible router (OpenRouter default) — swap models per role. Defaults: deepseek (text) drives, glm (image) sees.
  • CDP for web, X11/Wayland for desktop, ADB for Android
  • stdio MCP transport

Status

Web target shipped. Desktop and Android adapters are pending — see docs/idea/ for design.
