MCP Servers

gui-user

An MCP server for external computer-use that launches, observes, and interacts with any X11 application via AT-SPI2 accessibility tree and xdotool input injection.

README

gui-user

An MCP server for external computer-use. Launch, observe, and interact with any X11 application via AT-SPI2 accessibility tree and xdotool input injection.

Unlike in-process testing frameworks, gui-user works externally — it can drive compiled C++ Qt/QML apps, GTK apps, Electron apps, or anything that renders on X11.

Installation

1. System packages

# Debian/Ubuntu — required
sudo apt install xvfb xdotool at-spi2-core dbus imagemagick libgirepository1.0-dev

# Optional — for VNC observation of the headless display
sudo apt install x11vnc tigervnc-viewer

# Optional — for OCR-based text detection in screenshots
sudo apt install tesseract-ocr

2. Install gui-user

Clone the repo and install in development mode:

git clone <repo-url> gui-user
cd gui-user
pip install -e .

This puts gui-user-mcp on your $PATH as the MCP server entry point.

3. Configure Claude Code

Add gui-user as a user-scope MCP server (available in all projects):

claude mcp add gui-user -s user -- gui-user-mcp

Or for a single project only, run from the project directory:

claude mcp add gui-user -- gui-user-mcp

Alternatively, you can create .mcp.json in the project root (this is shared via source control):

{
  "mcpServers": {
    "gui-user": {
      "command": "gui-user-mcp"
    }
  }
}

Verify the server is connected:

claude mcp list

If using VS Code, reload the window (Ctrl+Shift+P → "Developer: Reload Window") after adding the server, then start a new conversation. Type /mcp in the chat panel to confirm gui-user appears.

Tools

Tool	Description
`launch_app(binary, args, env, working_dir, width, height, timeout, display_mode, display, vnc)`	Launch any binary under an isolated Xvfb display or a visible local X11 display
`close_app()`	Close the app (display session stays alive for reuse)
`stop_display()`	Tear down the display session (Xvfb, D-Bus, VNC)
`get_app_status()`	Check if app is running, get PID/exit code/stderr
`screenshot(output_path?)`	Capture screen as base64 PNG
`list_ui_elements(role?, name?, visible_only?)`	Enumerate AT-SPI accessibility tree
`find_element(text?, role?, index?)`	Find element by label/role
`get_element_info(text?, role?, at_x?, at_y?)`	Detailed element properties or coordinate lookup
`click(x, y, button?)`	Click at screen coordinates
`click_element(text?, role?, index?, button?)`	Find element and click its center
`double_click(x, y, button?)`	Double-click at coordinates
`double_click_element(text?, role?, index?, button?)`	Find element and double-click
`hover(x, y)`	Move mouse to coordinates
`hover_element(text?, role?, index?)`	Move mouse to element center
`type_text(text)`	Type text into focused widget
`press_key(key, modifiers?)`	Key press (e.g., `press_key("s", ["Ctrl"])`)
`wait_for_idle(timeout?)`	Wait for CPU usage to settle
`wait_for_element(text?, role?, timeout?)`	Poll until element appears
`batch_actions(actions)`	Execute a sequence of actions in one call (avoids per-action round-trips)

Example Workflow

# Launch any binary in the default isolated Xvfb session
launch_app(binary="/usr/bin/gnome-calculator")

# Launch on the operator's visible X11 desktop instead
launch_app(
    binary="/usr/bin/gnome-calculator",
    display_mode="local",
)

# Or target a specific local display explicitly
launch_app(
    binary="/usr/bin/gnome-calculator",
    display_mode="local",
    display=":1",
)

# Discover UI elements
list_ui_elements()

# Find and click a button by its visible label
click_element(text="7", role="button")
click_element(text="+", role="button")
click_element(text="3", role="button")
click_element(text="=", role="button")

# Type text
type_text(text="hello world")

# Keyboard shortcuts
press_key(key="s", modifiers=["Ctrl"])

# Screenshot
screenshot(output_path="/tmp/result.png")

# Clean up
close_app()

Architecture

AI Assistant (Claude)
    │ MCP Protocol (stdio)
    ▼
MCP Server (main.py)
    │ Orchestrates:
    ├── DisplayManager  (Xvfb/local X11 + D-Bus + AT-SPI)
    ├── ProcessManager  (binary launch/monitor)
    ├── AccessibilityTree (AT-SPI2 element discovery)
    ├── ScreenshotCapture (ImageMagick import)
    ├── InputController (xdotool mouse/keyboard)
    └── IdleWaiter      (CPU-based idle detection)
    │
    ▼
Target Application (any X11 binary)

Key Differences from qt-pilot

This project was forked from qt-pilot and redesigned:

	qt-pilot	gui-user
Target apps	Python/PySide6 only	Any X11 binary
Discovery	`objectName` (requires code changes)	AT-SPI accessibility tree (no code changes)
Interaction	In-process QTest	External xdotool
Architecture	Monkeypatch + socket IPC	External observation + input injection

Running Tests

python3 -m unittest tests.test_integration tests.test_local_display -v

Observing the Headless Display (VNC)

Pass vnc=True to launch_app to start a view-only VNC server alongside the Xvfb display. This lets the operator watch what the AI is doing without interfering.

launch_app(binary="my_app", vnc=True)
# Response includes: "vnc_display": "localhost:5900"

To connect, run from any terminal:

gui-user-view

This auto-detects the running x11vnc and opens a VNC viewer. If x11vnc isn't running yet, it starts one on the first Xvfb display it finds. You can also pass a specific port: gui-user-view 5902

To connect manually: vncviewer localhost:<port>

Requirements: sudo apt install x11vnc tigervnc-viewer

Helper Commands

These are installed on your $PATH by pip install:

Command	Description
`gui-user-view`	Auto-detect the running Xvfb display and open a VNC viewer. Starts x11vnc if needed.
`gui-user-stop`	Kill any lingering Xvfb, x11vnc, and at-spi2-registryd processes. Useful for cleanup after crashes or interrupted sessions.

The underlying shell scripts (view-display.sh, stop-display.sh) are also available in the repo root.

Display Lifecycle

The display session (Xvfb + D-Bus + VNC) persists across app restarts. This means:

launch_app() creates the display on first call, reuses it on subsequent calls
close_app() terminates only the app — the display and VNC stay alive
stop_display() tears down everything (Xvfb, D-Bus, VNC)

This lets the operator connect the VNC viewer once and watch across multiple app launch/close cycles.

Screenshot Gallery

Every screenshot() call auto-saves a timestamped PNG to .gui-user/screenshots/ in the current working directory. Browse this folder to review the full visual history of a session.

Local Display Mode

display_mode="local" reuses a real X11 display so the operator can watch the app while the MCP drives it.

This mode is opt-in. The default remains an isolated Xvfb session.
Local mode is intended for X11 or XWayland displays only.
Mouse, keyboard, and focus are shared with the operator, so runs are less deterministic.
width and height are ignored in local mode because the existing desktop geometry is reused.
For unattended or CI-style runs, prefer the default Xvfb mode.

License

MIT License - see LICENSE file.

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured