Linux Desktop MCP Server

Enables AI assistants to interact with native Linux desktop applications through AT-SPI2 accessibility interfaces. Provides semantic element targeting, natural language search, and automation capabilities (clicking, typing, keyboard shortcuts) across GTK, Qt, and Electron applications.

Built with Claude Code - This entire MCP server was developed using Claude Code, Anthropic's AI-powered coding assistant. We're proud to showcase what's possible with AI-assisted development!

An MCP server that provides Chrome-extension-level semantic element targeting for native Linux desktop applications using AT-SPI2 (Assistive Technology Service Provider Interface).

Features

Semantic Element References: Elements are addressed as ref_1, ref_2, and so on, mirroring the Chrome extension's reference system
Role Detection: Identifies buttons, text fields, links, menus, etc.
State Detection: Tracks focused, enabled, checked, editable states
Natural Language Search: Find elements by description ("save button", "search field")
Cross-Platform Input: Works on X11, Wayland, and XWayland
GTK/Qt/Electron Support: Works with any application that exposes its UI through AT-SPI2

Installation

System Dependencies

# Ubuntu/Debian
sudo apt install python3-pyatspi gir1.2-atspi-2.0 at-spi2-core

# For X11 input simulation
sudo apt install xdotool

# For Wayland input simulation (recommended)
# Install ydotool from source or your package manager
# Then start the daemon:
sudo ydotoold &

Python Package

# From PyPI
pip install linux-desktop-mcp

# Or from source
git clone https://github.com/yourusername/linux-desktop-mcp.git
cd linux-desktop-mcp
pip install -e .

Enable Accessibility

Ensure accessibility is enabled in your desktop environment:

GNOME: Settings → Accessibility → Enable accessibility features
KDE: System Settings → Accessibility
Most modern desktops have this enabled by default
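
On GNOME, one way to confirm the setting from a script is to read the `toolkit-accessibility` key with `gsettings`. The sketch below is an illustration only: it assumes a GNOME-style desktop with `gsettings` on the PATH, and the helper names are made up for this example. Nothing here is taken from the server's own startup checks.

```python
import subprocess
from typing import Optional


def parse_gsettings_bool(output: str) -> bool:
    """Interpret the text printed by `gsettings get` for a boolean key."""
    return output.strip().lower() == "true"


def toolkit_accessibility_enabled() -> Optional[bool]:
    """Return the GNOME toolkit-accessibility flag, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["gsettings", "get", "org.gnome.desktop.interface", "toolkit-accessibility"],
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        # gsettings is missing, timed out, or the key is unavailable (e.g. non-GNOME desktop).
        return None
    return parse_gsettings_bool(result.stdout)


# Usage: print("toolkit-accessibility:", toolkit_accessibility_enabled())
```

A `None` result only means the flag could not be read this way; KDE and other desktops store the setting elsewhere.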

Configuration

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "linux-desktop": {
      "command": "linux-desktop-mcp"
    }
  }
}

Or if installed from source:

{
  "mcpServers": {
    "linux-desktop": {
      "command": "python",
      "args": ["-m", "linux_desktop_mcp"]
    }
  }
}
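
If the settings file already holds other entries, editing it by hand risks clobbering them. A minimal sketch of a merge helper follows; `add_mcp_server` is a hypothetical name, not something shipped with this package, and it assumes the file (when present) contains a valid JSON object.

```python
import json
from pathlib import Path


def add_mcp_server(settings_path: Path, name: str, entry: dict) -> dict:
    """Merge one server entry into the mcpServers map, preserving other settings."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    # Create the mcpServers map if absent, then add or overwrite this one entry.
    settings.setdefault("mcpServers", {})[name] = entry
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Usage (writes to the real settings file, so run deliberately):
# add_mcp_server(Path.home() / ".claude" / "settings.json",
#                "linux-desktop", {"command": "linux-desktop-mcp"})
```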

Available Tools

`desktop_snapshot`

Capture the accessibility tree with semantic element references.

Parameters:
  app_name: str (optional) - Filter to specific application
  max_depth: int (default: 15) - Tree traversal depth

Returns:
  Tree of elements with ref_ids:
  - ref_1: [application] Firefox
    - ref_2: [frame] "GitHub - Mozilla Firefox"
      - ref_3: [button] "Back" (clickable)
      - ref_4: [entry] "Search or enter address" (editable, focused)
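
A client that wants structured data rather than text could parse the tree shown above. The sketch below assumes the output looks exactly like the sample (two-space indentation per level, role in square brackets, optional quoted name, optional states in trailing parentheses); the real output may differ, and `Element` and `parse_snapshot` are names invented for this example.

```python
import re
from dataclasses import dataclass
from typing import List

LINE_RE = re.compile(r'^(\s*)-\s+(ref_\d+):\s+\[([^\]]+)\]\s*(.*)$')
STATES_RE = re.compile(r'\(([^()]*)\)\s*$')


@dataclass
class Element:
    ref: str
    role: str
    name: str
    states: List[str]
    depth: int


def parse_snapshot(text: str) -> List[Element]:
    """Parse snapshot lines into Element records, skipping lines that do not match."""
    elements = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        indent, ref, role, rest = match.groups()
        states: List[str] = []
        states_match = STATES_RE.search(rest)
        if states_match:
            # Trailing "(a, b)" is treated as the state list.
            states = [s.strip() for s in states_match.group(1).split(",") if s.strip()]
            rest = rest[: states_match.start()]
        name = rest.strip().strip('"')
        elements.append(Element(ref, role, name, states, len(indent) // 2))
    return elements
```

Depth is derived from indentation alone, so a snapshot with a uniform leading indent would report every depth shifted by the same amount.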

`desktop_find`

Find elements by natural language query.

Parameters:
  query: str - "save button", "search field", "menu containing File"
  app_name: str (optional)

Returns:
  Matching elements with refs, states, and actions
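
The README does not describe how queries are matched. To give a feel for the idea, here is a deliberately simple word-overlap scorer; it is not the server's algorithm, and the synonym table, threshold, and function names are all assumptions made for illustration.

```python
# Hypothetical mapping from everyday words to accessibility role names.
ROLE_SYNONYMS = {"field": "entry", "textbox": "entry", "input": "entry"}


def match_score(query: str, role: str, name: str) -> float:
    """Fraction of query words found in the element's role or name."""
    words = [ROLE_SYNONYMS.get(w, w) for w in query.lower().split()]
    if not words:
        return 0.0
    tokens = set(name.lower().split()) | {role.lower()}
    return sum(w in tokens for w in words) / len(words)


def find(query: str, elements: list, threshold: float = 0.5) -> list:
    """Return elements scoring at least `threshold`, best match first."""
    scored = [(match_score(query, e["role"], e["name"]), e) for e in elements]
    hits = [(s, e) for s, e in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in hits]
```

With elements for a "Back" button, a "Save" button, and a "Search or enter address" entry, the query "save button" ranks the Save button first and the Back button second, because the latter matches only on role.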

`desktop_click`

Click an element by reference or coordinates.

Parameters:
  ref: str - Element reference (e.g., "ref_5")
  element: str - Human description for logging
  coordinate: [x, y] - Used as a fallback when no ref is given
  button: left|right|middle
  click_type: single|double
  modifiers: [ctrl, shift, alt, super]
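
On the wire, an MCP client invokes a tool with a JSON-RPC `tools/call` request. The sketch below builds such a message for `desktop_click`; the argument names come from the parameter list above, while the value types (strings for `button` and `click_type`, a list of strings for `modifiers`) are assumptions. A real session also needs the MCP `initialize` handshake first, which is omitted here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique per session


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize a single-line JSON-RPC 2.0 request for the MCP tools/call method."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, as newline-delimited stdio framing requires.
    return json.dumps(message)


request = tool_call_request(
    "desktop_click",
    {"ref": "ref_5", "element": "URL bar", "button": "left", "click_type": "single"},
)
```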

`desktop_type`

Type text into an element.

Parameters:
  text: str - Text to type
  ref: str - Element to focus first (optional)
  element: str - Human description
  clear_first: bool - Select all (Ctrl+A) and press Delete before typing
  submit: bool - Press Enter after
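
Read together, `clear_first` and `submit` imply an ordering of input events around the typed text. The sketch below spells that ordering out as a plain list of actions. It is inferred from the parameter descriptions only, and the action tuples and key names are illustrative, not the server's internal representation.

```python
from typing import List, Tuple


def plan_type_actions(text: str, clear_first: bool = False, submit: bool = False) -> List[Tuple[str, str]]:
    """Return the (kind, value) input actions implied by the desktop_type flags."""
    actions: List[Tuple[str, str]] = []
    if clear_first:
        actions.append(("key", "ctrl+a"))   # select the existing contents
        actions.append(("key", "Delete"))   # remove the selection
    actions.append(("type", text))
    if submit:
        actions.append(("key", "Return"))   # press Enter after typing
    return actions
```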

`desktop_key`

Press keyboard keys/shortcuts.

Parameters:
  key: str - Key name (Return, Tab, Escape, a, etc.)
  modifiers: [ctrl, shift, alt, super]
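
With the xdotool backend, a key plus modifiers maps naturally onto `xdotool key`, which accepts `+`-joined combinations such as `ctrl+shift+f`. The sketch below only builds the argument list; whether the server invokes xdotool in exactly this way is an assumption, and `xdotool_key_argv` is a name invented for this example.

```python
from typing import List, Sequence

ALLOWED_MODIFIERS = ("ctrl", "shift", "alt", "super")


def xdotool_key_argv(key: str, modifiers: Sequence[str] = ()) -> List[str]:
    """Build the argv for `xdotool key`, with modifiers in a fixed order."""
    unknown = [m for m in modifiers if m not in ALLOWED_MODIFIERS]
    if unknown:
        raise ValueError(f"unsupported modifiers: {unknown}")
    ordered = [m for m in ALLOWED_MODIFIERS if m in modifiers]
    return ["xdotool", "key", "+".join([*ordered, key])]
```

Passing the result to `subprocess.run` as a list avoids shell quoting issues with key names.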

`desktop_capabilities`

Check available automation capabilities.
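
The README does not list what this tool reports. As a rough idea of the kind of probing involved, the sketch below inspects common session environment variables and looks for the input tools named elsewhere in this document. The returned keys and the detection rules are assumptions, not the tool's actual output.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def detect_capabilities(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Guess the session type and list the input tools found on PATH."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session not in ("x11", "wayland"):
        # Fall back to the display variables when XDG_SESSION_TYPE is unset or unusual.
        if env.get("WAYLAND_DISPLAY"):
            session = "wayland"
        elif env.get("DISPLAY"):
            session = "x11"
        else:
            session = "unknown"
    return {
        "session_type": session,
        # DISPLAY being set inside a Wayland session suggests XWayland is running.
        "xwayland": session == "wayland" and bool(env.get("DISPLAY")),
        "input_backends": [t for t in ("ydotool", "xdotool", "wtype") if which(t)],
    }
```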

Example Usage

Example 1: Navigating to a Website in Firefox

User: "Open GitHub in Firefox"

Claude uses:
1. desktop_snapshot(app_name="Firefox")
   → Returns UI tree with elements like:
     - ref_5: [entry] "Search or enter address" (editable, focused)
     - ref_12: [button] "Go" (clickable)

2. desktop_click(ref="ref_5", element="URL bar")
   → Clicks to focus the address bar

3. desktop_type(text="https://github.com", ref="ref_5", clear_first=True, submit=True)
   → Types the URL and presses Enter

Result: Firefox navigates to GitHub
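
The same three steps can be scripted. The sketch below takes a `call_tool(name, arguments)` callable, a stand-in for whatever MCP client is in use, and assumes the snapshot comes back as text in the format shown above. The regular expression and the "address" heuristic are illustrative guesses rather than part of the server.

```python
import re
from typing import Callable

ENTRY_RE = re.compile(r'(ref_\d+): \[entry\] "([^"]*)" \(([^)]*)\)')


def navigate(call_tool: Callable[[str, dict], str], url: str, app_name: str = "Firefox") -> str:
    """Snapshot the app, locate the address bar, then click it and type the URL."""
    snapshot = call_tool("desktop_snapshot", {"app_name": app_name})
    for ref, name, states in ENTRY_RE.findall(snapshot):
        if "editable" in states and "address" in name.lower():
            break
    else:
        raise LookupError("no editable address entry found in snapshot")
    call_tool("desktop_click", {"ref": ref, "element": "URL bar"})
    call_tool("desktop_type", {"text": url, "ref": ref, "clear_first": True, "submit": True})
    return ref
```

Injecting `call_tool` keeps the flow testable with a fake and independent of any particular client library.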

Example 2: Saving a File in LibreOffice

User: "Save this document as 'report.odt'"

Claude uses:
1. desktop_key(key="s", modifiers=["ctrl"])
   → Opens the Save dialog

2. desktop_snapshot(app_name="LibreOffice")
   → Returns dialog elements including:
     - ref_8: [entry] "File name:" (editable)
     - ref_15: [button] "Save" (clickable)

3. desktop_type(text="report.odt", ref="ref_8", clear_first=True)
   → Types the filename

4. desktop_click(ref="ref_15", element="Save button")
   → Clicks Save

Result: Document saved as report.odt

Example 3: Searching in a Code Editor

User: "Search for 'TODO' comments in VS Code"

Claude uses:
1. desktop_find(query="search", app_name="Code")
   → Finds search-related elements

2. desktop_key(key="f", modifiers=["ctrl", "shift"])
   → Opens global search panel

3. desktop_snapshot(app_name="Code")
   → Returns search panel elements:
     - ref_22: [entry] "Search" (editable, focused)
     - ref_25: [checkbox] "Match Case"

4. desktop_type(text="TODO", ref="ref_22", submit=True)
   → Types search query and executes search

Result: VS Code shows all TODO comments across the project

Example 4: Window Targeting for Multi-Window Automation

User: "Help me copy data from the spreadsheet to the email"

Claude uses:
1. desktop_context(list_available=True)
   → Lists all available windows

2. desktop_target_window(app_name="LibreOffice Calc", color="green")
   → Targets spreadsheet with green border

3. desktop_target_window(app_name="Thunderbird", color="blue")
   → Targets email client with blue border

4. desktop_snapshot()
   → Only shows elements from targeted windows (reduced context)

5. [Proceeds with copy/paste operations between windows]

Result: Claude can efficiently work across multiple applications

Platform Support

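| Feature          | X11  | Wayland | XWayland |
|------------------|------|---------|----------|
| AT-SPI discovery | Full | Full    | Full     |
| Click by ref     | Full | Full    | Full     |
| Type text        | Full | Full    | Full     |
| ydotool input    | Full | Full    | Full     |
| xdotool input    | Full | No      | Yes      |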
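
The two input rows of the table can be encoded as data and used to pick a backend. In the sketch below the support matrix is copied from the table, while the preference order (ydotool first) is an assumption based on the installation notes calling it the recommended option for Wayland.

```python
from typing import Collection, Optional, Sequence

# True where the table above says "Full" or "Yes", False where it says "No".
SUPPORT = {
    "ydotool": {"x11": True, "wayland": True, "xwayland": True},
    "xdotool": {"x11": True, "wayland": False, "xwayland": True},
}


def pick_backend(
    session: str,
    installed: Collection[str],
    preference: Sequence[str] = ("ydotool", "xdotool"),
) -> Optional[str]:
    """Return the first preferred backend that is installed and supports the session."""
    for backend in preference:
        if backend in installed and SUPPORT[backend].get(session, False):
            return backend
    return None
```

A `None` result corresponds to the "No input backend available" case covered under Troubleshooting.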

Troubleshooting

"AT-SPI2 not available"

sudo apt install python3-pyatspi gir1.2-atspi-2.0 at-spi2-core

"AT-SPI2 registry not running"

Ensure accessibility is enabled in your desktop settings. You may need to log out and back in.

"No input backend available" (Wayland)

# Install ydotool (see Installation), then start its daemon
sudo ydotoold &

Elements not showing up

Some applications may not expose accessibility information. Modern GTK3/4, Qt5/6, and Electron apps generally work well.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     MCP Protocol Layer                       │
│              (JSON-RPC over stdio, tool defs)                │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────┐
│                  Reference Manager                           │
│          (ref_1, ref_2 mapping, lifecycle, GC)              │
└─────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┴───────────────────┐
          │                                       │
┌─────────────────────┐             ┌─────────────────────────┐
│   AT-SPI2 Backend   │             │   Input Backends        │
│     (pyatspi)       │             │ (ydotool/xdotool/wtype) │
└─────────────────────┘             └─────────────────────────┘
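
The Reference Manager box mentions mapping, lifecycle, and GC for refs. One plausible shape for such a component is sketched below: refs are numbered sequentially, tagged with the snapshot generation that created them, and dropped once they are older than a configurable number of snapshots. The class, its method names, and the expiry policy are assumptions for illustration and are not taken from the project's source.

```python
class ReferenceManager:
    """Map ref_N identifiers to elements and expire refs from old snapshots."""

    def __init__(self, max_generations: int = 2) -> None:
        self._next_id = 1
        self._generation = 0
        self._keep = max_generations
        self._refs = {}  # ref id -> (generation, element)

    def begin_snapshot(self) -> None:
        """Start a new generation and garbage-collect refs that are too old."""
        self._generation += 1
        cutoff = self._generation - self._keep
        self._refs = {r: (g, e) for r, (g, e) in self._refs.items() if g > cutoff}

    def register(self, element: object) -> str:
        """Assign the next ref id to an element seen in the current snapshot."""
        ref = f"ref_{self._next_id}"
        self._next_id += 1
        self._refs[ref] = (self._generation, element)
        return ref

    def resolve(self, ref: str) -> object:
        """Return the element for a ref, or raise KeyError if it is unknown or expired."""
        try:
            return self._refs[ref][1]
        except KeyError:
            raise KeyError(f"{ref} is unknown or expired; take a new snapshot") from None
```

Keeping refs alive for one extra generation lets a client act on a ref from the previous snapshot, while still bounding memory use.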

Contributing

This project was created with Claude Code, and contributions are warmly welcome. You can:

Report bugs or request features
Submit pull requests
Fork and build your own version
Improve documentation

We're very open to help and collaboration. See CONTRIBUTING.md for guidelines.

Privacy Policy

Linux Desktop MCP is a local desktop automation tool that:

Runs entirely on your local machine - No data is transmitted to external servers
Does not collect any personal data - No analytics, telemetry, or usage tracking
Does not store credentials - All authentication and authorization is handled by your local system
Accesses only what you explicitly target - The accessibility tree is read only for windows/applications you interact with
No network connectivity required - The MCP server operates completely offline

The only data accessed is the accessibility tree information exposed by your desktop applications (UI element names, roles, and states), which is used solely for local automation and is not persisted or transmitted anywhere.

Contact: For privacy-related questions, open an issue on GitHub.

License

MIT - See LICENSE for details.
