MCP Servers

Wayland MCP Server

Enables AI assistants to automate Wayland desktop environments through screenshot analysis, mouse control, and keyboard input simulation. It supports visual context via VLM providers like Gemini and OpenRouter to perform complex, multi-step desktop actions.

README

Wayland MCP Server

Model Context Protocol server for Wayland desktop automation

Features • Installation • Usage • API • Security

</div>

Overview

Wayland MCP Server enables AI assistants to interact with your Wayland desktop through the Model Context Protocol. It provides screenshot capture with VLM analysis, mouse control, keyboard input, and action chaining capabilities.

Why This Project?

Existing Wayland screenshot and automation tools often have reliability issues. This project provides a robust, MCP-native solution specifically designed for AI-driven desktop automation on modern Linux systems.

Quick Example

# AI Assistant: "Take a screenshot and tell me what's on screen"
→ Captures screen, analyzes with VLM, responds with description

# AI Assistant: "Click the OK button"  
→ Identifies button location from screenshot, moves mouse, clicks

# AI Assistant: "Fill out this form with test data"
→ Chains clicks and keyboard input to complete form automatically

Features

Visual Analysis

Screenshot capture with precision ruler overlays
VLM-powered image analysis via OpenRouter or Google Gemini
Multiple vision model support (Claude, GPT-4V, Gemini, Qwen)
Side-by-side image comparison and diff detection

Mouse Automation

Absolute and relative cursor positioning
Click operations (left, right, middle button)
Drag and drop with coordinate precision
Bidirectional scrolling (vertical/horizontal)

Keyboard Control

Text input simulation
Individual key press events
Complex key combinations

Action Sequences

Chain multiple operations together
Flexible syntax: chain:action1;action2;action3
Example: chain:click:100,200;type:hello;press:Enter

Installation

Prerequisites

Python 3.8 or higher
Wayland compositor (GNOME, KDE Plasma, Hyprland, Sway, etc.)
grim and slurp for screenshots (usually pre-installed)

Quick Install

uvx wayland-mcp

From Source

git clone https://github.com/kurojs/wayland-mcp.git
cd wayland-mcp
pip install -e .

Input Control Setup

For mouse and keyboard automation, run the setup script:

sudo ./setup.sh

What it does:

Installs evemu-tools package
Configures setuid for evemu-event
Adds user to input group
Creates udev rules for device access

After setup, log out and back in for group changes to take effect.

Usage

MCP Configuration

The server supports two VLM providers:

Option 1: OpenRouter (multiple models via proxy)

{
  "mcpServers": {
    "wayland": {
      "command": "uvx",
      "args": ["wayland-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-v1-...",
        "VLM_PROVIDER": "openrouter",
        "VLM_MODEL": "qwen/qwen2.5-vl-72b-instruct:free",
        "XDG_RUNTIME_DIR": "/run/user/1000",
        "WAYLAND_DISPLAY": "wayland-0"
      }
    }
  }
}

Option 2: Google Gemini Direct (native API, faster)

{
  "mcpServers": {
    "wayland": {
      "command": "uvx",
      "args": ["wayland-mcp"],
      "env": {
        "GEMINI_API_KEY": "AIza...",
        "VLM_PROVIDER": "gemini",
        "VLM_MODEL": "gemini-2.5-flash",
        "XDG_RUNTIME_DIR": "/run/user/1000",
        "WAYLAND_DISPLAY": "wayland-0"
      }
    }
  }
}

Example for Claude Desktop (~/.config/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "wayland": {
      "command": "uvx",
      "args": ["wayland-mcp"],
      "env": {
        "GEMINI_API_KEY": "AIza...",
        "VLM_PROVIDER": "gemini",
        "VLM_MODEL": "gemini-2.5-flash",
        "XDG_RUNTIME_DIR": "/run/user/1000",
        "WAYLAND_DISPLAY": "wayland-0"
      }
    }
  }
}

Note: See CONFIG_EXAMPLES.md for more configuration examples including Cursor, OpenRouter models, and VLM provider options.

Environment Variables

Variable	Description	Default	Required
VLM Provider Options
`VLM_PROVIDER`	Vision provider: `openrouter` or `gemini`	`openrouter`	No
`OPENROUTER_API_KEY`	OpenRouter API key	-	For OpenRouter
`GEMINI_API_KEY`	Google Gemini API key	-	For Gemini
`VLM_MODEL`	Model identifier	`qwen/qwen2.5-vl-72b-instruct:free` (OpenRouter) or `gemini-2.5-flash` (Gemini)	No
Wayland Environment
`XDG_RUNTIME_DIR`	Wayland runtime directory	`/run/user/1000`	Yes
`WAYLAND_DISPLAY`	Display identifier	`wayland-0`	Yes
Optional
`WAYLAND_MCP_PORT`	Server listen port	`4999`	No

Getting API Keys:

OpenRouter: openrouter.ai → Keys section
Google Gemini: Google AI Studio

Desktop Environment Compatibility

Desktop	Status	Notes
GNOME	✅ Tested	Wayland by default on modern versions
KDE Plasma	✅ Tested	Enable Wayland session at login
Hyprland	✅ Tested	Native Wayland compositor
Sway	✅ Should work	i3-compatible Wayland compositor
Others	⚠️ Untested	Any wlroots-based compositor should work

Example Commands

Through an MCP client, you can request actions like:

"Take a screenshot and analyze what's on the screen"
"Move the mouse to coordinates (100, 200) and click"
"Type 'hello world' and press Enter"
"Click at (50, 50), then drag to (200, 200)"

Available Tools

The server exposes the following MCP tools:

Screen Capture

capture_screenshot - Take a screenshot with optional ruler overlays
capture_and_analyze - Capture and analyze using VLM in one step

Vision Analysis

analyze_screenshot - Analyze an existing screenshot with custom prompt
compare_images - Compare two screenshots to detect differences

Mouse Control

move_mouse - Move cursor to coordinates (absolute or relative)
click_mouse - Perform left click at current position
drag_mouse - Drag between two coordinate points
scroll_mouse - Vertical scroll (positive=up, negative=down)

Action Execution

execute_action - Execute single action or chain multiple actions

Action Chain Syntax

Combine multiple actions with semicolons:

chain:action1;action2;action3

Supported Actions:

type:text - Type a text string
press:key - Press a specific key
click: or click:x,y - Click at position or current location
move_to:x,y - Move to absolute coordinates
move_to:rel:x,y - Move relative to current position
drag:x1,y1:x2,y2 - Drag from point to point
scroll:amount - Scroll vertically (typical values: 15-120)
scroll:horizontal:amount - Scroll horizontally

Example Chains:

chain:move_to:100,200;click:;type:hello;press:Enter
chain:click:50,50;drag:50,50:200,200
chain:scroll:120;move_to:rel:0,-50;click:

Security

⚠️ IMPORTANT SECURITY CONSIDERATIONS

This server grants extensive control over your desktop environment:

Full mouse and keyboard control
Screen capture capabilities
Ability to execute arbitrary input sequences

Best Practices

Only use with trusted AI models and MCP clients
Review action chains before execution in sensitive contexts
Consider running in a sandboxed or test environment
Be aware that the AI can perform any action you could perform manually

Permission Model

The setup script requires sudo access to:

Install system packages (evemu-tools)
Modify file permissions
Configure udev rules

After setup, the server runs with your user privileges but can control input devices through configured permissions.

Architecture

                    ┌─────────────────────────────────┐
                    │      MCP Client Layer           │
                    │   (Claude, Cursor, VS Code)     │
                    └───────────────┬─────────────────┘
                                    │
                            MCP Protocol (stdio/HTTP)
                                    │
                    ┌───────────────▼─────────────────┐
                    │    Wayland MCP Server           │
                    │    ┌─────────────────────┐      │
                    │    │  Core Components    │      │
                    │    ├─────────────────────┤      │
                    │    │ • FastMCP Handler   │      │
                    │    │ • Action Processor  │      │
                    │    │ • Chain Parser      │      │
                    │    └─────────────────────┘      │
                    └────┬────────────────┬────────────┘
                         │                │
         ┌───────────────┴────┐      ┌────┴──────────────┐
         │                    │      │                   │
    ┌────▼─────┐      ┌──────▼───┐  │  ┌──────────────┐ │
    │  Vision  │      │  Input   │  │  │   Screen     │ │
    │          │      │ Control  │  │  │   Capture    │ │
    ├──────────┤      ├──────────┤  │  ├──────────────┤ │
    │ • VLM    │      │ • evemu  │  │  │ • grim       │ │
    │ • Compare│      │ • Mouse  │  │  │ • slurp      │ │
    │          │      │ • Keyboard│  │  │ • PIL        │ │
    └──────────┘      └──────────┘  │  └──────────────┘ │
                                    │                    │
                                    └────────────────────┘
                                      Wayland Compositor

Troubleshooting

Input control not working

Ensure you ran sudo ./setup.sh
Log out and back in after setup
Verify you're in the input group: groups | grep input

Screenshots failing

Check if grim is installed: which grim
Verify WAYLAND_DISPLAY matches your session: echo $WAYLAND_DISPLAY

VLM analysis not working

Confirm OPENROUTER_API_KEY is set correctly
Check API key permissions on OpenRouter dashboard
Test model availability: some models have usage limits

Server won't start

Check Python version: python3 --version (needs 3.8+)
Verify all dependencies: pip install -e .
Look for port conflicts if using custom WAYLAND_MCP_PORT

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Project Structure

wayland-mcp/
├── wayland_mcp/          # Main package
│   ├── server_mcp.py     # MCP server implementation
│   ├── screen_utils.py   # Screenshot & VLM analysis
│   ├── mouse_utils.py    # Mouse control functions
│   ├── keyboard_utils.py # Keyboard input handling
│   ├── chain_processor.py# Action chain parser
│   └── ...
├── README.md             # This file
├── CONFIG_EXAMPLES.md    # Configuration examples
├── CONTRIBUTING.md       # Contribution guidelines
├── setup.sh              # Permission setup script
└── pyproject.toml        # Package metadata

License

GPL-3.0 License - See LICENSE for details.

Acknowledgments

Built on the Model Context Protocol
Uses FastMCP for server implementation
Inspired by the need for reliable Wayland automation tools

<div align="center"> Made for the Wayland desktop environment </div>

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

E2B

Using MCP to run code via e2b.

Official

Featured