Wayland MCP Server
Enables AI assistants to automate Wayland desktop environments through screenshot analysis, mouse control, and keyboard input simulation. It supports visual context via VLM providers like Gemini and OpenRouter to perform complex, multi-step desktop actions.
<div align="center">
Model Context Protocol server for Wayland desktop automation
Features • Installation • Usage • API • Security
</div>
Overview
Wayland MCP Server enables AI assistants to interact with your Wayland desktop through the Model Context Protocol. It provides screenshot capture with VLM analysis, mouse control, keyboard input, and action chaining capabilities.
Why This Project?
Existing Wayland screenshot and automation tools often have reliability issues. This project provides a robust, MCP-native solution specifically designed for AI-driven desktop automation on modern Linux systems.
Quick Example
# AI Assistant: "Take a screenshot and tell me what's on screen"
→ Captures screen, analyzes with VLM, responds with description
# AI Assistant: "Click the OK button"
→ Identifies button location from screenshot, moves mouse, clicks
# AI Assistant: "Fill out this form with test data"
→ Chains clicks and keyboard input to complete form automatically
Features
Visual Analysis
- Screenshot capture with precision ruler overlays
- VLM-powered image analysis via OpenRouter or Google Gemini
- Multiple vision model support (Claude, GPT-4V, Gemini, Qwen)
- Side-by-side image comparison and diff detection
Mouse Automation
- Absolute and relative cursor positioning
- Click operations (left, right, middle button)
- Drag and drop with coordinate precision
- Bidirectional scrolling (vertical/horizontal)
Keyboard Control
- Text input simulation
- Individual key press events
- Complex key combinations
Action Sequences
- Chain multiple operations together
- Flexible syntax: `chain:action1;action2;action3`
- Example: `chain:click:100,200;type:hello;press:Enter`
Installation
Prerequisites
- Python 3.8 or higher
- Wayland compositor (GNOME, KDE Plasma, Hyprland, Sway, etc.)
- `grim` and `slurp` for screenshots (usually pre-installed)
Quick Install
uvx wayland-mcp
From Source
git clone https://github.com/kurojs/wayland-mcp.git
cd wayland-mcp
pip install -e .
Input Control Setup
For mouse and keyboard automation, run the setup script:
sudo ./setup.sh
What it does:
- Installs the `evemu-tools` package
- Configures setuid for `evemu-event`
- Adds your user to the `input` group
- Creates udev rules for device access
After setup, log out and back in for group changes to take effect.
Usage
MCP Configuration
The server supports two VLM providers:
Option 1: OpenRouter (multiple models via proxy)
{
  "mcpServers": {
    "wayland": {
      "command": "uvx",
      "args": ["wayland-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-v1-...",
        "VLM_PROVIDER": "openrouter",
        "VLM_MODEL": "qwen/qwen2.5-vl-72b-instruct:free",
        "XDG_RUNTIME_DIR": "/run/user/1000",
        "WAYLAND_DISPLAY": "wayland-0"
      }
    }
  }
}
Option 2: Google Gemini Direct (native API, faster)
{
  "mcpServers": {
    "wayland": {
      "command": "uvx",
      "args": ["wayland-mcp"],
      "env": {
        "GEMINI_API_KEY": "AIza...",
        "VLM_PROVIDER": "gemini",
        "VLM_MODEL": "gemini-2.5-flash",
        "XDG_RUNTIME_DIR": "/run/user/1000",
        "WAYLAND_DISPLAY": "wayland-0"
      }
    }
  }
}
Example for Claude Desktop (~/.config/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "wayland": {
      "command": "uvx",
      "args": ["wayland-mcp"],
      "env": {
        "GEMINI_API_KEY": "AIza...",
        "VLM_PROVIDER": "gemini",
        "VLM_MODEL": "gemini-2.5-flash",
        "XDG_RUNTIME_DIR": "/run/user/1000",
        "WAYLAND_DISPLAY": "wayland-0"
      }
    }
  }
}
Note: See CONFIG_EXAMPLES.md for more configuration examples including Cursor, OpenRouter models, and VLM provider options.
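The `/run/user/1000` value in the examples above assumes a user ID of 1000. As a minimal sketch, the entry could be generated for the current user instead. The helper name `build_server_entry` is hypothetical and not part of wayland-mcp, and the code assumes the common systemd convention of `/run/user/<uid>` for the runtime directory:

```python
import json
import os

# Default models as listed in this README's Environment Variables table.
DEFAULT_MODELS = {
    "openrouter": "qwen/qwen2.5-vl-72b-instruct:free",
    "gemini": "gemini-2.5-flash",
}
# API-key variable expected by each provider.
KEY_VARS = {"openrouter": "OPENROUTER_API_KEY", "gemini": "GEMINI_API_KEY"}


def build_server_entry(provider, api_key, uid=None, display="wayland-0"):
    """Return an "mcpServers" config dict for the given VLM provider.

    Hypothetical helper for illustration; not part of wayland-mcp.
    """
    if provider not in KEY_VARS:
        raise ValueError(f"unknown provider: {provider!r}")
    if uid is None:
        uid = os.getuid()  # assumption: runtime dir is /run/user/<uid>
    return {
        "mcpServers": {
            "wayland": {
                "command": "uvx",
                "args": ["wayland-mcp"],
                "env": {
                    KEY_VARS[provider]: api_key,
                    "VLM_PROVIDER": provider,
                    "VLM_MODEL": DEFAULT_MODELS[provider],
                    "XDG_RUNTIME_DIR": f"/run/user/{uid}",
                    "WAYLAND_DISPLAY": display,
                },
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into your MCP client's configuration file.
    print(json.dumps(build_server_entry("gemini", "AIza..."), indent=2))
```

If your session uses a different display name, pass it via `display`; `echo $WAYLAND_DISPLAY` shows the current value.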
Environment Variables
| Variable | Description | Default | Required |
|---|---|---|---|
| VLM Provider Options | | | |
| `VLM_PROVIDER` | Vision provider: `openrouter` or `gemini` | `openrouter` | No |
| `OPENROUTER_API_KEY` | OpenRouter API key | - | For OpenRouter |
| `GEMINI_API_KEY` | Google Gemini API key | - | For Gemini |
| `VLM_MODEL` | Model identifier | `qwen/qwen2.5-vl-72b-instruct:free` (OpenRouter) or `gemini-2.5-flash` (Gemini) | No |
| Wayland Environment | | | |
| `XDG_RUNTIME_DIR` | Wayland runtime directory | `/run/user/1000` | Yes |
| `WAYLAND_DISPLAY` | Display identifier | `wayland-0` | Yes |
| Optional | | | |
| `WAYLAND_MCP_PORT` | Server listen port | `4999` | No |
Getting API Keys:
- OpenRouter: openrouter.ai → Keys section
- Google Gemini: Google AI Studio
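
The requirements in the table above can be checked before launching the server. The following is a rough sketch of such a check, not something the project ships; `check_env` is a hypothetical name, and the rules encoded here are only those stated in the table:

```python
import os

REQUIRED_ALWAYS = ("XDG_RUNTIME_DIR", "WAYLAND_DISPLAY")
PROVIDER_KEYS = {"openrouter": "OPENROUTER_API_KEY", "gemini": "GEMINI_API_KEY"}


def check_env(env):
    """Return a list of problems; an empty list means the env looks complete."""
    problems = []
    provider = env.get("VLM_PROVIDER", "openrouter")  # documented default
    if provider not in PROVIDER_KEYS:
        problems.append(
            f"VLM_PROVIDER must be one of {sorted(PROVIDER_KEYS)}, got {provider!r}"
        )
    elif not env.get(PROVIDER_KEYS[provider]):
        problems.append(
            f"{PROVIDER_KEYS[provider]} is required when VLM_PROVIDER={provider}"
        )
    for name in REQUIRED_ALWAYS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print("config problem:", problem)
```

Passing a plain dict instead of `os.environ` makes it easy to validate the `env` block of an MCP client configuration as well.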
Desktop Environment Compatibility
| Desktop | Status | Notes |
|---|---|---|
| GNOME | ✅ Tested | Wayland by default on modern versions |
| KDE Plasma | ✅ Tested | Enable Wayland session at login |
| Hyprland | ✅ Tested | Native Wayland compositor |
| Sway | ✅ Should work | i3-compatible Wayland compositor |
| Others | ⚠️ Untested | Any wlroots-based compositor should work |
Example Commands
Through an MCP client, you can request actions like:
- "Take a screenshot and analyze what's on the screen"
- "Move the mouse to coordinates (100, 200) and click"
- "Type 'hello world' and press Enter"
- "Click at (50, 50), then drag to (200, 200)"
Available Tools
The server exposes the following MCP tools:
Screen Capture
- `capture_screenshot` - Take a screenshot with optional ruler overlays
- `capture_and_analyze` - Capture and analyze using VLM in one step
Vision Analysis
- `analyze_screenshot` - Analyze an existing screenshot with a custom prompt
- `compare_images` - Compare two screenshots to detect differences
Mouse Control
- `move_mouse` - Move cursor to coordinates (absolute or relative)
- `click_mouse` - Perform left click at current position
- `drag_mouse` - Drag between two coordinate points
- `scroll_mouse` - Vertical scroll (positive=up, negative=down)
Action Execution
- `execute_action` - Execute a single action or chain multiple actions
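
MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds such a request as a Python dict so the wire format is visible. The argument key `"action"` is an assumption made for illustration; the actual parameter names are defined by the server's tool schemas, which a client can obtain through `tools/list`.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def tool_call_request(name, arguments=None):
    """Build a JSON-RPC 2.0 "tools/call" request as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    # "action" is a hypothetical argument name, used for illustration only.
    request = tool_call_request(
        "execute_action", {"action": "chain:click:50,50;type:hello"}
    )
    print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing for you; the sketch is only meant to show what a tool invocation looks like on the wire.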
Action Chain Syntax
Combine multiple actions with semicolons:
chain:action1;action2;action3
Supported Actions:
- `type:text` - Type a text string
- `press:key` - Press a specific key
- `click:` or `click:x,y` - Click at the current location or at the given position
- `move_to:x,y` - Move to absolute coordinates
- `move_to:rel:x,y` - Move relative to current position
- `drag:x1,y1:x2,y2` - Drag from point to point
- `scroll:amount` - Scroll vertically (typical values: 15-120)
- `scroll:horizontal:amount` - Scroll horizontally
Example Chains:
chain:move_to:100,200;click:;type:hello;press:Enter
chain:click:50,50;drag:50,50:200,200
chain:scroll:120;move_to:rel:0,-50;click:
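
To make the syntax concrete, here is a minimal sketch of how such a chain could be split into individual actions. It is not the project's `chain_processor.py`; the names `Action` and `parse_chain` are hypothetical, and the naive split on `;` means typed text containing a semicolon would be broken apart.

```python
from typing import List, NamedTuple


class Action(NamedTuple):
    name: str  # e.g. "click", "type", "move_to"
    args: str  # raw text after the first colon, possibly empty


def parse_chain(text: str) -> List[Action]:
    """Split 'chain:a;b;c' (or a single action 'a') into Action tuples."""
    body = text[len("chain:"):] if text.startswith("chain:") else text
    actions = []
    for part in body.split(";"):
        if not part:
            continue  # tolerate a trailing or doubled semicolon
        # Only the first colon separates the action name from its arguments,
        # so "move_to:rel:0,-50" keeps "rel:0,-50" as the argument text.
        name, _, args = part.partition(":")
        actions.append(Action(name, args))
    return actions


if __name__ == "__main__":
    for action in parse_chain("chain:move_to:100,200;click:;type:hello;press:Enter"):
        print(action)
```

Keeping the argument text raw leaves per-action parsing (coordinates, key names, scroll amounts) to whichever handler executes that action.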
Security
⚠️ IMPORTANT SECURITY CONSIDERATIONS
This server grants extensive control over your desktop environment:
- Full mouse and keyboard control
- Screen capture capabilities
- Ability to execute arbitrary input sequences
Best Practices
- Only use with trusted AI models and MCP clients
- Review action chains before execution in sensitive contexts
- Consider running in a sandboxed or test environment
- Be aware that the AI can perform any action you could perform manually
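
One way to review chains before execution is an allowlist applied on the client side. The sketch below is an illustration under stated assumptions rather than a feature of the server: `disallowed_actions` and the example policy are hypothetical, and the chain is split naively on `;`.

```python
def disallowed_actions(chain, allowed):
    """Return the action names in `chain` that are not in the `allowed` set."""
    body = chain[len("chain:"):] if chain.startswith("chain:") else chain
    names = [part.partition(":")[0] for part in body.split(";") if part]
    return [name for name in names if name not in allowed]


# Example policy: pointer movement and scrolling only, no clicks or typing.
POINTER_ONLY = {"move_to", "scroll"}

if __name__ == "__main__":
    chain = "chain:scroll:120;move_to:rel:0,-50;click:"
    blocked = disallowed_actions(chain, POINTER_ONLY)
    if blocked:
        print("refusing chain; blocked actions:", blocked)
```

A filter like this reduces the blast radius of a misbehaving model but does not replace sandboxing, since allowed actions can still have side effects.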
Permission Model
The setup script requires sudo access to:
- Install system packages (`evemu-tools`)
- Modify file permissions
- Configure udev rules
After setup, the server runs with your user privileges but can control input devices through configured permissions.
Architecture
┌─────────────────────────────────┐
│ MCP Client Layer │
│ (Claude, Cursor, VS Code) │
└───────────────┬─────────────────┘
│
MCP Protocol (stdio/HTTP)
│
┌───────────────▼─────────────────┐
│ Wayland MCP Server │
│ ┌─────────────────────┐ │
│ │ Core Components │ │
│ ├─────────────────────┤ │
│ │ • FastMCP Handler │ │
│ │ • Action Processor │ │
│ │ • Chain Parser │ │
│ └─────────────────────┘ │
└────┬────────────────┬────────────┘
│ │
┌───────────────┴────┐ ┌────┴──────────────┐
│ │ │ │
┌────▼─────┐ ┌──────▼───┐ │ ┌──────────────┐ │
│ Vision │ │ Input │ │ │ Screen │ │
│ │ │ Control │ │ │ Capture │ │
├──────────┤ ├──────────┤ │ ├──────────────┤ │
│ • VLM │ │ • evemu │ │ │ • grim │ │
│ • Compare│ │ • Mouse │ │ │ • slurp │ │
│ │ │ • Keyboard│ │ │ • PIL │ │
└──────────┘ └──────────┘ │ └──────────────┘ │
│ │
└────────────────────┘
Wayland Compositor
Troubleshooting
Input control not working
- Ensure you ran `sudo ./setup.sh`
- Log out and back in after setup
- Verify you're in the `input` group: `groups | grep input`
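
The same group check can be done from Python, which is handy inside a larger diagnostic script. This is a small sketch using only the standard library; `user_in_group` is a hypothetical helper name. Because a process keeps the group list it was started with, it reports `False` until you have logged out and back in after setup.

```python
import grp
import os


def user_in_group(group_name, gids=None):
    """Return True if the current process belongs to `group_name`."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    if gids is None:
        # Supplementary groups plus the effective primary group.
        gids = set(os.getgroups()) | {os.getegid()}
    return gid in gids


if __name__ == "__main__":
    print("in input group:", user_in_group("input"))
```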
Screenshots failing
- Check if `grim` is installed: `which grim`
- Verify `WAYLAND_DISPLAY` matches your session: `echo $WAYLAND_DISPLAY`
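
These checks can be combined into a quick preflight script. The sketch below is not part of the project; `screenshot_preflight` is a hypothetical name. It relies on the fact that a Wayland compositor's socket normally lives at `$XDG_RUNTIME_DIR/$WAYLAND_DISPLAY`.

```python
import os
import shutil


def screenshot_preflight(env=None, which=shutil.which):
    """Return a list of problems that would likely break screenshot capture."""
    env = os.environ if env is None else env
    problems = [
        f"{tool} not found on PATH"
        for tool in ("grim", "slurp")
        if which(tool) is None
    ]
    display = env.get("WAYLAND_DISPLAY")
    runtime = env.get("XDG_RUNTIME_DIR")
    if not display or not runtime:
        problems.append("WAYLAND_DISPLAY or XDG_RUNTIME_DIR is not set")
    else:
        # os.path.join returns `display` unchanged if it is an absolute path.
        socket_path = os.path.join(runtime, display)
        if not os.path.exists(socket_path):
            problems.append(f"no Wayland socket at {socket_path}")
    return problems


if __name__ == "__main__":
    for problem in screenshot_preflight():
        print("screenshot problem:", problem)
```

Run it with the same environment the MCP client passes to the server, since a mismatch there is a common cause of failed captures.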
VLM analysis not working
- Confirm `OPENROUTER_API_KEY` or `GEMINI_API_KEY` (whichever matches `VLM_PROVIDER`) is set correctly
- Check API key permissions on the OpenRouter dashboard
- Test model availability: some models have usage limits
Server won't start
- Check Python version: `python3 --version` (needs 3.8+)
- Verify all dependencies: `pip install -e .`
- Look for port conflicts if using a custom `WAYLAND_MCP_PORT`
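
A port conflict can be detected by trying to bind the port yourself. This is a minimal sketch; it assumes the server listens on the loopback address, which this README does not state, and `port_is_free` is a hypothetical helper name.

```python
import os
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


if __name__ == "__main__":
    # 4999 is the documented default for WAYLAND_MCP_PORT.
    port = int(os.environ.get("WAYLAND_MCP_PORT", "4999"))
    print(f"port {port} is", "free" if port_is_free(port) else "already in use")
```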
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Project Structure
wayland-mcp/
├── wayland_mcp/ # Main package
│ ├── server_mcp.py # MCP server implementation
│ ├── screen_utils.py # Screenshot & VLM analysis
│ ├── mouse_utils.py # Mouse control functions
│ ├── keyboard_utils.py # Keyboard input handling
│ ├── chain_processor.py# Action chain parser
│ └── ...
├── README.md # This file
├── CONFIG_EXAMPLES.md # Configuration examples
├── CONTRIBUTING.md # Contribution guidelines
├── setup.sh # Permission setup script
└── pyproject.toml # Package metadata
License
GPL-3.0 License - See LICENSE for details.
Acknowledgments
- Built on the Model Context Protocol
- Uses FastMCP for server implementation
- Inspired by the need for reliable Wayland automation tools
<div align="center"> Made for the Wayland desktop environment </div>