Computer Use MCP
An MCP server for computer automation that provides tools for screenshots, mouse actions, keyboard input, and drag-and-drop functionality. It supports cross-platform desktop interaction for both Linux (X11) and Windows environments.
computer-use-mcp
MCP server for computer use automation. Provides screenshot, click, keyboard, mouse, and drag-and-drop tools. Supports Linux (X11) and Windows (10/11).
System Dependencies
Linux
sudo apt install xdotool scrot x11-xserver-utils
- xdotool — mouse/keyboard automation
- scrot — screenshots
- xrandr (from x11-xserver-utils) — screen resolution
Windows
No external dependencies required. Uses built-in PowerShell with .NET Framework:
- PowerShell — comes pre-installed on Windows 10/11
- user32.dll — native mouse/keyboard input via SendInput P/Invoke
- System.Drawing — screenshot capture
- System.Windows.Forms — screen resolution detection
Requirements:
- Windows 10 or 11
- .NET Framework (pre-installed)
- Desktop session must be active (screen unlocked)
Setup
pnpm install
pnpm build
Usage
With Claude Code
Add to ~/.claude/settings.json (Linux) or %USERPROFILE%\.claude\settings.json (Windows):
{
  "mcpServers": {
    "computer-use": {
      "command": "node",
      "args": ["/path/to/computer-use-mcp/dist/index.js"]
    }
  }
}
Development
pnpm dev
Tools
| Tool | Description |
|---|---|
| screenshot | Capture full screen or a region. Returns optimized JPEG base64. |
| click | Click at (x, y) with left/right/middle button. |
| double_click | Double-click at (x, y). |
| type_text | Type text at current cursor position. |
| key_press | Press key combinations (e.g. ctrl+c, alt+tab). |
| mouse_move | Move cursor without clicking. |
| scroll | Scroll up/down at a position. |
| drag | Drag and drop from one position to another. |
| get_screen_size | Get screen resolution. |
| get_cursor_position | Get current cursor position. |
| wait | Wait N milliseconds between actions. |
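For orientation, an MCP client invokes a tool with a JSON-RPC `tools/call` request. The sketch below builds such a request for the click tool. The helper `buildToolCall` and the argument keys `x`, `y`, and `button` are assumptions made for illustration; this README does not list the parameter names, so check the server's tool schema before relying on them.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request object.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Example: left-click at (640, 360). The keys x, y, button are assumed names.
const clickRequest = buildToolCall(1, "click", { x: 640, y: 360, button: "left" });
console.log(JSON.stringify(clickRequest));
```

In practice an MCP client such as Claude Code constructs these requests itself; the sketch only shows what travels over the wire.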
Platform Details
Linux
- Uses xdotool for mouse/keyboard, scrot for screenshots, xrandr for screen info
- Requires X11 display (Wayland not supported)
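As a rough illustration of how the click and key_press tools could map onto xdotool, the sketch below builds argument lists for the `mousemove`, `click`, and `key` subcommands. The function names `clickArgs` and `keyPressArgs` are hypothetical, and the server's real command construction may differ.

```typescript
type MouseButton = "left" | "middle" | "right";

// xdotool numbers mouse buttons 1 (left), 2 (middle), 3 (right).
const BUTTON_CODES: Record<MouseButton, number> = { left: 1, middle: 2, right: 3 };

// Arguments for: xdotool mousemove X Y click BUTTON
function clickArgs(x: number, y: number, button: MouseButton = "left"): string[] {
  return ["mousemove", String(x), String(y), "click", String(BUTTON_CODES[button])];
}

// Arguments for: xdotool key COMBO
// xdotool accepts plus-separated combos such as "ctrl+c" directly.
function keyPressArgs(combo: string): string[] {
  return ["key", combo];
}

console.log(clickArgs(100, 200, "right").join(" "));
console.log(keyPressArgs("alt+Tab").join(" "));
```

Building an argument array, for example to pass to `execFile` from `node:child_process`, avoids shell quoting problems that arise when commands are assembled as a single string.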
Windows
- Uses PowerShell with inline C# (Add-Type) calling user32.dll SendInput
- ~200-500ms overhead per operation due to PowerShell startup and Add-Type compilation
- Captures primary monitor only
- typeText uses KEYEVENTF_UNICODE for full Unicode support
- delay_ms parameter on type_text is ignored (SendInput sends all chars at once)
- DPI-aware: calls SetProcessDPIAware before coordinate/screenshot operations
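To make the KEYEVENTF_UNICODE behavior concrete: with that flag, each keyboard event carries one UTF-16 code unit, so a character outside the Basic Multilingual Plane is sent as a surrogate pair. The sketch below models the event list for a string. It is an illustration of the general Win32 technique, not the project's code, and the names `UnicodeKeyEvent` and `toUnicodeKeyEvents` are hypothetical.

```typescript
// One key-down or key-up event carrying a single UTF-16 code unit.
interface UnicodeKeyEvent {
  codeUnit: number; // the value that would go in KEYBDINPUT.wScan
  keyUp: boolean;   // true means KEYEVENTF_KEYUP would be set with KEYEVENTF_UNICODE
}

// Expand a string into down/up event pairs, one pair per UTF-16 code unit.
function toUnicodeKeyEvents(text: string): UnicodeKeyEvent[] {
  const events: UnicodeKeyEvent[] = [];
  for (let i = 0; i < text.length; i++) {
    const codeUnit = text.charCodeAt(i);
    events.push({ codeUnit, keyUp: false });
    events.push({ codeUnit, keyUp: true });
  }
  return events;
}

// "é" is one code unit; an emoji such as "😀" is two (a surrogate pair).
console.log(toUnicodeKeyEvents("é").length);
console.log(toUnicodeKeyEvents("😀").length);
```

Because the whole event list can be handed to SendInput in a single call, there is no natural point at which to pause between characters, which is consistent with delay_ms being ignored on Windows.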
Notes
- Screenshots are automatically resized if wider than 1920px and compressed to JPEG quality 80
- A 100ms delay is applied after each action to avoid race conditions
- All actions are logged to stderr in JSON format
- Platform is auto-detected at startup via process.platform
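The resize rule in the first note can be sketched as a small size calculation. This assumes the aspect ratio is preserved when downscaling, which the README does not state explicitly; the function name `fitToMaxWidth` is hypothetical.

```typescript
const MAX_WIDTH = 1920; // width limit taken from the note above

// Return the target size: unchanged if narrow enough, otherwise scaled to MAX_WIDTH.
// Assumption: height is scaled by the same factor to keep the aspect ratio.
function fitToMaxWidth(
  width: number,
  height: number,
  maxWidth: number = MAX_WIDTH
): { width: number; height: number } {
  if (width <= maxWidth) {
    return { width, height };
  }
  const scale = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * scale) };
}

console.log(fitToMaxWidth(3840, 2160)); // a 4K capture would be halved in each dimension
console.log(fitToMaxWidth(1280, 720));  // already narrow enough, left unchanged
```

If screenshots are scaled this way, coordinates read off a resized image would need to be multiplied by the inverse factor before being passed to click or drag; whether the server does that conversion itself is not stated here.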