windows-computer-use-mcp

Enables Claude to fully control a Windows desktop with native screen capture, low-level input injection, game-grade input, and play-testing capabilities.


<p align="center"> <img src="docs/unreal-playtest.gif" width="820" alt="Claude play-testing an Unreal Engine project — entering Play-In-Editor and running up a ramp, driven through the MCP, with the full editor (toolbar, viewport, Outliner) visible the whole time"> </p>

<p align="center"><em>Claude play-testing a live <strong>Unreal Engine</strong> project — entering Play-In-Editor and running up a ramp, driven entirely through this MCP. The full editor stays on screen, so you can see it really is the desktop app.</em></p>

<details> <summary><strong>How that clip was made</strong> — one <code>play</code> call</summary>

The clip comes from a timed input script, run at a cadence while the window is recorded. Mouse-look is relative and keys are sent as hardware scan codes, so the game responds. The raw input was exactly:

play(
    target="window:MyProject - Unreal Editor",
    script="""
        lmb 557 460
        wait 0.4
        look 150 0
        wait 0.15
        look 150 0
        wait 0.2
        down w
        wait 1.7
        tap space
        wait 0.7
        up w
        wait 0.4
    """,
)

That is: click the viewport to capture the mouse (lmb), turn to face the ramp (look, relative pixels), run forward up it (down w ... up w), and jump at the top (tap space).
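To make the script format concrete, here is a minimal sketch of how such a script could be split into steps and sanity-checked before it is sent. This is not the server's parser: `parse_script`, `scripted_wait_seconds`, `unreleased_keys`, and the `ARITY` table are hypothetical names, and the table only covers the six commands that appear in the clip's script.

```python
from dataclasses import dataclass

# Hypothetical helper for illustration; the server's real script grammar may differ.
# Argument counts for the commands used in the clip's script only.
ARITY = {"lmb": 2, "wait": 1, "look": 2, "down": 1, "up": 1, "tap": 1}


@dataclass(frozen=True)
class Step:
    command: str
    args: tuple


def parse_script(script: str) -> list[Step]:
    """Split a script into steps, one per non-blank line."""
    steps = []
    for line_no, raw in enumerate(script.splitlines(), start=1):
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], tuple(parts[1:])
        if command not in ARITY:
            raise ValueError(f"line {line_no}: unknown command {command!r}")
        if len(args) != ARITY[command]:
            raise ValueError(
                f"line {line_no}: {command} takes {ARITY[command]} argument(s)"
            )
        steps.append(Step(command, args))
    return steps


def scripted_wait_seconds(steps: list[Step]) -> float:
    """Sum the explicit waits: a lower bound on how long the script runs."""
    return sum(float(step.args[0]) for step in steps if step.command == "wait")


def unreleased_keys(steps: list[Step]) -> set[str]:
    """Keys pressed with `down` that never get a matching `up`."""
    held = set()
    for step in steps:
        if step.command == "down":
            held.add(step.args[0])
        elif step.command == "up":
            held.discard(step.args[0])
    return held
```

Run against the script above, this yields 12 steps with 3.55 seconds of explicit waits and no key left held. A check like `unreleased_keys` is worth running before a play-test, since a stuck `w` would keep the character running after the script ends.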

</details>

A Model Context Protocol server that gives a Claude agent full control of the local Windows desktop — native screen capture, low-level input injection, video recording, and a play-test loop for driving games and apps.

Unlike Anthropic's sandboxed computer-use tool, this runs on the machine it controls: it reads the actual current displays (no resolution requests), is multi-monitor and per-monitor DPI aware, and injects input via SendInput scan codes so it works in games that ignore synthetic virtual-key events. Built for full Claude control — no security gating.
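For readers unfamiliar with scan-code injection, the following is a minimal sketch of what a `SendInput` call with `KEYEVENTF_SCANCODE` looks like from Python via `ctypes`. It is not taken from this server's source; `key_event` and `send` are hypothetical helper names. The structure layouts and flag values follow the Win32 headers, and the scan codes are the standard set-1 values for W and Space.

```python
import ctypes
import sys

# Win32 constants from winuser.h.
INPUT_KEYBOARD = 1
KEYEVENTF_KEYUP = 0x0002
KEYEVENTF_SCANCODE = 0x0008

# Set-1 hardware scan codes (they name a physical key, not a layout character).
SCAN_W = 0x11
SCAN_SPACE = 0x39


class MOUSEINPUT(ctypes.Structure):
    _fields_ = [
        ("dx", ctypes.c_int32),
        ("dy", ctypes.c_int32),
        ("mouseData", ctypes.c_uint32),
        ("dwFlags", ctypes.c_uint32),
        ("time", ctypes.c_uint32),
        ("dwExtraInfo", ctypes.c_size_t),  # ULONG_PTR
    ]


class KEYBDINPUT(ctypes.Structure):
    _fields_ = [
        ("wVk", ctypes.c_uint16),
        ("wScan", ctypes.c_uint16),
        ("dwFlags", ctypes.c_uint32),
        ("time", ctypes.c_uint32),
        ("dwExtraInfo", ctypes.c_size_t),  # ULONG_PTR
    ]


class _InputUnion(ctypes.Union):
    # MOUSEINPUT is the largest member; including it keeps sizeof(INPUT) correct.
    _fields_ = [("mi", MOUSEINPUT), ("ki", KEYBDINPUT)]


class INPUT(ctypes.Structure):
    _anonymous_ = ("u",)
    _fields_ = [("type", ctypes.c_uint32), ("u", _InputUnion)]


def key_event(scan_code: int, key_up: bool = False) -> INPUT:
    """Build a keyboard INPUT that carries a scan code instead of a virtual key."""
    event = INPUT()
    event.type = INPUT_KEYBOARD
    event.ki.wVk = 0  # must be 0 when KEYEVENTF_SCANCODE is set
    event.ki.wScan = scan_code
    event.ki.dwFlags = KEYEVENTF_SCANCODE | (KEYEVENTF_KEYUP if key_up else 0)
    return event


def send(events: list[INPUT]) -> int:
    """Inject the events; returns how many were accepted. Windows only."""
    if sys.platform != "win32":
        raise OSError("SendInput is only available on Windows")
    array = (INPUT * len(events))(*events)
    return ctypes.windll.user32.SendInput(
        len(events), array, ctypes.sizeof(INPUT)
    )
```

On Windows, `send([key_event(SCAN_W), key_event(SCAN_W, key_up=True)])` would press and release W. The point of the sketch is the flag choice: with `KEYEVENTF_SCANCODE` the event identifies a physical key, which is what many games read, rather than a virtual-key code.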

Why this exists

Anthropic's official computer use in the Claude Code CLI is a macOS-only research preview — Pro/Max only, interactive sessions only (not available with the -p flag). The cross-platform alternative is the Claude Desktop app. There is no official, non-Desktop computer-use for Windows: nothing you can drive headlessly from claude -p, from the API, or wire into an agent over MCP.

This server fills that gap. It's a standard MCP server, so it works on Windows in Claude Code (interactive and -p), in Claude Desktop, or from any MCP client / custom agent — with no plan gating — and it's tuned for what a Windows agent actually needs that the sandboxed cloud tool can't do: real multi-monitor capture, per-window GPU capture, game-grade input, and play-testing.

Tools

| Tool | What it does |
| --- | --- |
| screenshot | See the screen: whole desktop, a display:N, a window (even occluded/DirectX via PrintWindow), or a region. Downscaled inline image + a coordinate frame for clicks. |
| act | Do input, batched: left_click, type, key, scroll, drag, hold_key, paste, click_element (UIA, no pixels), mouse_move_relative (game look), … Coordinates are in the last screenshot's image space; the server maps them to physical pixels. |
| record | Record N seconds → a single timestamped frame montage (not N images) + an mp4. Judge motion/animation/stutter. |
| play | Drive a timed input script at a cadence while recording (scan codes + relative mouse). probe/until read telemetry per sample and stop early: the closed loop for play-testing. |
| window | Find / focus / close / read (get_text via UI Automation + OCR) / click_element controls with no screenshot; token-cheap. |
| process | Launch (incl. shell:true for URLs / ms-settings: / Store apps), kill, wait, run a shell command (real stdout), and wait on readiness (wait_for_window, wait_for_file). |
| system | Monitor layout + DPI/scale, cursor position, clipboard get/set, and viewport management. |
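The probe/until behavior described for play can be pictured as a sampling loop that reads telemetry once per tick and stops as soon as a condition holds. The sketch below shows only that general pattern, under the assumption that a probe returns a dictionary of values. `play_until` and its parameters are hypothetical names, not the server's API, and the real tool also sends input and records video during the loop.

```python
import time
from typing import Callable


def play_until(
    probe: Callable[[], dict],
    until: Callable[[dict], bool],
    cadence_s: float,
    max_samples: int,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[dict], bool]:
    """Sample telemetry once per tick; stop early when `until` is satisfied.

    Returns the collected samples and whether the stop condition was met.
    """
    samples = []
    for _ in range(max_samples):
        reading = probe()
        samples.append(reading)
        if until(reading):
            return samples, True  # condition met: end the run early
        sleep(cadence_s)
    return samples, False  # ran out of samples without meeting the condition
```

For example, a play-test of the ramp clip could probe the character's height each tick and stop once it passes the top of the ramp, instead of always running for the full script duration. The `sleep` parameter is injectable so the loop can be exercised without real delays.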

Coordinates & multi-monitor

Coordinates are physical pixels in virtual-desktop space (primary monitor's top-left is (0,0); monitors to the left/above are negative). You click in the image space of the last screenshot; the server maps that back to physical pixels (handling downscale, per-monitor offset, and DPI). Every screenshot returns a capture_id; act errors loudly if you click against a stale frame instead of mis-clicking. Use system displays to see the layout, then target a specific monitor with display:0 / display:primary|left|right.
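The mapping from image space back to physical pixels can be sketched as follows. This is an illustration of the idea rather than the server's code: `CaptureFrame`, `StaleCaptureError`, and `image_to_physical` are hypothetical names, and the sketch assumes the captured region's origin and size are already known in physical pixels, so DPI scaling needs no separate step here.

```python
from dataclasses import dataclass


class StaleCaptureError(RuntimeError):
    """Raised when a click refers to a screenshot that is no longer the latest."""


@dataclass(frozen=True)
class CaptureFrame:
    capture_id: str
    origin_x: int    # left edge of the captured region, virtual-desktop pixels
    origin_y: int    # top edge of the captured region, virtual-desktop pixels
    physical_w: int  # size of the captured region in physical pixels
    physical_h: int
    image_w: int     # size of the downscaled image the agent sees
    image_h: int


def image_to_physical(
    frame: CaptureFrame, latest_capture_id: str, x: int, y: int
) -> tuple[int, int]:
    """Map a point in the screenshot image to virtual-desktop physical pixels."""
    if frame.capture_id != latest_capture_id:
        raise StaleCaptureError(
            f"click targets capture {frame.capture_id!r}, "
            f"but the latest is {latest_capture_id!r}"
        )
    if not (0 <= x < frame.image_w and 0 <= y < frame.image_h):
        raise ValueError(f"({x}, {y}) is outside the {frame.image_w}x{frame.image_h} image")
    scale_x = frame.physical_w / frame.image_w  # undo the downscale
    scale_y = frame.physical_h / frame.image_h
    return frame.origin_x + int(x * scale_x), frame.origin_y + int(y * scale_y)
```

With a 1920x1080 monitor placed to the left of the primary (origin at x = -1920) and a screenshot downscaled to 960x540, a click at image point (480, 270) maps to physical (-960, 540). The negative x is expected for a monitor left of the primary.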

Install

As a Claude Code / Desktop plugin (recommended)

/plugin marketplace add sshh12/claude-plugins
/plugin install windows-computer-use@shrivu-plugins

The plugin bootstraps a Python virtual environment and installs this package from GitHub on first run, then starts the MCP server automatically.

Standalone (project-local MCP)

Requires Python 3.10+ and (for video) ffmpeg on PATH.

pip install git+https://github.com/sshh12/windows-computer-use-mcp

Then add to your MCP client config (e.g. a project .mcp.json):

{
  "mcpServers": {
    "windows-computer-use": {
      "command": "python",
      "args": ["-m", "windows_computer_use"]
    }
  }
}

Development

python -m venv .venv
.venv\Scripts\python.exe -m pip install -e .
.venv\Scripts\python.exe tests\smoke_engine.py    # capture/input/display engine
.venv\Scripts\python.exe tests\smoke_server.py     # assembled MCP tool surface

MCP_OUTPUT_DIR overrides where screenshots/video are written (default: a client root → ~/Pictures/windows-computer-use → %TEMP%).

Debugging

Set WCU_DEBUG_HTML_DIR to a directory and the server writes a per-session session_<stamp>.html that pretty-prints every tool call — arguments, result text, and the returned screenshots inline — so you can replay exactly what the agent saw and did:

<img src="https://github.com/user-attachments/assets/4395906b-cfe2-4e25-b560-261a9ebc782d" width="820" alt="Per-session debug HTML dump showing tool calls, arguments, results, and inline screenshots">

License

MIT
