WindowsPC-MCP

Gives AI agents their own virtual display on Windows, enabling isolated clicking, typing, and screenshots without interfering with the user's desktop.

An MCP server that gives AI agents their own virtual display on Windows. The agent clicks, types, and screenshots on an isolated screen while you keep working on yours.

┌────────────────────────────────────┬───────────────────────────┐
│  Your Physical Monitors            │  Agent Virtual Screen     │
│                                    │                           │
│  You work here normally.           │  Agent works here.        │
│  Mouse, keyboard, apps:            │  Isolated coordinates.    │
│  all yours.                        │  Filtered shortcuts.      │
│                                    │  Own window space.        │
│                                    │                           │
│  ← Agent can READ all screens      │  ← Agent can only         │
│     for context                    │     WRITE to this one     │
└────────────────────────────────────┴───────────────────────────┘

Why?

When an AI agent controls your desktop directly:

  • You can't work while the agent works — every mouse move derails it
  • The agent can click your apps — a misplaced click hits your browser
  • No safety boundary — Alt+Tab or Win+D disrupts the session
  • Hard to recover — if the agent loses track, you restart from scratch

WindowsPC-MCP creates a virtual display using the Parsec Virtual Display Driver and confines the agent to it. By default the agent sees a 1920x1080 coordinate space whose origin (0,0) is the top-left corner of its own screen. Your monitors are untouched.

Control UI (new in v0.6.0)

When the server starts, a small control window opens on your desktop showing the agent's screen in real time:

[Figure: Control UI]

Buttons:

  • Pause / Resume — block / unblock agent input (enters HUMAN_OVERRIDE mode)
  • Join — temporarily switch your own input to the agent's desktop so you can interact with it directly; press again to leave
  • Screen ▾ — pick which DXGI output to view (when you have multiple virtual displays)
  • Stop — confirm dialog → ends the agent session

The status pill on the left shows the current input mode (for example AGENT_SOLO or HUMAN_OVERRIDE).

If you don't want the UI — for CI, headless gateways, or pure stdio MCP setups — pass --no-ui:

windowspc-mcp --transport stdio --no-ui

Server state is still mirrored to ~/.windowsmcp/status.json (1 Hz) for external monitoring.
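For external monitoring, a small poller along these lines could read the mirrored file. This is a minimal sketch: it assumes only that the file contains JSON, since the schema is not documented here, and the helper names `read_status` and `watch` are illustrative.

```python
import json
import time
from pathlib import Path
from typing import Iterator, Optional

# Path taken from the README; the file's schema is not documented here.
STATUS_PATH = Path.home() / ".windowsmcp" / "status.json"


def read_status(path: Path = STATUS_PATH) -> Optional[dict]:
    """Return the parsed status file, or None if it is missing or mid-write."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None


def watch(path: Path = STATUS_PATH, interval: float = 1.0,
          max_polls: Optional[int] = None) -> Iterator[dict]:
    """Yield a snapshot each time the file content changes.

    The default interval matches the 1 Hz rate at which the server writes.
    """
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        status = read_status(path)
        if status is not None and status != last:
            last = status
            yield status
        polls += 1
        time.sleep(interval)
```

Treating an unreadable or half-written file as "no update" keeps the poller from crashing if it happens to read while the server is rewriting the file.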

Requirements

  • Windows 10 or 11
  • Python 3.12 or later
  • Parsec VDD — auto-installed on first server run (triggers a one-time UAC prompt)

Install

git clone https://github.com/ShikeChen01/WindowsPC-MCP.git
cd WindowsPC-MCP
pip install -e .

Setup

Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "windowspc-mcp": {
      "command": "python",
      "args": ["-m", "windowspc_mcp", "--transport", "stdio"]
    }
  }
}

Claude Code picks this up automatically — no restart needed.
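If the project already has a .mcp.json with other servers, the entry can be merged in rather than pasted by hand. The following is a sketch, not part of WindowsPC-MCP; `add_server` is a hypothetical helper and the entry is copied from the snippet above.

```python
import json
from pathlib import Path


def server_entry() -> dict:
    """The WindowsPC-MCP entry exactly as shown in the README snippet."""
    return {
        "command": "python",
        "args": ["-m", "windowspc_mcp", "--transport", "stdio"],
    }


def add_server(config_path: Path, name: str = "windowspc-mcp") -> dict:
    """Merge the entry into .mcp.json, leaving other servers untouched."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[name] = server_entry()
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_server(Path(".mcp.json"))` from the project root creates the file if it does not exist.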

Claude Desktop

Add to your claude_desktop_config.json (Settings > Developer > Edit Config):

{
  "mcpServers": {
    "windowspc-mcp": {
      "command": "windowspc-mcp",
      "args": ["--transport", "stdio"]
    }
  }
}

Restart Claude Desktop after saving.

Other MCP clients

WindowsPC-MCP supports two transports:

# stdio (for any MCP client that launches a subprocess)
windowspc-mcp --transport stdio

# SSE over HTTP (for network clients)
windowspc-mcp --transport sse --host localhost --port 8000

Connect your client to http://localhost:8000/sse for the SSE transport.
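To check that the SSE endpoint is reachable without a full MCP client, a few lines of standard-library Python are enough. The sketch below assumes the server follows the usual MCP HTTP+SSE convention of sending an `endpoint` event first; that detail is not stated in this README, and `parse_sse` and `first_event` are illustrative names.

```python
import urllib.request
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:                      # blank line ends the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):        # comment or keep-alive line
            continue
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def first_event(url: str = "http://localhost:8000/sse") -> Tuple[str, str]:
    """Open the stream and return the first event the server sends."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return next(parse_sse(line.decode("utf-8") for line in resp))
```

If the assumption holds, `first_event()` returns a pair whose first item is `"endpoint"` and whose second item is the path the client should POST messages to.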

Quick Start

Once connected, the agent workflow looks like this:

1. CreateScreen()                          → virtual display appears
2. Screenshot(screen="agent")              → see what's on the agent screen
3. App(name="notepad")                     → launch an app (auto-moved to agent screen)
4. Snapshot()                              → screenshot + UI tree with labeled elements
5. Click(label=3)                          → click element #3 from the snapshot
6. Type(text="Hello from the agent")       → type into the focused element
7. DestroyScreen()                         → clean up when done

The agent always calls CreateScreen first. After that, Snapshot is the primary tool for understanding what's on screen — it returns a screenshot plus a numbered list of interactive elements that Click and Type can target by label.
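On the wire, each numbered step above is an MCP `tools/call` request. The sketch below builds those JSON-RPC messages for the same workflow. The tool and argument names come from the steps above; the `tool_call` helper is hypothetical, and a real client would also perform the MCP `initialize` handshake before sending any of these.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids


def tool_call(tool: str, /, **arguments) -> str:
    """Build one JSON-RPC 2.0 `tools/call` request as a JSON string.

    `tool` is positional-only so that tools taking a `name` argument
    (such as App) can still be passed through as keyword arguments.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


QUICK_START = [
    tool_call("CreateScreen"),
    tool_call("Screenshot", screen="agent"),
    tool_call("App", name="notepad"),
    tool_call("Snapshot"),
    tool_call("Click", label=3),
    tool_call("Type", text="Hello from the agent"),
    tool_call("DestroyScreen"),
]

if __name__ == "__main__":
    for message in QUICK_START:
        print(message)
```

In practice an MCP client library generates these messages; the sketch only shows what the agent's calls look like underneath.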

Tools

23 tools organized by category. See docs/tools.md for the full reference with parameters and examples.

Screen Management

Tool           Description
CreateScreen   Create the agent's virtual display (1920x1080 default)
DestroyScreen  Remove the virtual display and release resources
ScreenInfo     List all monitors; the agent screen is marked [AGENT]
RecoverWindow  Find windows by title/pid/process and move them to the agent screen

Vision

Tool           Description
Screenshot     Capture a screenshot (agent screen, all screens, or by index)
Snapshot       Screenshot + window list + interactive UI elements with labels

Input

Tool           Description
Click          Click at coordinates or by element label from Snapshot
Type           Type text, optionally clicking a target first
Move           Move the cursor (with optional drag)
Scroll         Scroll vertically or horizontally
Shortcut       Send keyboard shortcuts (dangerous ones like Alt+Tab are blocked)
Wait           Pause execution for a given number of seconds

Batch Input

Tool           Description
MultiSelect    Click multiple positions in sequence
MultiEdit      Click and type into multiple fields in sequence

Apps & System

Tool           Description
App            Launch an application (windows auto-moved to agent screen)
PowerShell     Run a PowerShell command and return output
FileSystem     Read, write, list, copy, move, delete files
Clipboard      Get or set clipboard text
Process        List or kill running processes
Registry       Read, write, or list Windows registry values
Notification   Show a Windows toast notification
Scrape         Fetch a URL and return its text content
InputStatus    Check the current input mode and agent capabilities

How Confinement Works

All tools pass through a confinement engine before executing:

  • READ tools (Screenshot, Snapshot) can see all monitors for context
  • WRITE tools (Click, Type, Scroll, Move) are bounds-checked to the agent screen — coordinates outside are rejected
  • UNCONFINED tools (PowerShell, FileSystem, Registry) have no spatial component
  • Shortcuts are filtered: global shortcuts (Alt+Tab, Win+D, Win+L) are blocked; application shortcuts (Ctrl+S, Ctrl+C) are allowed

The agent works in agent-relative coordinates — (0,0) is the top-left of its virtual display. The confinement engine translates to absolute Windows coordinates transparently.
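The bounds check and translation for WRITE tools can be pictured as follows. This is a minimal sketch under stated assumptions, not the server's actual code: `AgentScreen` and its fields are hypothetical, and the example origin of 3840 assumes the virtual display sits to the right of two 1920-pixel-wide monitors.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentScreen:
    """Placement of the virtual display on the Windows virtual desktop."""
    left: int           # absolute x of the display's top-left corner
    top: int            # absolute y of the display's top-left corner
    width: int = 1920   # default size from CreateScreen
    height: int = 1080

    def to_absolute(self, x: int, y: int) -> Tuple[int, int]:
        """Reject out-of-bounds points, then shift into absolute coordinates."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise ValueError(f"({x}, {y}) is outside the agent screen")
        return self.left + x, self.top + y


if __name__ == "__main__":
    screen = AgentScreen(left=3840, top=0)   # assumed placement
    print(screen.to_absolute(100, 200))      # (3940, 200)
```

Because the check runs before the translation, an out-of-range coordinate never becomes an absolute position on one of your physical monitors.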

Security & trust

The GUI confinement model bounds the agent's mouse and keyboard to its virtual display. It does not bound system access. The following tools are deliberate trust grants — an agent that calls them can affect anything the host user can affect, regardless of the virtual display.

  • PowerShell — runs arbitrary commands. Subject to the 120-second timeout, otherwise unrestricted. Treat the agent as having a shell on your machine.
  • FileSystem — read/write/delete on any path the user can access. Includes credentials, SSH keys, browser profiles, Startup folders, and so on.
  • Registry — read/write on any registry key the user can access. Persistence via HKCU\…\Run is a single tool call away.
  • Scrape — fetches http:// and https:// URLs only (file:// and other schemes are rejected), but can still probe RFC 1918 / cloud-metadata endpoints reachable from the host.
  • App — launches any executable on PATH, including powershell.exe, which sidesteps the PowerShell timeout.
  • Process — lists and kills processes by substring match. A coarse name like "svc" can terminate unrelated services.

If you cannot trust the agent with any of the above, run the server in a VM or a dedicated user account where these tools cannot do damage.
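To make the Scrape caveat concrete, the sketch below classifies a URL the way a cautious wrapper might before allowing a fetch. It is not what the server does. `classify_url` is a hypothetical helper, and it only judges literal IP addresses; a hostname would have to be resolved first to know where it points.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_url(url: str) -> str:
    """Return 'rejected-scheme', 'internal', 'public', or 'hostname'."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return "rejected-scheme"          # e.g. file:// is refused outright
    host = parts.hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"                 # needs DNS resolution to judge
    if addr.is_loopback or addr.is_link_local or addr.is_private:
        return "internal"                 # RFC 1918, loopback, metadata range
    return "public"


if __name__ == "__main__":
    for url in ("file:///C:/secrets.txt",
                "http://169.254.169.254/latest/meta-data/",
                "http://10.0.0.5:8080/admin",
                "https://example.com/"):
        print(url, "->", classify_url(url))
```

The scheme check alone, which is all the README describes, lets the second and third URLs through; only an address check like the one above would flag them.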

Troubleshooting

"Parsec VDD driver not found"

The driver auto-installs on first run but requires admin privileges. If the UAC prompt was dismissed, run the server once from an elevated terminal:

windowspc-mcp --transport stdio

Virtual display doesn't appear

After CreateScreen, check with ScreenInfo. If the display isn't listed, the VDD driver may not be installed correctly. Reinstall from parsec-vdd releases.

"Agent screen already exists"

The previous session didn't clean up. Call DestroyScreen first, or restart the server, which auto-recovers persisted display state on startup.

App windows don't appear on the agent screen

App waits up to 5 seconds for windows to appear and moves them automatically. Some apps take longer to launch. Use RecoverWindow(process_name="appname") to move windows that appeared after the timeout.

Screenshot returns a black image

Some apps render with hardware acceleration that GDI capture can't see. Try maximizing the window or using a different app. The virtual display itself always captures correctly.

Blocked shortcut error

Global shortcuts (Alt+Tab, Win+D, Ctrl+Alt+Del) are intentionally blocked to prevent the agent from disrupting your desktop session. Use application-level shortcuts instead.
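The filtering rule can be sketched as a set lookup on normalized key combinations. This is an illustration only: the blocklist below contains just the shortcuts named in this README, the server's real list may be longer, and `normalize` and `is_allowed` are hypothetical names.

```python
# Global shortcuts named in this README; the real blocklist may differ.
BLOCKED = {
    frozenset({"alt", "tab"}),
    frozenset({"win", "d"}),
    frozenset({"win", "l"}),
    frozenset({"ctrl", "alt", "del"}),
}


def normalize(shortcut: str) -> frozenset:
    """Turn 'Alt+Tab' or 'tab + ALT' into an order-insensitive key set."""
    return frozenset(key.strip().lower() for key in shortcut.split("+"))


def is_allowed(shortcut: str) -> bool:
    """Application shortcuts pass; blocked global shortcuts do not."""
    return normalize(shortcut) not in BLOCKED


if __name__ == "__main__":
    for combo in ("Ctrl+S", "Ctrl+C", "Alt+Tab", "Win+D"):
        print(combo, "allowed" if is_allowed(combo) else "blocked")
```

Comparing key sets rather than strings means the check does not depend on the order or capitalization in which the agent names the keys.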

License

MIT
