pywinauto-mcp
A Windows computer use agent — FastMCP server that gives AI assistants hands on the real desktop: windows, UI elements, mouse, keyboard, screenshots, OCR, shortcuts, dialogs, and outcome verification.
windows-computer-use-mcp
<p align="center"> <a href="https://github.com/sandraschi/windows-computer-use-mcp"><img src="https://img.shields.io/github/stars/sandraschi/windows-computer-use-mcp?style=flat-square" alt="Stars"></a> <a href="https://github.com/sandraschi/windows-computer-use-mcp/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" alt="License"></a> <a href="https://python.org"><img src="https://img.shields.io/badge/Python-3.12+-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python"></a> <a href="https://fastmcp.com"><img src="https://img.shields.io/badge/FastMCP-3.2-7c5cfc?style=flat-square" alt="FastMCP"></a> </p>
A tool for agents, and an agent itself.
| How you use it | What you get |
|---|---|
| Use it as an MCP server | Claude, Cursor, DeepSeek call automation_click, automation_screenshot, automation_ocr — 22 tools |
| Use it as an autonomous agent | Give it a goal: automation_mission(run="install app, verify UI, screenshot result") — it plans, executes, retries, and reports |
| Use it as a webapp | start.ps1 opens a React dashboard at http://127.0.0.1:10788 with HITL, crawler, logging |
| Use it as a desktop app | The NSIS installer bundles everything into one binary — no Python, no uv, no git needed |
Exhibit A: 100 Tauri/NSIS installers, one unattended run, $2 in LLM costs. Install, screenshot, verify, report — zero human intervention. That is what agentic Windows automation looks like at scale.
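The autonomous-agent mode is described above as plan, execute, retry, report. The loop below is a minimal sketch of that shape only, not the server's implementation. `Step`, `StepResult`, `run_mission`, and `max_attempts` are hypothetical names introduced for illustration, and the planning stage (turning a goal into steps) is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # performs the UI action (hypothetical)
    verify: Callable[[], bool]  # checks the expected outcome (hypothetical)


@dataclass
class StepResult:
    name: str
    ok: bool
    attempts: int


def run_mission(steps: List[Step], max_attempts: int = 3) -> List[StepResult]:
    """Run steps in order, retrying each until verify() passes or attempts run out."""
    report: List[StepResult] = []
    for step in steps:
        ok = False
        attempts = 0
        while attempts < max_attempts and not ok:
            attempts += 1
            try:
                step.action()
                ok = step.verify()
            except Exception:
                # Treat any failure inside the action or check as a failed attempt.
                ok = False
        report.append(StepResult(step.name, ok, attempts))
        if not ok:
            break  # stop at the first step that could not be verified
    return report
```

Stopping at the first unverified step keeps later actions from running against a desktop that is in an unexpected state. A real run would also need the safeguards described in docs/SAFETY.md.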
Built on pywinauto. Read docs/SAFETY.md before production use.
Quick Start
| Method | Command / Config |
|---|---|
| MCP stdio (Cursor, Claude Desktop) | { "mcpServers": { "windows-computer-use": { "command": "uv", "args": ["--directory", "<PATH>", "run", "windows-computer-use-mcp"] } } } |
| HTTP streamable (any MCP HTTP client) | { "mcpServers": { "windows-computer-use": { "url": "http://127.0.0.1:10789/mcp" } } } |
| Web operator UI | .\start.ps1 → http://127.0.0.1:10788 |
| Desktop app (NSIS installer) | Download from Releases — zero deps |
See INSTALL.md for detailed setup. Run `just demo` for runnable examples.
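The two MCP entries in the table are easier to read when generated and pretty-printed. The helper below is a small sketch that reproduces them with the standard library. `build_stdio_config` and `build_http_config` are hypothetical helper names, and `<PATH>` stays a placeholder for your own checkout directory.

```python
import json


def build_stdio_config(checkout_dir: str) -> dict:
    """Return an mcpServers entry that launches the server over stdio via uv."""
    return {
        "mcpServers": {
            "windows-computer-use": {
                "command": "uv",
                "args": ["--directory", checkout_dir, "run", "windows-computer-use-mcp"],
            }
        }
    }


def build_http_config(url: str = "http://127.0.0.1:10789/mcp") -> dict:
    """Return an mcpServers entry that points at the streamable HTTP endpoint."""
    return {"mcpServers": {"windows-computer-use": {"url": url}}}


if __name__ == "__main__":
    # Replace <PATH> with the directory of your local checkout before use.
    print(json.dumps(build_stdio_config("<PATH>"), indent=2))
    print(json.dumps(build_http_config(), indent=2))
```

Paste the printed JSON into the client's MCP configuration file. Where that file lives depends on the client and is not covered here.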
Features
- Window Management — find, activate, maximize, minimize, position, close
- Mouse & Keyboard — click, drag, type, hotkeys, app shortcuts
- UI Elements — inspect, click, read text, verify state via UIA / Win32
- Visual Intelligence — screenshots, OCR, template matching
- Autonomous Missions — give it a goal, it plans and executes with retry + verification
- Macro Recording — record any UI sequence, replay, verify outcomes
- Multi-App Workflows — chain actions across Notepad, Calc, Paint, or any Windows app
- Telemetry — every action logged to SQLite; query failure patterns by tool
- Adaptive Location — auto-cascades through title/auto_id/control_id/class/OCR to find elements
- Face Recognition — optional, off by default
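The Adaptive Location feature is described as a cascade through title, auto_id, control_id, class, and OCR. One way to picture such a cascade is an ordered fallback chain, sketched below. This is an illustration under assumptions, not the project's code: `locate`, `Locator`, and the stub locators are hypothetical, and real locators would call into pywinauto or an OCR engine.

```python
from typing import Callable, Optional, Sequence, Tuple

# A locator takes a query string and returns an element handle, or None on a miss.
Locator = Callable[[str], Optional[object]]


def locate(query: str, strategies: Sequence[Tuple[str, Locator]]) -> Tuple[str, object]:
    """Try each (name, locator) pair in order and return the first hit."""
    for name, find in strategies:
        try:
            element = find(query)
        except LookupError:
            element = None  # a failing strategy just falls through to the next one
        if element is not None:
            return name, element
    raise LookupError(f"no strategy matched {query!r}")


if __name__ == "__main__":
    miss: Locator = lambda query: None  # stand-in for a strategy that finds nothing
    strategies = [
        ("title", miss),
        ("auto_id", miss),
        ("control_id", miss),
        ("class", lambda query: {"class_name": query}),  # stub hit
        ("ocr", miss),
    ]
    print(locate("Edit", strategies))
```

Returning the strategy name alongside the element makes it possible to record which fallback succeeded, which fits the telemetry feature listed above.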
Documentation
| Doc | Content |
|---|---|
| INSTALL.md | Setup: desktop app, uv, MCP config |
| docs/README.md | Full documentation hub |
| docs/py-stack.md | Python dependency deep dive |
| docs/composing-with-playwright.md | Browser automation with Playwright MCP |
| docs/ocr.md | OCR system — Tesseract setup, limitations, competition |
| docs/cua-nsis-certification.md | Dogfooding: using the tool to test its own NSIS installer |
| docs/ROADMAP.md | Improvement roadmap: short, medium, and long term |
| docs/SAFETY.md | HITL, kill switch, opt-in features |
| docs/TOOLS.md | Portmanteau tool reference |
| tests/README.md | Test suite guide and e2e setup |
| examples/README.md | Runnable demos |
| mcpb/README.md | MCPB bundle packaging |
| web_sota/README.md | Operator UI build/dev guide |
| CHANGELOG.md | Release history |
Ports
| Port | Service |
|---|---|
| 10788 | Frontend — Vite operator UI |
| 10789 | Backend — FastAPI + FastMCP HTTP |
| stdio | MCP transport (port-free) |
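To check whether the two HTTP services from the table are up, a quick TCP probe is enough. The sketch below uses only the standard library. `is_listening` and `PORTS` are hypothetical names, and the probe only confirms that something accepts connections on the port, not that it is this server.

```python
import socket

# Port numbers taken from the table above.
PORTS = {
    10788: "Frontend (Vite operator UI)",
    10789: "Backend (FastAPI + FastMCP HTTP)",
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


if __name__ == "__main__":
    for port, service in PORTS.items():
        state = "up" if is_listening(port) else "down"
        print(f"{port}  {service}: {state}")
```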
Related
| Repo | What it does |
|---|---|
| autohotkey-mcp | Raw input recording/replay via AHK |
| browser-mcp | Playwright browser control — for webapps, HTML DOM, websites |
| virtualization-mcp | Sandbox / VM isolation |
| windows-operations-mcp | Registry, services, accounts |
Browser vs desktop: This server drives Win32 / UI Automation. For HTML/DOM and websites, pair with browser-mcp (Playwright). Both MCPs can run side by side — use one profile that loads both and let the LLM pick the right tool for the target.
Fleet standards: mcp-central-docs.
License
MIT — Copyright (c) 2026 Sandra Schipal.