computer-use-windows

A standalone MCP server for Windows desktop control, enabling screenshots, mouse and keyboard input, app launch, window/display management, and clipboard access via natural language.

<div align="center"> <img src="./assets/hero.svg" alt="windows-computer-use-skill hero" width="100%" /> <h1>Windows Computer-Use Skill</h1> <p><strong>A top-level Windows skill with a bundled standalone runtime and MCP server.</strong></p> <p> <a href="https://github.com/wimi321/windows-computer-use-skill">GitHub</a> · <a href="https://clawhub.ai/wimi321/computer-use-windows">ClawHub</a> · <a href="./README.zh-CN.md">简体中文</a> · <a href="./README.ja.md">日本語</a> </p> </div>

Install From ClawHub

Published on ClawHub as computer-use-windows.

clawhub install computer-use-windows

Positioning

This repository is:

  • a top-level skill
  • a standalone Windows desktop-control runtime
  • a computer-use MCP server for agent ecosystems

It is packaged skill-first, not Claude-first, so the same runtime can be adapted for multiple agent products.

Why This Exists

The requirement is stricter than "wrap an existing install":

  • no dependency on a local Claude app
  • no private .node binaries
  • no extracted hidden assets
  • install the skill, build the server, and use it

This project follows that rule on Windows.

What You Get

  • top-level Windows computer-use skill
  • standalone MCP server for screenshots, mouse, keyboard, app launch, window/display mapping, and clipboard
  • public dependency chain only: Node.js + Python + pyautogui + mss + Pillow + psutil + pywin32
  • first-run runtime bootstrap: the server creates its own virtualenv and installs dependencies automatically
  • bundled skill install that copies the full project into ~/.codex/skills/computer-use-windows/project
  • extracted TypeScript tool layer wired to a Windows-native Python backend

Status

Implemented in this repository:

  • Windows Python helper and runtime bootstrap
  • display enumeration and screenshot pipeline
  • mouse, keyboard, drag, scroll, and clipboard primitives
  • frontmost app, app-under-point, running app, installed app, and window-display lookup paths
  • Windows-first skill packaging and bundled project payload
  • TypeScript build passing

Still recommended before production use:

  • validate on a real Windows host
  • exercise UAC, elevated apps, secure desktop, multi-monitor scaling, and focus edge cases

No live Windows machine was attached during this development session, so the Windows runtime is implemented but has not been verified end to end.

What Was Fixed In 0.1.1

Version 0.1.1 fixes a Windows packaging regression in the shared system-key blocklist logic. The migrated file had a broken platform branch that could apply the wrong denylist when checking OS-level shortcuts.

That logic now uses the correct win32 blocklist again, and the fix has been synced into both the source tree and the bundled skill payload.
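To make this class of bug concrete, here is a minimal sketch of a platform-keyed blocklist lookup. The function names, the key-combo notation, and the entries in each list are assumptions made for illustration; they are not the project's actual denylist or API.

```typescript
// Hypothetical sketch: names and list contents are illustrative only.
type Platform = "win32" | "darwin";

// Normalize a combo so "Ctrl+Alt+Delete" and "delete+alt+ctrl" compare equal.
function normalizeCombo(combo: string): string {
  return combo
    .toLowerCase()
    .split("+")
    .map((key) => key.trim())
    .filter((key) => key.length > 0)
    .sort()
    .join("+");
}

function buildBlocklist(combos: string[]): ReadonlySet<string> {
  return new Set(combos.map(normalizeCombo));
}

// One list per platform. The regression described above amounts to
// consulting the wrong entry of a table like this one.
const SYSTEM_KEY_BLOCKLISTS: Record<Platform, ReadonlySet<string>> = {
  win32: buildBlocklist(["ctrl+alt+delete", "meta+l", "alt+f4"]),
  darwin: buildBlocklist(["meta+q", "ctrl+meta+q", "meta+alt+escape"]),
};

function isSystemKeyCombo(combo: string, platform: Platform): boolean {
  return SYSTEM_KEY_BLOCKLISTS[platform].has(normalizeCombo(combo));
}
```

Keying the table by platform and passing the platform in explicitly makes the branch easy to unit-test on a non-Windows machine.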

Architecture

flowchart LR
    A[Agent / MCP Client] --> B[windows-computer-use-skill]
    B --> C[Extracted TypeScript MCP tools]
    B --> D[Standalone Python bridge]
    D --> E[pyautogui]
    D --> F[mss + Pillow]
    D --> G[psutil + pywin32]
    E --> H[Mouse / Keyboard]
    F --> I[Screenshots]
    G --> J[Apps / Windows / Displays / Clipboard]

Install

1. Clone and install Node deps

git clone https://github.com/wimi321/windows-computer-use-skill.git
cd windows-computer-use-skill
npm install
npm run build

2. Start the server

node dist/cli.js

On first launch, the project will automatically:

  • create .runtime/venv
  • bootstrap pip if needed
  • install the Python runtime dependencies from runtime/requirements.txt
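The three bootstrap steps above can be sketched as a plan of commands. This is an assumption about how such a bootstrap could be structured, not the project's actual code: `planBootstrap`, `BootstrapStep`, and the `systemPython` parameter are hypothetical names. The commands themselves (`python -m venv`, `python -m ensurepip --upgrade`, `python -m pip install -r`) are standard Python tooling.

```typescript
// Hypothetical sketch of the first-run bootstrap as a list of commands.
interface BootstrapStep {
  description: string;
  command: string;
  args: string[];
}

function planBootstrap(
  projectRoot: string,
  venvExists: boolean,
  systemPython: string = "python",
): BootstrapStep[] {
  const venvDir = `${projectRoot}\\.runtime\\venv`;
  // On Windows, a venv places its interpreter under Scripts\python.exe.
  const venvPython = `${venvDir}\\Scripts\\python.exe`;
  const requirements = `${projectRoot}\\runtime\\requirements.txt`;

  const steps: BootstrapStep[] = [];
  if (!venvExists) {
    steps.push({
      description: "create .runtime/venv",
      command: systemPython,
      args: ["-m", "venv", venvDir],
    });
  }
  // ensurepip --upgrade is harmless when pip is already present,
  // so this sketch runs it unconditionally.
  steps.push({
    description: "bootstrap pip",
    command: venvPython,
    args: ["-m", "ensurepip", "--upgrade"],
  });
  steps.push({
    description: "install runtime dependencies",
    command: venvPython,
    args: ["-m", "pip", "install", "-r", requirements],
  });
  return steps;
}
```

Returning a plan instead of spawning processes directly keeps the logic testable without a Python install; a caller could then run each step with Node's `child_process` module.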

MCP Configuration

{
  "mcpServers": {
    "computer-use": {
      "command": "node",
      "args": [
        "C:/absolute/path/to/windows-computer-use-skill/dist/cli.js"
      ],
      "env": {
        "CLAUDE_COMPUTER_USE_DEBUG": "0",
        "CLAUDE_COMPUTER_USE_COORDINATE_MODE": "pixels"
      }
    }
  }
}

See examples/mcp-config.json.
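If you generate this configuration from a script, a small helper like the one below may be convenient. It is a sketch only: `buildMcpConfig` and `McpServerEntry` are hypothetical names, not part of this repository. It converts backslashes to forward slashes so the path needs no escaping inside JSON.

```typescript
// Hypothetical helper that reproduces the configuration shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildMcpConfig(cliPath: string): { mcpServers: Record<string, McpServerEntry> } {
  // Node accepts forward slashes in Windows paths, and they avoid
  // the doubled backslashes that JSON would otherwise require.
  const normalized = cliPath.replace(/\\/g, "/");
  return {
    mcpServers: {
      "computer-use": {
        command: "node",
        args: [normalized],
        env: {
          CLAUDE_COMPUTER_USE_DEBUG: "0",
          CLAUDE_COMPUTER_USE_COORDINATE_MODE: "pixels",
        },
      },
    },
  };
}

const exampleConfig = buildMcpConfig(
  "C:\\absolute\\path\\to\\windows-computer-use-skill\\dist\\cli.js",
);
console.log(JSON.stringify(exampleConfig, null, 2));
```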

Skill Install

This repo ships a top-level skill at skill/computer-use-windows.

Option A: Install from ClawHub

clawhub install computer-use-windows

Option B: Install from a local clone

PowerShell

powershell -ExecutionPolicy Bypass -File .\skill\computer-use-windows\scripts\install.ps1

Bash

bash skill/computer-use-windows/scripts/install.sh

After installation, the bundled project lives at:

%USERPROFILE%\.codex\skills\computer-use-windows\project

If CODEX_HOME is set, look under that directory instead of %USERPROFILE%\.codex.
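The lookup can be sketched as follows. This assumes that CODEX_HOME, when set, replaces the `%USERPROFILE%\.codex` directory and that the `skills\computer-use-windows\project` suffix stays the same; the install scripts are the authority on the real behavior. `resolveSkillProjectDir` is a hypothetical name.

```typescript
// Hypothetical sketch: resolve where the bundled project was installed.
// Assumes CODEX_HOME stands in for %USERPROFILE%\.codex when it is set.
function resolveSkillProjectDir(env: Record<string, string | undefined>): string {
  const codexHome = (env.CODEX_HOME ?? "").trim();
  const userProfile = (env.USERPROFILE ?? "").trim();

  let base: string;
  if (codexHome !== "") {
    base = codexHome;
  } else if (userProfile !== "") {
    base = `${userProfile}\\.codex`;
  } else {
    throw new Error("Neither CODEX_HOME nor USERPROFILE is set");
  }

  // Drop trailing separators so the joined path has no doubled slashes.
  base = base.replace(/[\\/]+$/, "");
  return `${base}\\skills\\computer-use-windows\\project`;
}
```

In a Node script this would typically be called as `resolveSkillProjectDir(process.env)`.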

Validation Matrix

Validated in this session:

  • npm run check
  • npm run build
  • Python syntax compile check for runtime/windows_helper.py
  • bundled skill source integrity checks
  • bundled project version sync checks
  • review of Windows-specific runtime paths for screenshots, clipboard, frontmost app, app enumeration, and window/display lookup

Not yet validated in this session:

  • real Windows GUI control
  • live screenshot capture on Windows
  • foreground-window enforcement against real Windows apps
  • UAC / admin-window transitions
  • mixed-DPI multi-monitor behavior

Runtime Notes

Permissions

Windows does not require the same Accessibility / Screen Recording prompts as macOS, but desktop control can still be limited by:

  • elevated windows when the agent is not elevated
  • UAC secure desktop transitions
  • session / remote desktop boundaries
  • app-specific anti-automation protections

Screenshot Filtering

This standalone runtime reports screenshotFiltering: none.

In other words, screenshots are not filtered natively at the compositor level; any gating is still applied at the MCP layer.

Platform Scope

This repository is intentionally Windows-only.

Covered capabilities:

  • screenshots
  • mouse control
  • keyboard input
  • frontmost app inspection
  • installed / running app discovery
  • window-to-display mapping
  • clipboard access
  • app launch
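As an illustration of one capability in the list, window-to-display mapping, the sketch below assigns a window to the display that contains the largest share of its rectangle. This is a common approach and an assumption here; the repository's helper may resolve displays differently (for example through Win32 monitor APIs). All type and function names are hypothetical.

```typescript
// Hypothetical sketch: pick the display with the largest overlap.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Display extends Rect {
  id: number;
}

// Area of the intersection of two rectangles, or 0 if they do not overlap.
function overlapArea(a: Rect, b: Rect): number {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// Returns undefined when the window lies entirely off every display.
function displayForWindow(win: Rect, displays: Display[]): Display | undefined {
  let best: Display | undefined;
  let bestArea = 0;
  for (const display of displays) {
    const area = overlapArea(win, display);
    if (area > bestArea) {
      best = display;
      bestArea = area;
    }
  }
  return best;
}
```

Coordinates are assumed to share one virtual-desktop space, so a secondary monitor to the right of a 1920-pixel-wide primary starts at x = 1920. Mixed-DPI setups would need the rectangles converted to a common unit first.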

Example Commands

npm run build
node dist/cli.js
node --input-type=module -e "import { callPythonHelper } from './dist/computer-use/pythonBridge.js'; console.log(await callPythonHelper('list_displays', {}));"

Repository Layout

src/
  computer-use/
    executor.ts
    hostAdapter.ts
    pythonBridge.ts
  vendor/computer-use-mcp/
runtime/
  windows_helper.py
  requirements.txt
skill/
  computer-use-windows/
examples/
assets/

Environment Flags

  • CLAUDE_COMPUTER_USE_DEBUG=1
  • CLAUDE_COMPUTER_USE_COORDINATE_MODE=pixels
  • CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE=1
  • CLAUDE_COMPUTER_USE_MOUSE_ANIMATION=1
  • CLAUDE_COMPUTER_USE_HIDE_BEFORE_ACTION=0
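A reader for these flags might look like the sketch below. The flag names come from the list above; everything else is an assumption: the `ComputerUseFlags` shape, the rule that only "1" counts as enabled, and the fallback values used when a variable is unset. The README does not state the runtime's real defaults.

```typescript
// Hypothetical sketch: parse the environment flags listed above.
// Fallback values are placeholders, not the runtime's documented defaults.
interface ComputerUseFlags {
  debug: boolean;
  coordinateMode: string;
  clipboardPaste: boolean;
  mouseAnimation: boolean;
  hideBeforeAction: boolean;
}

// Treat "1" as on and any other non-empty value as off.
function flagOn(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim() === "1";
}

function readFlags(env: Record<string, string | undefined>): ComputerUseFlags {
  const mode = (env.CLAUDE_COMPUTER_USE_COORDINATE_MODE ?? "").trim();
  return {
    debug: flagOn(env.CLAUDE_COMPUTER_USE_DEBUG, false),
    coordinateMode: mode !== "" ? mode : "pixels",
    clipboardPaste: flagOn(env.CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE, false),
    mouseAnimation: flagOn(env.CLAUDE_COMPUTER_USE_MOUSE_ANIMATION, false),
    hideBeforeAction: flagOn(env.CLAUDE_COMPUTER_USE_HIDE_BEFORE_ACTION, false),
  };
}
```

Taking the environment as a parameter, rather than reading `process.env` inside the function, keeps the parser easy to test.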

Roadmap

  • validate and harden on real Windows hardware
  • improve app identity and icon extraction on Windows
  • add automated Windows integration tests
  • publish signed release artifacts for easier distribution

License

MIT

Credits

This project preserves and adapts reusable TypeScript computer-use logic recovered from the Claude Code workflow, then replaces the missing private runtime with a fully standalone public Windows implementation.
