ControlMCP

An MCP server that enables LLMs to see and control a computer — screen capture, window management, mouse and keyboard automation — with a structured plan-execute workflow for complex desktop automation.

😆 You’re already a mature LLM, so you should learn to operate the computer by yourself.

🛠️ MCP server for LLM-controlled computer operations: screen capture, window management, mouse & keyboard automation.

Chinese documentation: README.zh-CN.md

Overview

ControlMCP is a Model Context Protocol (MCP) server that gives LLMs the ability to see and control a computer — take screenshots, manage windows, move/click the mouse, type on the keyboard, and chain all of these into complex automation workflows.

The repository also ships with a reusable agent skill at skills/computer-control/. It packages desktop-operation SOPs, shortcut guidance, JetBrains IDE workflows, and screenshot-to-click coordinate rules for agents that support skills.

Quick Start

Installation

Install from source:

git clone https://github.com/nix18/ControlMCP.git
cd ControlMCP
pip install -e .

Launch

control-mcp

The server communicates over stdio, the standard MCP transport. Configure your MCP client to launch the control-mcp command and talk to it through stdin/stdout.
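As a rough illustration of what travels over that stdio pipe, the sketch below frames a tool invocation as a newline-delimited JSON-RPC 2.0 message, which is how MCP's stdio transport carries requests. The helper name `frame_tool_call` is hypothetical and not part of ControlMCP. A real client would also complete the `initialize` handshake before sending any `tools/call`, and in practice an MCP client library handles all of this for you.

```python
import json


def frame_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single JSON line.

    Hypothetical helper for illustration; not part of ControlMCP.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport separates messages with newlines. json.dumps
    # without `indent` never emits raw newlines, so one message == one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    # Frame a click at (500, 300) as it would be written to the server's stdin.
    print(frame_tool_call(1, "mouse_click", {"x": 500, "y": 300}), end="")
```

Keeping each message on a single line is what lets the receiving side split the stream with a plain line reader.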

MCP Client Configuration

Add the following to your MCP client config (e.g. Claude Desktop or Cursor):

{
  "mcpServers": {
    "control-mcp": {
      "command": "control-mcp",
      "args": []
    }
  }
}
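If you would rather script this step than edit the file by hand, something like the following sketch could merge the entry into an existing config without disturbing other servers. The config file location differs per client and is not specified here, so the function works on an already-loaded dictionary. `add_control_mcp` is an illustrative name, not something the project ships.

```python
import json


def add_control_mcp(config: dict) -> dict:
    """Return a copy of `config` with the control-mcp server entry added.

    Illustrative helper; existing entries under "mcpServers" are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["control-mcp"] = {"command": "control-mcp", "args": []}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # "other-server" is a made-up entry standing in for whatever you already have.
    existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
    print(json.dumps(add_control_mcp(existing), indent=2))
```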

Tools (34 total)

Control Plane

Tool	Description
`plan_desktop_task`	Convert a vague desktop instruction into a structured plan
`execute_desktop_plan`	Run a structured plan through the guarded executor
`get_execution_status`	Query the current status of a high-level execution run
`confirm_sensitive_action`	Explicitly approve or reject a sensitive action
`recover_execution_context`	Rebuild context after shortcut misuse or UI drift
`record_workflow_experience`	Persist reusable workflow experience

Screen Capture

Tool	Description
`capture_screen`	Full screen or monitor screenshot
`capture_region`	Region screenshot (x, y, width, height)
`capture_scroll_region`	Stitch a long screenshot while scrolling inside a fixed region
`get_screen_info`	List all monitors with resolution
`read_screenshot_base64`	Read a screenshot file as Base64 text
`resolve_grid_target`	Convert a grid cell + anchor into precise screen coordinates
`click_grid_target`	Resolve screenshot grid metadata and click directly

Window Management

Tool	Description
`list_windows`	List all visible windows
`find_windows`	Find windows by title substring
`focus_window`	Bring a window to the foreground
`capture_window`	Focus + screenshot a specific window

Mouse Control

Tool	Description
`mouse_click`	Click at coordinates (single/double/multi/hold)
`mouse_drag`	Drag from point A to point B
`mouse_move`	Move cursor without clicking
`mouse_position`	Get current cursor position
`mouse_scroll`	Scroll wheel up/down

Keyboard Control

Tool	Description
`key_press`	Press keys or hotkey combinations
`key_hold`	Hold keys for a duration
`key_type`	Type text character by character
`key_sequence`	Execute a timed sequence of key actions

Combined Operations

Tool	Description
`mouse_and_keyboard`	Execute a mixed sequence of mouse + keyboard + wait + screenshot actions

Additional Actions

Tool	Description
`clipboard_get`	Get clipboard text
`clipboard_set`	Set clipboard text
`launch_app`	Launch an application
`launch_url`	Open a URL in the browser
`wait`	Pause for N seconds
`get_pixel_color`	Get RGB color at screen coordinates
`hotkey`	Press a keyboard shortcut

Examples

See docs/TUTORIAL.md for comprehensive usage examples.

// Plan a vague desktop task first
{"tool": "plan_desktop_task", "args": {"instruction": "Switch to PyCharm and run the current config"}}

// Execute a generated plan
{"tool": "execute_desktop_plan", "args": {"plan_id": "plan_abc123"}}

// Take a screenshot
{"tool": "capture_screen", "args": {}}

// Take a sharper screenshot when text clarity matters
{"tool": "capture_window", "args": {"title": "PyCharm", "quality": 75, "sharpen": true}}

// Read that screenshot as Base64 text for non-multimodal models
{"tool": "read_screenshot_base64", "args": {"file_path": "/tmp/screen.jpg"}}

// Click at (500, 300)
{"tool": "mouse_click", "args": {"x": 500, "y": 300}}

// Combined: click → select all → type
{"tool": "mouse_and_keyboard", "args": {"actions": [
    {"action": "click", "x": 500, "y": 300},
    {"action": "key_press", "keys": ["ctrl", "a"]},
    {"action": "key_type", "text": "New text"}
]}}
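For a sense of what the Base64 form in the `read_screenshot_base64` example looks like, here is a minimal sketch that encodes a file with Python's standard library. It only illustrates the encoding a non-multimodal model would receive as text; it is not the server's implementation, and `file_to_base64` is a hypothetical name.

```python
import base64
import tempfile
from pathlib import Path


def file_to_base64(file_path: str) -> str:
    """Read a file and return its bytes as ASCII Base64 text."""
    return base64.b64encode(Path(file_path).read_bytes()).decode("ascii")


if __name__ == "__main__":
    # Demo on a throwaway file holding the three-byte JPEG start-of-image prefix.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "screen.jpg"
        sample.write_bytes(b"\xff\xd8\xff")
        print(file_to_base64(str(sample)))
```

Base64 inflates the payload by roughly a third, so a cropped `capture_region` screenshot is usually cheaper to pass as text than a full-screen one.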

Rebuilt Workflow

ControlMCP now supports a control-plane-first workflow for higher-precision desktop automation:

1. Normalize the user instruction with plan_desktop_task
2. Review or directly execute the structured plan
3. Let the guarded executor choose a faster observation strategy (capture_window / capture_region / capture_scroll_region)
4. Verify each critical step and recover when context is lost
5. Require explicit confirmation for payment/password/asset-related actions
6. Save successful workflow experience for future runs
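
The steps above can be sketched as a small driver. This is a minimal sketch under heavy assumptions: `call_tool` stands in for whatever your MCP client uses to invoke a tool, and apart from `instruction` and `plan_id` (which appear in the examples), every argument name, response key, and status value below is hypothetical. See docs/TUTORIAL.md for the real schemas.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_guarded_workflow(
    call_tool: ToolCaller,
    instruction: str,
    approve: Callable[[str], bool],
) -> Dict[str, Any]:
    """Plan, execute, confirm if needed, recover if needed, then record.

    ASSUMPTIONS: the keys `run_id` and `status`, the status values
    "needs_confirmation" / "failed" / "completed", and the `approved`
    argument are invented for illustration only.
    """
    plan = call_tool("plan_desktop_task", {"instruction": instruction})
    run = call_tool("execute_desktop_plan", {"plan_id": plan["plan_id"]})

    # Sensitive steps (payment, password, assets) wait for an explicit decision.
    if run.get("status") == "needs_confirmation":
        run = call_tool(
            "confirm_sensitive_action",
            {"run_id": run.get("run_id"), "approved": approve(instruction)},
        )

    # If the executor lost track of the UI, ask the server to rebuild context.
    if run.get("status") == "failed":
        run = call_tool("recover_execution_context", {"run_id": run.get("run_id")})

    # Only successful runs are worth persisting as reusable experience.
    if run.get("status") == "completed":
        call_tool("record_workflow_experience", {"run_id": run.get("run_id")})
    return run
```

Injecting `call_tool` and `approve` keeps the control flow testable without a live desktop, and keeps the human confirmation decision outside the automation itself.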

For small or visually ambiguous targets, you can also ask capture_screen, capture_region, or capture_window to generate a second overlay image (grid_file_path) divided into grid_rows × grid_cols cells. Then convert the chosen cell and anchor into screen coordinates with resolve_grid_target before clicking.
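
To make the cell-plus-anchor idea concrete, the following sketch shows one plausible way such a conversion could work. It is not the logic of `resolve_grid_target`: the function name, the zero-based row/column convention, the anchor names, and the assumption that the screenshot is at 1:1 scale with the screen are all illustrative.

```python
from typing import Tuple

# Fractional (x, y) offsets inside a cell for each anchor name.
# This anchor vocabulary is an assumption, not ControlMCP's actual list.
ANCHORS = {
    "center": (0.5, 0.5),
    "top_left": (0.0, 0.0),
    "top_right": (1.0, 0.0),
    "bottom_left": (0.0, 1.0),
    "bottom_right": (1.0, 1.0),
}


def grid_cell_to_screen(
    origin_x: int, origin_y: int, width: int, height: int,
    grid_rows: int, grid_cols: int, row: int, col: int,
    anchor: str = "center",
) -> Tuple[int, int]:
    """Map a zero-based grid cell plus anchor to absolute screen coordinates.

    (origin_x, origin_y) is the top-left screen position of the captured area,
    and (width, height) is its size in screen pixels.
    """
    if not (0 <= row < grid_rows and 0 <= col < grid_cols):
        raise ValueError("cell is outside the grid")
    fx, fy = ANCHORS[anchor]
    cell_w = width / grid_cols
    cell_h = height / grid_rows
    return round(origin_x + (col + fx) * cell_w), round(origin_y + (row + fy) * cell_h)


if __name__ == "__main__":
    # An 800x600 region captured at (100, 200), split into 6 rows x 8 columns.
    print(grid_cell_to_screen(100, 200, 800, 600, 6, 8, row=2, col=3))
```

Picking a cell and an anchor is easier for a model than estimating raw pixels, because the error is bounded by the cell size rather than by the whole image.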

Documentation

Document	Description
README.md	This file
README.zh-CN.md	Chinese version of this file
docs/REQUIREMENTS.md	Requirements analysis
docs/ARCHITECTURE.md	Architecture design
docs/MODULE_DESIGN.md	Module design
docs/FUNCTIONAL_DESIGN.md	Functional design
docs/TUTORIAL.md	Tutorial & examples
skills/computer-control/	Agent Skill: computer operation SOPs
skills/computer-control/README.md	Skill-specific install and usage guide
skills/computer-control/docs/window-management.md	Window rescue and window shortcut reference
skills/computer-control/docs/idea-run-workflow.md	JetBrains IDE run/log observation workflow

Agent Skill

The skills/computer-control/ folder contains a ready-to-use Agent Skill that teaches LLMs how to operate computers proficiently.

What is included

SKILL.md: the main skill instructions, SOPs, shortcut tables, and common failure patterns
docs/coordinate-system.md: coordinate conversion reference for screenshot-to-click workflows
docs/window-management.md: window maximize/restore/snap shortcuts and window recovery workflow
docs/idea-run-workflow.md: JetBrains IDE startup, run-panel switching, and log stabilization workflow
README.md: skill-local installation and usage notes

What the skill covers

Keyboard-first automation: prefer shortcuts over UI clicking whenever possible
Plan-before-act control plane: normalize ambiguous instructions before touching the desktop
Window recovery: fix minimized, half-screen, or partially restored windows before further actions
Coordinate-safe clicking: convert screenshot-local coordinates into screen coordinates explicitly
IDE workflows: IntelliJ IDEA / PyCharm run-configuration selection, run-panel switching, and log monitoring
Sensitive-action gating: require confirmation before payment/password/asset-related steps
Operational fallback: when JetBrains shortcuts do not behave as expected, check the local ReferenceCard.pdf or JetBrains official documentation
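
The coordinate-safe clicking rule in the list above comes down to two corrections: undo any resizing applied to the screenshot, then add the capture origin. The sketch below illustrates that arithmetic. The function and parameter names are hypothetical, and docs/coordinate-system.md remains the authoritative reference for the skill's own rules.

```python
from typing import Tuple


def screenshot_to_screen(
    local_x: float, local_y: float,
    region_left: int, region_top: int,
    scale: float = 1.0,
) -> Tuple[int, int]:
    """Convert a point inside a screenshot to absolute screen coordinates.

    `scale` is screenshot pixels per screen pixel, e.g. 0.5 when the capture
    was downscaled to half size. (region_left, region_top) is where the
    captured window or region starts on screen.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(region_left + local_x / scale), round(region_top + local_y / scale)


if __name__ == "__main__":
    # A point at (250, 150) in a half-size capture of a region starting at (100, 50).
    print(screenshot_to_screen(250, 150, 100, 50, scale=0.5))
```

Skipping either correction is a common source of clicks that land consistently offset from the intended target.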

Install the skill into your agent

You can either copy skills/computer-control/ into your agent's skill directory, or add it via a symbolic link.

Option 1: copy the directory

# Codex CLI
cp -r skills/computer-control ~/.codex/skills/

# Claude Code
cp -r skills/computer-control ~/.claude/skills/

# OpenCode
cp -r skills/computer-control ~/.config/opencode/skills/

Option 2: create a symbolic link

On macOS / Linux:

# Codex CLI
ln -s "$(pwd)/skills/computer-control" ~/.codex/skills/computer-control

# Claude Code
ln -s "$(pwd)/skills/computer-control" ~/.claude/skills/computer-control

# OpenCode
ln -s "$(pwd)/skills/computer-control" ~/.config/opencode/skills/computer-control

On Windows (Command Prompt as Administrator when required):

mklink /D "%USERPROFILE%\.codex\skills\computer-control" "%CD%\skills\computer-control"
mklink /D "%USERPROFILE%\.claude\skills\computer-control" "%CD%\skills\computer-control"
mklink /D "%USERPROFILE%\.config\opencode\skills\computer-control" "%CD%\skills\computer-control"

Using a symbolic link is convenient while iterating on the skill, because changes in this repository are reflected immediately in the agent's skills directory.
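
If you prefer one script over per-platform commands, a sketch along these lines could create the same link from Python. `link_skill` is a hypothetical helper, not something the repository ships, and on Windows `os.symlink` may still need administrator rights or Developer Mode.

```python
import os
import tempfile
from pathlib import Path


def link_skill(repo_root: Path, skills_dir: Path) -> Path:
    """Symlink <repo_root>/skills/computer-control into an agent's skills dir."""
    source = (repo_root / "skills" / "computer-control").resolve()
    target = skills_dir / "computer-control"
    skills_dir.mkdir(parents=True, exist_ok=True)
    # Refuse to overwrite an existing copy or link.
    if target.is_symlink() or target.exists():
        raise FileExistsError(f"{target} already exists")
    os.symlink(source, target, target_is_directory=True)
    return target


if __name__ == "__main__":
    # Demo inside a temporary directory so nothing in your home folder is touched.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "ControlMCP"
        (repo / "skills" / "computer-control").mkdir(parents=True)
        print(link_skill(repo, Path(tmp) / "agent" / "skills"))
```

For a real install you would call it from the repository root, for example `link_skill(Path.cwd(), Path.home() / ".claude" / "skills")`.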

If your agent supports custom skill paths, you can also reference this folder directly.

Use the skill

After installation, invoke it naturally in prompts such as:

Use $computer-control to restart the IDEA app and wait until logs stop updating
Use $computer-control to maximize the target window and capture it
Use $computer-control to operate PyCharm with keyboard shortcuts first

For skill-specific details, see skills/computer-control/README.md.

Project Structure

ControlMCP/
├── README.md                          # This file
├── README.zh-CN.md                    # Chinese README
├── LICENSE                            # GNU GPLv3 license
├── pyproject.toml                     # Package config
├── src/
│   └── control_mcp/
│       ├── __init__.py
│       ├── server.py                  # MCP server + tool registration
│       ├── schemas/
│       │   ├── __init__.py
│       │   └── responses.py           # Structured response types
│       ├── tools/
│       │   ├── __init__.py
│       │   ├── screen.py              # Screen capture tools
│       │   ├── window.py              # Window management tools
│       │   ├── mouse.py               # Mouse control tools
│       │   ├── keyboard.py            # Keyboard control tools
│       │   ├── combined.py            # Combined operations
│       │   └── actions.py             # Additional actions
│       └── utils/
│           ├── __init__.py
│           ├── capture.py             # Capture utilities (JPEG, resize)
│           ├── _win_window.py         # Windows backend
│           ├── _mac_window.py         # macOS backend
│           └── _linux_window.py       # Linux backend
├── skills/
│   └── computer-control/              # Agent Skill: computer operation SOPs
│       ├── SKILL.md                   # Main skill instructions
│       ├── docs/
│       │   ├── coordinate-system.md   # Coordinate system reference
│       │   ├── window-management.md   # Window management reference
│       │   └── idea-run-workflow.md   # JetBrains IDE run/log workflow
│       └── README.md                  # Skill install & usage guide
├── docs/
│   ├── REQUIREMENTS.md
│   ├── ARCHITECTURE.md
│   ├── MODULE_DESIGN.md
│   ├── FUNCTIONAL_DESIGN.md
│   ├── TUTORIAL.md
│   └── zh-CN/                        # Chinese documentation
│       ├── REQUIREMENTS.md
│       ├── ARCHITECTURE.md
│       ├── MODULE_DESIGN.md
│       ├── FUNCTIONAL_DESIGN.md
│       └── TUTORIAL.md
└── tests/
    ├── __init__.py
    ├── test_schemas.py                # 22 tests
    ├── test_screen.py                 # 6 tests
    ├── test_window.py                 # 11 tests
    ├── test_mouse.py                  # 13 tests
    ├── test_keyboard.py               # 16 tests
    ├── test_combined.py               # 12 tests
    └── test_actions.py                # 13 tests

Platform Support

Platform	Screen Capture	Window Management	Mouse/Keyboard
Windows	✅ mss	✅ pygetwindow	✅ pyautogui
macOS	✅ mss	✅ Quartz	✅ pyautogui
Linux	✅ mss	✅ xlib	✅ pyautogui
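
The per-platform window backends in the table correspond to the `_win_window.py`, `_mac_window.py`, and `_linux_window.py` modules listed in the project structure. How the server actually selects among them is not documented here; the sketch below shows one conventional way to do it with `sys.platform`, with `window_backend_name` as a hypothetical helper.

```python
import sys

# Module names taken from src/control_mcp/utils/ in the project tree.
# The mapping from sys.platform prefixes to modules is an assumption.
_BACKENDS = {
    "win32": "_win_window",
    "darwin": "_mac_window",
    "linux": "_linux_window",
}


def window_backend_name(platform: str = sys.platform) -> str:
    """Return the backend module name for a sys.platform-style string."""
    for prefix, module in _BACKENDS.items():
        if platform.startswith(prefix):
            return module
    raise RuntimeError(f"unsupported platform: {platform}")


if __name__ == "__main__":
    print(window_backend_name())
```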

License

GNU General Public License v3.0 (GPLv3)
