Computer Use MCP Server
A production-grade macOS MCP server exposing 33 tools for full desktop automation, including mouse, keyboard, screenshot, clipboard, and window control.
A production-grade macOS Computer Use MCP Server that exposes 33 tools across 10 categories for full desktop automation via the Model Context Protocol. Control mouse, keyboard, screenshots, clipboard, windows, and more from any MCP-compatible AI client.
Works with Claude Code, Cursor, VS Code, Windsurf, LM Studio, Ollama, llama.cpp, MLX, and any MCP-compatible tool.
Features
33 Tools Across 10 Categories
| Category | Tools | Description |
|---|---|---|
| Mouse (12) | mouse_click, left_click, right_click, middle_click, double_click, triple_click, left_mouse_down, left_mouse_up, mouse_move, mouse_drag, scroll, mouse_scroll | Full mouse control with coordinate-based clicking, dragging with 20-step interpolation, directional scrolling |
| Keyboard (5) | key, hold_key, keyboard_type, keyboard_press, keyboard_hotkey | Unified key combos (cmd+c), hold-for-duration, Unicode text typing, individual key press, modifier hotkeys |
| Screenshot (1) | take_screenshot | Full-screen or region capture with Retina scaling, coordinate metadata, and configurable resolution |
| Display (3) | switch_display, zoom, list_displays | Multi-monitor switching, high-res region zoom for reading small text, display enumeration |
| Clipboard (2) | read_clipboard, write_clipboard | Read/write system clipboard via NSPasteboard |
| Window (2) | get_active_window, list_windows | Frontmost window info, enumerate all visible windows with position/size |
| Screen (2) | get_screen_info, get_cursor_position | Display dimensions, Retina scale, accessibility status, cursor coordinates |
| System (3) | open_application, wait, run_shell_command | Launch apps by name, timed waits, shell command execution |
| Access (2) | request_access, list_granted_applications | App permission tracking for session-based access control |
| Batch (1) | computer_batch | Execute multiple actions in a single call, eliminating round-trip latency |
Quick Start
# 1. Clone
git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp
# 2. Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt # or: pip install mcp mss pillow pyobjc-framework-Quartz
# 3. Test
python3 __main__.py
The server communicates over stdio (stdin/stdout) using the MCP JSON-RPC protocol.
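As a rough illustration of what travels over that pipe, the sketch below frames two JSON-RPC 2.0 requests as one JSON object per line, which is how the MCP stdio transport delimits messages. The helper name `frame_request` and the client name are hypothetical, and the protocol version string is an assumption; in practice an MCP client library negotiates this for you.

```python
import json
from typing import Optional


def frame_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps escapes embedded newlines, so the trailing "\n" is the
    # only line break in the framed message.
    return json.dumps(message, separators=(",", ":")) + "\n"


initialize = frame_request(1, "initialize", {
    "protocolVersion": "2025-06-18",  # assumption: a version both sides support
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical
})
list_tools = frame_request(2, "tools/list")

print(initialize, end="")
print(list_tools, end="")
```

A real client would write these lines to the server's stdin, read the responses from stdout, and send a `notifications/initialized` notification between the two requests.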
Installation
Prerequisites
- macOS (uses Quartz framework for input simulation)
- Python 3.10+
- Accessibility permissions (System Settings > Privacy & Security > Accessibility)
Install Dependencies
git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp
python3 -m venv .venv
source .venv/bin/activate
pip install "mcp>=1.26.0" mss pillow pyobjc-framework-Quartz
Verify Installation
python3 -c "
from server.computer_use_server import ComputerUseMCPServer
server = ComputerUseMCPServer()
tools = server._collect_all_tools()
print(f'Server OK - {len(tools)} tools registered')
"
Expected output: Server OK - 33 tools registered
Grant Accessibility Permission
The server needs macOS accessibility access to simulate mouse/keyboard input:
- Open System Settings > Privacy & Security > Accessibility
- Add your terminal app (Terminal, iTerm2, VS Code, etc.)
- Toggle the permission ON
Screenshot capture works without accessibility permission. Only mouse/keyboard tools require it.
Configuration for AI Coding Tools
The server uses stdio transport - it reads from stdin and writes to stdout. Every MCP client connects the same way: spawn the Python process and pipe stdio.
Claude Code
Edit ~/.claude/settings.json:
{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"],
      "cwd": "/path/to/computer-use-mcp"
    }
  }
}
Then run /mcp in Claude Code to connect.
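Since every client entry repeats the same three paths, it can be less error-prone to generate the entry from the checkout location than to hand-edit `/path/to` placeholders. The sketch below does that; the helper name `mcp_server_entry` is hypothetical, and it assumes the virtualenv lives at `.venv` inside the repository as in the Quick Start.

```python
import json
from pathlib import PurePosixPath


def mcp_server_entry(repo_dir: str) -> dict:
    """Build the command/args/cwd entry for a checkout at repo_dir."""
    repo = PurePosixPath(repo_dir)
    return {
        "command": str(repo / ".venv" / "bin" / "python3"),  # venv interpreter
        "args": [str(repo / "__main__.py")],                 # server entry point
        "cwd": str(repo),
    }


config = {"mcpServers": {"computer-use": mcp_server_entry("/path/to/computer-use-mcp")}}
print(json.dumps(config, indent=2))
```

Paste the printed JSON into the client's config file, replacing the placeholder path with the real checkout location.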
Cursor
Create .cursor/mcp.json in your project root (or ~/.cursor/mcp.json globally):
{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}
VS Code + GitHub Copilot
Create .vscode/mcp.json in your workspace:
{
  "servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}
Windsurf
Edit ~/.codeium/windsurf/mcp_config.json:
{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}
JetBrains IDEs
Add via Settings > Tools > MCP Servers, using the same command/args pattern.
Zed
Add to your Zed settings (~/.config/zed/settings.json):
{
  "context_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}
Cline / Continue.dev
Both support the standard MCP JSON config format. Add to their respective config files using the same command + args pattern shown above.
Recommended Models for Tool Calling (April 2026)
Per LM Arena rankings and real-world testing, these are the best models for MCP tool calling:
Top Open Source Models (LM Arena Elo)
| Rank | Model | Provider | Parameters | Highlights |
|---|---|---|---|---|
| 1 | GLM-5 | Zhipu AI | MoE | #1 open source (Elo 1451), 77.8% SWE-bench Verified |
| 2 | Kimi K2.5 | Moonshot AI | MoE | HumanEval 99.0, stable across 200-300 sequential tool calls |
| 3 | GLM-4.7 | Zhipu AI | MoE | HumanEval 94.2, AIME 2025 95.7, GPQA 85.7 |
| 4 | GLM-5.1 | Zhipu AI | 744B MoE / 40B active | MIT license, 200K context, 8+ hour continuous agentic sessions |
| 5 | Qwen 3.6 Plus | Alibaba | Dense | 1M context, native function calling, always-on CoT reasoning |
| 6 | Gemma 4 31B | Google | 31B Dense | #3 Arena text, Apache 2.0, native tool calling, 256K context |
| 7 | Llama 4 Scout | Meta | 17B active / 16 experts | 10M context window, multimodal, beats Gemini 2.0 Flash-Lite |
| 8 | Llama 4 Maverick | Meta | 17B active / 128 experts | Beats GPT-4o, best multimodal in class |
| 9 | Mistral Small 4 | Mistral AI | 119B MoE / 6B active | Unified instruct+reasoning+coding+vision, 256K context |
| 10 | Qwen 3.5 | Alibaba | Multiple sizes | Most stable tool calling, rarely hallucinates calls |
Best Models by Platform
Ollama (run locally via ollama pull <model>):
- gemma4 (E2B / E4B / 26B MoE / 31B Dense): native function calling, best sub-32B for agents
- qwen3.5 / qwen3.6-plus: most stable tool calling, rarely drops parameters
- llama4 (Scout / Maverick): native multimodal + tools, 10M context
- kimi-k2.5: 200+ sequential tool calls without drift
- glm-5.1: long-horizon agentic coding (8+ hours continuous)
- mistral-small4: unified model, 6B active, fast
- granite4: enterprise-grade tool calling
- phi-4-mini: compact with function calling support
- deepseek-r1: strong reasoning + tool use
llama.cpp (GGUF format):
- bartowski/Gemma-4-31B-IT-GGUF: best open weight for agents
- bartowski/Qwen3.5-32B-Instruct-GGUF: stable tool calling
- bartowski/Llama-4-Scout-17B-GGUF: 10M context, multimodal
- bartowski/GLM-5.1-40B-GGUF: top open source coding
- Any model with Jinja chat template + function calling support
MLX (Apple Silicon via mlx-community):
- mlx-community/Gemma-4-31B-IT-4bit: best performance/quality on Apple Silicon
- mlx-community/Qwen3.5-32B-Instruct-4bit: stable tool calls
- mlx-community/Llama-4-Scout-17B-4bit: multimodal + tools
- mlx-community/Mistral-Small-4-6B-4bit: fast, 6B active
LM Studio: All of the above models are available through LM Studio's model browser with native MCP host support.
Configuration for Local Model Frameworks
LM Studio
LM Studio has native MCP host support since v0.3.17.
- Open LM Studio > Settings > MCP
- Add a new MCP server with:
  - Command: /path/to/computer-use-mcp/.venv/bin/python3
  - Args: ["/path/to/computer-use-mcp/__main__.py"]
- Select a model with tool calling support:
  - Top picks: Gemma 4 31B, Qwen 3.5/3.6, Llama 4 Scout, GLM-5.1, Mistral Small 4, Kimi K2.5
- The tools will appear in the chat interface
llama.cpp (Native MCP - March 2026+)
llama.cpp merged native MCP client support in March 2026 (PR #18655), adding a full agentic loop with MCP server management in the WebUI.
Start llama-server with MCP:
# Start with a top function-calling model (pick one)
llama-server --jinja -fa -hf bartowski/Gemma-4-31B-IT-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa -hf bartowski/Qwen3.5-32B-Instruct-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa -hf bartowski/Llama-4-Scout-17B-GGUF:Q4_K_M --port 8080
Then in the llama.cpp WebUI:
- Go to MCP Server Settings
- Add this server with command: /path/to/.venv/bin/python3 /path/to/__main__.py
- The 33 tools will be available in the agentic loop
Via llama-mcp-server bridge:
npm install -g llama-mcp-server
Configure in claude_desktop_config.json:
{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}
Supported models for tool calling: Gemma 4, Qwen 3.5/3.6, Llama 4 Scout/Maverick, GLM-5.1, Kimi K2.5, Mistral Small 4, Llama 3.3, DeepSeek R1, Granite 4, Phi-4-mini, Hermes 3, Functionary v3.
Ollama
Ollama does not have native MCP support yet, but several bridge solutions work:
Option A: MCP-Bridge (recommended)
MCP-Bridge acts as middleware between Ollama's OpenAI-compatible API and MCP servers.
git clone https://github.com/SecretiveShell/MCP-Bridge.git
cd MCP-Bridge
Configure config.json:
{
  "inference_server": {
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama"
  },
  "mcp_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}
Option B: ollama-mcp-bridge
git clone https://github.com/patruff/ollama-mcp-bridge.git
cd ollama-mcp-bridge
npm install && npm run build
Add the computer-use server to the bridge config.
Recommended Ollama models (April 2026):
- gemma4:31b: best sub-32B for agents, native function calling
- qwen3.5:32b: most stable tool calling
- llama4:scout: 10M context, multimodal + tools
- kimi-k2.5: 200+ sequential tool calls without drift
- glm-5.1: long-horizon agentic (8+ hours continuous)
- mistral-small4: fast, 6B active params
- granite4: enterprise tool calling
MLX / Apple Silicon
For Apple Silicon Macs, use vLLM-MLX for optimized local inference and connect it to this server through MCP-Bridge:
Install vLLM-MLX:
pip install git+https://github.com/waybarrios/vllm-mlx.git
Start the inference server:
# Pick a model (top recommendations for tool calling)
vllm-mlx serve mlx-community/Gemma-4-31B-IT-4bit --port 8000
vllm-mlx serve mlx-community/Qwen3.5-32B-Instruct-4bit --port 8000
vllm-mlx serve mlx-community/Llama-4-Scout-17B-4bit --port 8000
Connect via MCP-Bridge:
{
  "inference_server": {
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed"
  },
  "mcp_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}
Performance: M4 Max achieves ~402 tokens/sec on small models, ~1112 tokens/sec with continuous batching.
Alternative: oMLX provides a macOS menu bar app with MCP tool integration.
vLLM
vLLM has native MCP integration with GPU-optimized inference.
pip install vllm
vllm serve google/gemma-4-31b-it --port 8000 # or any tool-calling model
vllm serve Qwen/Qwen3.5-32B-Instruct --port 8000 # stable tool calling
vllm serve meta-llama/Llama-4-Scout-17B --port 8000 # multimodal + tools
Connect via MCP-Bridge using http://localhost:8000/v1 as the base URL.
Generic OpenAI-Compatible API
Any service exposing an OpenAI-compatible API (local or remote) can use this server through MCP-Bridge:
- Start your inference server (Ollama, llama.cpp, vLLM, MLX, TGI, etc.)
- Point MCP-Bridge at it with the base_url
- Add this server to MCP-Bridge's mcp_servers config
- MCP-Bridge intercepts API requests, enriches them with tool definitions, executes tool calls, and returns results
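The "enriches them with tool definitions" step amounts to translating each MCP tool description into the function-tool shape that OpenAI-compatible chat APIs accept. The sketch below shows that mapping. It is an illustration of the idea, not MCP-Bridge's actual code; the function name `mcp_tool_to_openai` and the sample `left_click` schema are assumptions made for the example.

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Map an MCP tool description onto an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP carries the JSON Schema under "inputSchema";
            # OpenAI-compatible APIs expect it under "parameters".
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


# Illustrative schema only; the server's real schema may differ.
left_click = {
    "name": "left_click",
    "description": "Left-click at coordinates",
    "inputSchema": {
        "type": "object",
        "properties": {"coordinate": {"type": "array", "items": {"type": "integer"}}},
        "required": ["coordinate"],
    },
}

openai_tool = mcp_tool_to_openai(left_click)
print(openai_tool["function"]["name"])
```

The resulting list of tools goes into the `tools` field of the chat completion request; any tool calls in the model's reply are then routed back to the MCP server.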
Tool Reference
Batch Operations — computer_batch
Execute multiple actions in a single call to eliminate round-trip latency:
{
  "actions": [
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "Hello, world!"},
    {"action": "key", "text": "Return"},
    {"action": "wait", "duration": 1},
    {"action": "screenshot"}
  ]
}
Supported actions: key, type, mouse_move, left_click, left_click_drag, right_click, middle_click, double_click, triple_click, scroll, hold_key, screenshot, cursor_position, left_mouse_down, left_mouse_up, wait
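Because a batch only saves a round trip if it is accepted, a client may want to reject unknown action names before sending. The sketch below checks each step against the supported-actions list above and serializes the payload. The helper name `build_batch` is hypothetical, and the check is client-side only; how the server itself validates a batch is not described here.

```python
import json

# Action names copied from the supported-actions list above.
SUPPORTED_ACTIONS = {
    "key", "type", "mouse_move", "left_click", "left_click_drag", "right_click",
    "middle_click", "double_click", "triple_click", "scroll", "hold_key",
    "screenshot", "cursor_position", "left_mouse_down", "left_mouse_up", "wait",
}


def build_batch(actions: list) -> str:
    """Return the computer_batch arguments as JSON, rejecting unknown actions."""
    for index, step in enumerate(actions):
        name = step.get("action")
        if name not in SUPPORTED_ACTIONS:
            raise ValueError(f"step {index}: unsupported action {name!r}")
    return json.dumps({"actions": actions})


payload = build_batch([
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "Hello, world!"},
    {"action": "screenshot"},
])
print(payload)
```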
Mouse Tools
| Tool | Parameters | Description |
|---|---|---|
| left_click | coordinate: [x, y] | Left-click at coordinates |
| right_click | coordinate: [x, y] | Right-click (context menu) |
| middle_click | coordinate: [x, y] | Middle-click (scroll wheel) |
| double_click | coordinate: [x, y] | Double-click (select word) |
| triple_click | coordinate: [x, y] | Triple-click (select line) |
| mouse_click | x, y, button, click_count | General click with full control |
| mouse_move | x, y or coordinate: [x, y] | Move cursor without clicking |
| mouse_drag | start_coordinate, coordinate | Drag with 20-step interpolation |
| left_mouse_down | (none) | Press and hold left button |
| left_mouse_up | (none) | Release left button |
| scroll | coordinate, scroll_direction, scroll_amount | Directional scroll (up/down/left/right) |
| mouse_scroll | amount, x, y | Scroll wheel (positive=up, negative=down) |
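To make "20-step interpolation" concrete, the sketch below generates the intermediate cursor positions for a drag. It assumes evenly spaced linear steps, which is a guess at what mouse_drag does rather than a copy of its implementation; the function name `drag_path` is hypothetical.

```python
def drag_path(start, end, steps=20):
    """Return `steps` evenly spaced points from just after start up to end."""
    x0, y0 = start
    x1, y1 = end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(1, steps + 1)
    ]


path = drag_path((100, 200), (300, 400))
print(len(path), path[0], path[-1])  # 20 (110, 210) (300, 400)
```

Posting a mouse-dragged event at each point, instead of jumping straight to the destination, is what lets applications that track drag movement see a continuous gesture.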
Keyboard Tools
| Tool | Parameters | Description |
|---|---|---|
| key | text: "cmd+c", repeat | Unified key press with modifiers joined by + |
| hold_key | text: "shift", duration | Hold key for N seconds then release |
| keyboard_type | text | Type text character by character (Unicode) |
| keyboard_press | key | Press a single named key |
| keyboard_hotkey | keys: ["cmd", "c"] | Press key combination as array |
Supported keys: return, tab, space, delete, escape, arrows (left, right, up, down), home, end, pageup, pagedown, f1-f12, a-z, 0-9, symbols.
Modifiers: cmd/command, shift, alt/option, ctrl/control, fn
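The key tool's "modifiers joined by +" format can be split into modifiers and a main key along these lines. This is a minimal sketch of one plausible parsing rule using the modifier aliases listed above. The function name `parse_combo` and the canonical short names are assumptions, and it deliberately ignores the edge case of a literal `+` key.

```python
# Aliases from the modifier list above, mapped to one canonical name each.
MODIFIER_ALIASES = {
    "cmd": "cmd", "command": "cmd",
    "shift": "shift",
    "alt": "alt", "option": "alt",
    "ctrl": "ctrl", "control": "ctrl",
    "fn": "fn",
}


def parse_combo(text: str):
    """Split 'cmd+shift+z' into (['cmd', 'shift'], 'z')."""
    parts = [part.strip().lower() for part in text.split("+") if part.strip()]
    modifiers, keys = [], []
    for part in parts:
        if part in MODIFIER_ALIASES:
            modifiers.append(MODIFIER_ALIASES[part])
        else:
            keys.append(part)
    if len(keys) > 1:
        raise ValueError(f"more than one non-modifier key in {text!r}")
    # A modifier-only string such as "shift" has no main key.
    return modifiers, (keys[0] if keys else None)


print(parse_combo("cmd+c"))
print(parse_combo("Command+Shift+Z"))
```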
Screenshot & Display Tools
| Tool | Parameters | Description |
|---|---|---|
| take_screenshot | region (optional), max_dimension | Capture screen as base64 PNG with coordinate metadata |
| zoom | region: [x0, y0, x1, y1] | High-res crop of last screenshot (for reading small text) |
| switch_display | display | Switch active monitor for screenshots. Use "auto" for main. |
| list_displays | (none) | Enumerate all connected displays |
Other Tools
| Tool | Parameters | Description |
|---|---|---|
| read_clipboard | (none) | Read clipboard text |
| write_clipboard | text | Write text to clipboard |
| get_active_window | (none) | Frontmost window app, title, position, size |
| list_windows | (none) | All visible windows |
| get_screen_info | (none) | Screen dimensions, Retina scale, accessibility status |
| get_cursor_position | (none) | Current cursor coordinates |
| open_application | name or app | Launch macOS app by name |
| wait | duration | Pause for N seconds (0-100) |
| run_shell_command | command, timeout | Execute shell command |
| request_access | apps[], reason | Register apps for session access control |
| list_granted_applications | (none) | List currently granted apps |
Architecture
computer-use-mcp/
├── __main__.py # Entry point (python -m or direct)
├── __init__.py # Package metadata
├── pyproject.toml # Dependencies & build config
├── .mcp.json # Universal MCP client config
└── server/
    ├── __init__.py # Re-exports all tool modules
    ├── computer_use_server.py # MCP Server class, tool registry, stdio transport
    └── tools/
        ├── __init__.py # Exports all tool getters/handlers
        ├── access_tools.py # request_access, list_granted_applications
        ├── batch_tools.py # computer_batch (action orchestrator)
        ├── clipboard_tools.py # read/write clipboard (NSPasteboard)
        ├── display_tools.py # switch_display, zoom, list_displays
        ├── keyboard_tools.py # key, hold_key, type, press, hotkey (Quartz)
        ├── mouse_tools.py # 12 mouse tools (Quartz CGEvent)
        ├── screen_tools.py # screen info, cursor position (Quartz)
        ├── screenshot_tools.py # screenshot capture (mss + PIL)
        ├── system_tools.py # open app, wait, shell command
        └── window_tools.py # active window, list windows (Quartz + AppKit)
How It Works
- Transport: stdio (JSON-RPC 2.0 over stdin/stdout)
- Tool Registry: ComputerUseMCPServer collects tools from 10 category modules and maps tool names to handlers
- Input Simulation: macOS Quartz CGEvent API for mouse/keyboard events posted to kCGHIDEventTap
- Screenshots: mss library for fast capture, PIL for resizing, base64 encoding
- Coordinate System: All tools use logical screen coordinates (Retina-aware). The server handles physical-to-logical scaling automatically.
Coordinate Mapping
Screenshots include metadata for mapping image pixels to screen coordinates:
click_x = (pixel_x / image_width) * logical_screen_width
click_y = (pixel_y / image_height) * logical_screen_height
On Retina displays, logical coordinates differ from physical pixels. The server handles this transparently.
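The mapping formula above translates directly into a small helper. The sketch below applies it to a full-screen capture; the function name `image_to_screen` and the example sizes are hypothetical, and the real values come from the screenshot's coordinate metadata.

```python
def image_to_screen(pixel_x, pixel_y, image_size, logical_size):
    """Convert a pixel position in a screenshot to logical screen coordinates."""
    image_width, image_height = image_size
    screen_width, screen_height = logical_size
    click_x = pixel_x / image_width * screen_width
    click_y = pixel_y / image_height * screen_height
    return round(click_x), round(click_y)


# A 1280x800 screenshot of a display whose logical size is 1512x982:
# the centre of the image maps to the centre of the screen.
print(image_to_screen(640, 400, (1280, 800), (1512, 982)))  # (756, 491)
```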
Troubleshooting
"Accessibility permission not granted"
Go to System Settings > Privacy & Security > Accessibility and add your terminal/IDE app.
Server fails to start
Ensure you're using the venv Python (not system Python):
/path/to/computer-use-mcp/.venv/bin/python3 __main__.py
Mouse/keyboard tools return errors but screenshots work
Screenshot capture doesn't need accessibility permission, but input simulation does. Grant accessibility access to the process running the server.
"ModuleNotFoundError: No module named 'server'"
The __main__.py adds its directory to sys.path automatically. If running as a module (python -m computer_use), set the cwd to the parent directory of computer_use/.
Multi-monitor: wrong screen captured
Use list_displays to see all monitors, then switch_display to select the correct one. Use switch_display("auto") to reset.
Contributing
Contributions are welcome! This server is designed to be extensible:
- Add new tools by creating a file in server/tools/
- Define get_*_tools() and handle_*_tool() functions
- Register in server/computer_use_server.py tool_sources list
- Update server/tools/__init__.py exports
Please ensure new tools follow the existing patterns for error handling and JSON response format.
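A new tool module following the get_*_tools() / handle_*_tool() naming might look roughly like the sketch below. Everything in it is hypothetical: the `echo_text` tool, the handler signature, and the JSON response fields are assumptions chosen to keep the example self-contained, and plain dicts stand in for whatever tool type the existing modules return. Check an existing file in server/tools/ for the actual conventions before copying this.

```python
import json


def get_example_tools() -> list:
    """Describe the tools this module provides (hypothetical echo_text tool)."""
    return [{
        "name": "echo_text",
        "description": "Return the given text unchanged",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    }]


def handle_example_tool(name: str, arguments: dict) -> str:
    """Dispatch a call to one of this module's tools and return a JSON string."""
    if name != "echo_text":
        return json.dumps({"success": False, "error": f"Unknown tool: {name}"})
    text = arguments.get("text")
    if not isinstance(text, str):
        # Report bad input as a JSON error instead of raising.
        return json.dumps({"success": False, "error": "'text' must be a string"})
    return json.dumps({"success": True, "text": text})


print(handle_example_tool("echo_text", {"text": "hello"}))
```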
License
MIT License - see LICENSE for details.