Computer Use MCP Server

A production-grade macOS MCP server exposing 33 tools for full desktop automation, including mouse, keyboard, screenshot, clipboard, and window control.


A production-grade macOS Computer Use MCP Server that exposes 33 tools across 10 categories for full desktop automation via the Model Context Protocol. Control mouse, keyboard, screenshots, clipboard, windows, and more from any MCP-compatible AI client.

Works with Claude Code, Cursor, VS Code, Windsurf, LM Studio, llama.cpp, and other MCP-compatible clients. It also works with Ollama, MLX, vLLM, and other OpenAI-compatible backends through an MCP bridge.


Features

33 Tools Across 10 Categories

Category | Tools | Description
Mouse (12) | mouse_click, left_click, right_click, middle_click, double_click, triple_click, left_mouse_down, left_mouse_up, mouse_move, mouse_drag, scroll, mouse_scroll | Full mouse control with coordinate-based clicking, dragging with 20-step interpolation, directional scrolling
Keyboard (5) | key, hold_key, keyboard_type, keyboard_press, keyboard_hotkey | Unified key combos (cmd+c), hold-for-duration, Unicode text typing, individual key presses, modifier hotkeys
Screenshot (1) | take_screenshot | Full-screen or region capture with Retina scaling, coordinate metadata, and configurable resolution
Display (3) | switch_display, zoom, list_displays | Multi-monitor switching, high-res region zoom for reading small text, display enumeration
Clipboard (2) | read_clipboard, write_clipboard | Read/write the system clipboard via NSPasteboard
Window (2) | get_active_window, list_windows | Frontmost window info; enumeration of all visible windows with position/size
Screen (2) | get_screen_info, get_cursor_position | Display dimensions, Retina scale, accessibility status, cursor coordinates
System (3) | open_application, wait, run_shell_command | Launch apps by name, timed waits, shell command execution
Access (2) | request_access, list_granted_applications | App permission tracking for session-based access control
Batch (1) | computer_batch | Execute multiple actions in a single call, avoiding a round trip per action

Quick Start

# 1. Clone
git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp

# 2. Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt  # or: pip install mcp mss pillow pyobjc-framework-Quartz

# 3. Run (the server then waits for MCP messages on stdin; press Ctrl+C to stop)
python3 __main__.py

The server communicates over stdio (stdin/stdout) using the MCP JSON-RPC protocol.
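To illustrate what an MCP client does over this transport, the following minimal sketch builds the session handshake and can spawn the server to list its tools. It assumes MCP's newline-delimited JSON-RPC framing for stdio and the protocol revision string "2025-06-18". The client name is hypothetical, and the sketch is a stdlib stand-in for a real MCP client library.

```python
import json
import subprocess


def encode(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def handshake_messages() -> list[dict]:
    """initialize request, initialized notification, then tools/list."""
    return [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                # Assumption: the server accepts this protocol revision.
                "protocolVersion": "2025-06-18",
                "capabilities": {},
                # Hypothetical client identity.
                "clientInfo": {"name": "example-client", "version": "0.1"},
            },
        },
        # Notifications carry no "id" and receive no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def wait_for(proc: subprocess.Popen, request_id: int) -> dict:
    """Read stdout lines until the response with the given id arrives."""
    for line in proc.stdout:
        reply = json.loads(line)
        if reply.get("id") == request_id:
            return reply
    raise RuntimeError("server exited before replying")


def list_tools(python: str, main_py: str) -> list[str]:
    """Spawn the server over stdio and return the registered tool names."""
    proc = subprocess.Popen([python, main_py],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    init, initialized, tools_list = handshake_messages()
    try:
        proc.stdin.write(encode(init))
        proc.stdin.flush()
        wait_for(proc, init["id"])  # wait for the initialize result first
        for msg in (initialized, tools_list):
            proc.stdin.write(encode(msg))
        proc.stdin.flush()
        reply = wait_for(proc, tools_list["id"])
        return [tool["name"] for tool in reply["result"]["tools"]]
    finally:
        proc.terminate()
        proc.wait(timeout=5)
```

With the venv from the Setup step, calling list_tools("/path/to/computer-use-mcp/.venv/bin/python3", "/path/to/computer-use-mcp/__main__.py") should return the registered tool names.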


Installation

Prerequisites

  • macOS (uses Quartz framework for input simulation)
  • Python 3.10+
  • Accessibility permissions (System Settings > Privacy & Security > Accessibility)

Install Dependencies

git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp
python3 -m venv .venv
source .venv/bin/activate
pip install "mcp>=1.26.0" mss pillow pyobjc-framework-Quartz

Verify Installation

python3 -c "
from server.computer_use_server import ComputerUseMCPServer
server = ComputerUseMCPServer()
tools = server._collect_all_tools()
print(f'Server OK - {len(tools)} tools registered')
"

Expected output: Server OK - 33 tools registered

Grant Accessibility Permission

The server needs macOS accessibility access to simulate mouse/keyboard input:

  1. Open System Settings > Privacy & Security > Accessibility
  2. Add your terminal app (Terminal, iTerm2, VS Code, etc.)
  3. Toggle the permission ON

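Screenshot capture works without accessibility permission; only the mouse and keyboard tools require it. On macOS 10.15 and later, screenshots do need Screen Recording permission (System Settings > Privacy & Security > Screen Recording) for the same app, or other apps' windows will be missing from the capture.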


Configuration for AI Coding Tools

The server uses stdio transport - it reads from stdin and writes to stdout. Every MCP client connects the same way: spawn the Python process and pipe stdio.

Claude Code

Add to .mcp.json in your project root (project scope) or to ~/.claude.json (user scope):

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"],
      "cwd": "/path/to/computer-use-mcp"
    }
  }
}

Then run /mcp in Claude Code to connect.

Cursor

Create .cursor/mcp.json in your project root (or ~/.cursor/mcp.json globally):

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

VS Code + GitHub Copilot

Create .vscode/mcp.json in your workspace. VS Code uses a top-level "servers" key rather than "mcpServers":

{
  "servers": {
    "computer-use": {
      "type": "stdio",
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

JetBrains IDEs

Add via Settings > Tools > MCP Servers, using the same command/args pattern.

Zed

Add to your Zed settings (~/.config/zed/settings.json) under the context_servers key (check Zed's context server documentation for the exact schema of your version):

{
  "context_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Cline / Continue.dev

Both support the standard MCP JSON config format. Add to their respective config files using the same command + args pattern shown above.
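Every client above takes the same command/args pair, so a small script can print the entry with absolute paths filled in. This is a convenience sketch and is not part of the repository. It assumes the virtualenv lives at .venv inside the clone, as in the Installation steps. The top-level key varies by client, so it is a parameter here.

```python
import json
from pathlib import Path


def mcp_server_entry(repo: str) -> dict:
    """Build the command/args entry that the client configs above share."""
    root = Path(repo).expanduser().resolve()
    return {
        # Assumes the virtualenv was created at .venv inside the clone.
        # The interpreter path is not resolved: .venv/bin/python3 is a symlink,
        # and following it would bypass the virtualenv.
        "command": str(root / ".venv" / "bin" / "python3"),
        "args": [str(root / "__main__.py")],
    }


def client_config(repo: str, top_level_key: str = "mcpServers") -> str:
    """Render a config file body with this server registered as "computer-use"."""
    config = {top_level_key: {"computer-use": mcp_server_entry(repo)}}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Run from inside the clone to print a ready-to-paste config.
    print(client_config("."))
```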


Recommended Models for Tool Calling (April 2026)

Per LM Arena rankings and real-world testing, these are the best models for MCP tool calling:

Top Open Source Models (LM Arena Elo)

Rank | Model | Provider | Parameters | Highlights
1 | GLM-5 | Zhipu AI | MoE | #1 open source (Elo 1451), 77.8% SWE-bench Verified
2 | Kimi K2.5 | Moonshot AI | MoE | HumanEval 99.0, stable across 200-300 sequential tool calls
3 | GLM-4.7 | Zhipu AI | MoE | HumanEval 94.2, AIME 2025 95.7, GPQA 85.7
4 | GLM-5.1 | Zhipu AI | 744B MoE / 40B active | MIT license, 200K context, 8+ hour continuous agentic sessions
5 | Qwen 3.6 Plus | Alibaba | Dense | 1M context, native function calling, always-on CoT reasoning
6 | Gemma 4 31B | Google | 31B Dense | #3 Arena text, Apache 2.0, native tool calling, 256K context
7 | Llama 4 Scout | Meta | 17B active / 16 experts | 10M context window, multimodal, beats Gemini 2.0 Flash-Lite
8 | Llama 4 Maverick | Meta | 17B active / 128 experts | Beats GPT-4o, best multimodal in class
9 | Mistral Small 4 | Mistral AI | 119B MoE / 6B active | Unified instruct+reasoning+coding+vision, 256K context
10 | Qwen 3.5 | Alibaba | Multiple sizes | Most stable tool calling, rarely hallucinates calls

Best Models by Platform

Ollama (run locally via ollama pull <model>):

  • gemma4 (E2B / E4B / 26B MoE / 31B Dense) — native function calling, best sub-32B for agents
  • qwen3.5 / qwen3.6-plus — most stable tool calling, rarely drops parameters
  • llama4 (Scout / Maverick) - native multimodal + tools, 10M context on Scout (1M on Maverick)
  • kimi-k2.5 — 200+ sequential tool calls without drift
  • glm-5.1 — long-horizon agentic coding (8+ hours continuous)
  • mistral-small4 — unified model, 6B active, fast
  • granite4 — enterprise-grade tool calling
  • phi-4-mini — compact with function calling support
  • deepseek-r1 — strong reasoning + tool use

llama.cpp (GGUF format):

  • bartowski/Gemma-4-31B-IT-GGUF — best open weight for agents
  • bartowski/Qwen3.5-32B-Instruct-GGUF — stable tool calling
  • bartowski/Llama-4-Scout-17B-GGUF — 10M context, multimodal
  • bartowski/GLM-5.1-40B-GGUF — top open source coding
  • Any model with Jinja chat template + function calling support

MLX (Apple Silicon via mlx-community):

  • mlx-community/Gemma-4-31B-IT-4bit — best performance/quality on Apple Silicon
  • mlx-community/Qwen3.5-32B-Instruct-4bit — stable tool calls
  • mlx-community/Llama-4-Scout-17B-4bit — multimodal + tools
  • mlx-community/Mistral-Small-4-6B-4bit — fast, 6B active

LM Studio: All of the above models are available through LM Studio's model browser with native MCP host support.


Configuration for Local Model Frameworks

LM Studio

LM Studio has native MCP host support since v0.3.17.

  1. Open LM Studio > Settings > MCP
  2. Add a new MCP server with:
    • Command: /path/to/computer-use-mcp/.venv/bin/python3
    • Args: ["/path/to/computer-use-mcp/__main__.py"]
  3. Select a model with tool calling support:
    • Top picks: Gemma 4 31B, Qwen 3.5/3.6, Llama 4 Scout, GLM-5.1, Mistral Small 4, Kimi K2.5
  4. The tools will appear in the chat interface

llama.cpp (Native MCP - March 2026+)

llama.cpp merged native MCP client support in March 2026 (PR #18655), adding a full agentic loop with MCP server management in the WebUI.

Start llama-server with a function-calling model (MCP servers are then added in the WebUI):

# Start with a top function-calling model (pick one)
llama-server --jinja -fa on -hf bartowski/Gemma-4-31B-IT-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa on -hf bartowski/Qwen3.5-32B-Instruct-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa on -hf bartowski/Llama-4-Scout-17B-GGUF:Q4_K_M --port 8080

Then in the llama.cpp WebUI:

  1. Go to MCP Server Settings
  2. Add this server with command: /path/to/.venv/bin/python3 /path/to/__main__.py
  3. The 33 tools will be available in the agentic loop

Via llama-mcp-server bridge:

npm install -g llama-mcp-server

Configure in claude_desktop_config.json:

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Supported models for tool calling: Gemma 4, Qwen 3.5/3.6, Llama 4 Scout/Maverick, GLM-5.1, Kimi K2.5, Mistral Small 4, Llama 3.3, DeepSeek R1, Granite 4, Phi-4-mini, Hermes 3, Functionary v3.

Ollama

Ollama does not have native MCP support yet, but several bridge solutions work:

Option A: MCP-Bridge (recommended)

MCP-Bridge acts as middleware between Ollama's OpenAI-compatible API and MCP servers.

git clone https://github.com/SecretiveShell/MCP-Bridge.git
cd MCP-Bridge

Configure config.json:

{
  "inference_server": {
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama"
  },
  "mcp_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Option B: ollama-mcp-bridge

git clone https://github.com/patruff/ollama-mcp-bridge.git
cd ollama-mcp-bridge
npm install && npm run build

Add the computer-use server to the bridge config.

Recommended Ollama models (April 2026):

  • gemma4:31b — best sub-32B for agents, native function calling
  • qwen3.5:32b — most stable tool calling
  • llama4:scout — 10M context, multimodal + tools
  • kimi-k2.5 — 200+ sequential tool calls without drift
  • glm-5.1 — long-horizon agentic (8+ hours continuous)
  • mistral-small4 — fast, 6B active params
  • granite4 — enterprise tool calling

MLX / Apple Silicon

For Apple Silicon Macs, use vLLM-MLX for optimized local inference with MCP bridge:

Install vLLM-MLX:

pip install git+https://github.com/waybarrios/vllm-mlx.git

Start the inference server:

# Pick a model (top recommendations for tool calling)
vllm-mlx serve mlx-community/Gemma-4-31B-IT-4bit --port 8000
vllm-mlx serve mlx-community/Qwen3.5-32B-Instruct-4bit --port 8000
vllm-mlx serve mlx-community/Llama-4-Scout-17B-4bit --port 8000

Connect via MCP-Bridge:

{
  "inference_server": {
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed"
  },
  "mcp_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Performance: M4 Max achieves ~402 tokens/sec on small models, ~1112 tokens/sec with continuous batching.

Alternative: oMLX provides a macOS menu bar app with MCP tool integration.

vLLM

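vLLM provides GPU-optimized inference behind an OpenAI-compatible API. For the model to emit tool calls through that API, start it with --enable-auto-tool-choice and a --tool-call-parser that matches the model family (see vLLM's tool calling documentation for the available parsers).

pip install vllm
vllm serve google/gemma-4-31b-it --port 8000       # or any tool-calling model
vllm serve Qwen/Qwen3.5-32B-Instruct --port 8000   # stable tool calling
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct --port 8000 # multimodal + tools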

Connect via MCP-Bridge using http://localhost:8000/v1 as the base URL.

Generic OpenAI-Compatible API

Any service exposing an OpenAI-compatible API (local or remote) can use this server through MCP-Bridge:

  1. Start your inference server (Ollama, llama.cpp, vLLM, MLX, TGI, etc.)
  2. Point MCP-Bridge at it with the base_url
  3. Add this server to MCP-Bridge's mcp_servers config
  4. MCP-Bridge intercepts API requests, enriches them with tool definitions, executes tool calls, and returns results
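Step 4 comes down to translating between MCP and OpenAI tool formats. The sketch below shows that translation under one assumption: the bridge maps each MCP tool (name, description, inputSchema) to an OpenAI function tool, and maps each tool_call in the model's reply back to an MCP tools/call request. It illustrates the idea only and is not MCP-Bridge's actual code.

```python
import json


def mcp_tool_to_openai(tool: dict) -> dict:
    """Wrap an MCP tool definition as an OpenAI chat-completions function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is JSON Schema, which is what "parameters" expects.
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


def enrich_request(body: dict, mcp_tools: list[dict]) -> dict:
    """Return a copy of a chat-completions request with the MCP tools appended."""
    enriched = dict(body)
    enriched["tools"] = list(body.get("tools", [])) + [mcp_tool_to_openai(t) for t in mcp_tools]
    return enriched


def tool_call_to_mcp(call: dict, request_id: int) -> dict:
    """Turn one OpenAI tool_call from the model's reply into an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": call["function"]["name"],
            # OpenAI-style APIs return arguments as a JSON-encoded string.
            "arguments": json.loads(call["function"]["arguments"] or "{}"),
        },
    }
```

A bridge then feeds each tools/call result back to the model as a tool message and repeats the loop until the model answers without calling a tool.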

Tool Reference

Batch Operations — computer_batch

Execute multiple actions in a single call to eliminate round-trip latency:

{
  "actions": [
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "Hello, world!"},
    {"action": "key", "text": "Return"},
    {"action": "wait", "duration": 1},
    {"action": "screenshot"}
  ]
}

Supported actions: key, type, mouse_move, left_click, left_click_drag, right_click, middle_click, double_click, triple_click, scroll, hold_key, screenshot, cursor_position, left_mouse_down, left_mouse_up, wait
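An unsupported action name would otherwise surface only as an error after the round trip, so a client can check a batch locally before sending it. The following is a hypothetical client-side helper, not part of the server. It validates only action names against the list above, because the per-action parameters are defined by the server.

```python
import json

# Action names as listed above.
SUPPORTED_ACTIONS = frozenset({
    "key", "type", "mouse_move", "left_click", "left_click_drag", "right_click",
    "middle_click", "double_click", "triple_click", "scroll", "hold_key",
    "screenshot", "cursor_position", "left_mouse_down", "left_mouse_up", "wait",
})


def build_batch(*actions: dict) -> str:
    """Validate action names and serialize the computer_batch arguments object."""
    for index, action in enumerate(actions):
        name = action.get("action")
        if name not in SUPPORTED_ACTIONS:
            raise ValueError(f"action #{index}: unsupported action {name!r}")
    return json.dumps({"actions": list(actions)})
```

For example, build_batch({"action": "left_click", "coordinate": [100, 200]}, {"action": "screenshot"}) produces the same shape as the JSON example above.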

Mouse Tools

Tool | Parameters | Description
left_click | coordinate: [x, y] | Left-click at coordinates
right_click | coordinate: [x, y] | Right-click (context menu)
middle_click | coordinate: [x, y] | Middle-click (scroll wheel)
double_click | coordinate: [x, y] | Double-click (select word)
triple_click | coordinate: [x, y] | Triple-click (select line)
mouse_click | x, y, button, click_count | General click with full control
mouse_move | x, y or coordinate: [x, y] | Move cursor without clicking
mouse_drag | start_coordinate, coordinate | Drag with 20-step interpolation
left_mouse_down | (none) | Press and hold left button
left_mouse_up | (none) | Release left button
scroll | coordinate, scroll_direction, scroll_amount | Directional scroll (up/down/left/right)
mouse_scroll | amount, x, y | Scroll wheel (positive=up, negative=down)
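The mouse_drag entry mentions 20-step interpolation. Here is a minimal sketch of how such a path can be computed, assuming linear interpolation in logical coordinates. The server's exact easing and timing between steps are not documented here and may differ.

```python
def drag_path(start: tuple[float, float], end: tuple[float, float],
              steps: int = 20) -> list[tuple[float, float]]:
    """Return `steps` evenly spaced points after `start`, ending exactly at `end`."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]
```

A Quartz-based drag would typically press the button at start, post a left-mouse-dragged event at each of these points, and release at the last one. Emitting many small moves instead of one jump gives target apps intermediate drag events to react to.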

Keyboard Tools

Tool | Parameters | Description
key | text: "cmd+c", repeat | Unified key press with modifiers joined by +
hold_key | text: "shift", duration | Hold key for N seconds then release
keyboard_type | text | Type text character by character (Unicode)
keyboard_press | key | Press a single named key
keyboard_hotkey | keys: ["cmd", "c"] | Press key combination as array

Supported keys: return, tab, space, delete, escape, arrows (left, right, up, down), home, end, pageup, pagedown, f1-f12, a-z, 0-9, symbols.

Modifiers: cmd/command, shift, alt/option, ctrl/control, fn
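The key tool takes one string with modifiers joined by +, while keyboard_hotkey takes an array. The sketch below shows how the string form can be split and normalized using the aliases listed above. The canonical names chosen here are an assumption, and the server's own parser, including how it treats a literal + key, may differ.

```python
# Alias table built from the modifier list above; canonical names are this sketch's choice.
MODIFIER_ALIASES = {
    "cmd": "cmd", "command": "cmd",
    "shift": "shift",
    "alt": "alt", "option": "alt",
    "ctrl": "ctrl", "control": "ctrl",
    "fn": "fn",
}


def parse_combo(text: str) -> tuple[list[str], str]:
    """Split "cmd+shift+t" into (["cmd", "shift"], "t"); the last part is the key."""
    parts = [p.strip().lower() for p in text.split("+")]
    if not parts[-1]:
        raise ValueError(f"no key in combo {text!r}")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIER_ALIASES]
    if unknown:
        raise ValueError(f"unknown modifiers {unknown} in {text!r}")
    return [MODIFIER_ALIASES[m] for m in mods], key
```

For example, parse_combo("Command+Shift+T") returns (["cmd", "shift"], "t"), which corresponds to keyboard_hotkey with keys ["cmd", "shift", "t"].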

Screenshot & Display Tools

Tool | Parameters | Description
take_screenshot | region (optional), max_dimension | Capture screen as base64 PNG with coordinate metadata
zoom | region: [x0, y0, x1, y1] | High-res crop of last screenshot (for reading small text)
switch_display | display | Switch active monitor for screenshots. Use "auto" for main.
list_displays | (none) | Enumerate all connected displays
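take_screenshot returns the image as base64 PNG. When a client needs the image size, for instance to apply the coordinate mapping formula, it can read the size from the PNG header without an imaging library. This stdlib sketch assumes the tool's image data is a plain base64 string with no data: URI prefix.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(b64_data: str) -> tuple[int, int]:
    """Return (width, height) of a base64-encoded PNG by reading its IHDR chunk."""
    raw = base64.b64decode(b64_data)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR".
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # IHDR data starts at byte 16: big-endian 4-byte width, then 4-byte height.
    width, height = struct.unpack(">II", raw[16:24])
    return width, height
```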

Other Tools

Tool | Parameters | Description
read_clipboard | (none) | Read clipboard text
write_clipboard | text | Write text to clipboard
get_active_window | (none) | Frontmost window app, title, position, size
list_windows | (none) | All visible windows
get_screen_info | (none) | Screen dimensions, Retina scale, accessibility status
get_cursor_position | (none) | Current cursor coordinates
open_application | name or app | Launch macOS app by name
wait | duration | Pause for N seconds (0-100)
run_shell_command | command, timeout | Execute shell command
request_access | apps[], reason | Register apps for session access control
list_granted_applications | (none) | List currently granted apps

Architecture

computer-use-mcp/
├── __main__.py                    # Entry point (python -m or direct)
├── __init__.py                    # Package metadata
├── pyproject.toml                 # Dependencies & build config
├── .mcp.json                     # Universal MCP client config
└── server/
    ├── __init__.py                # Re-exports all tool modules
    ├── computer_use_server.py     # MCP Server class, tool registry, stdio transport
    └── tools/
        ├── __init__.py            # Exports all tool getters/handlers
        ├── access_tools.py        # request_access, list_granted_applications
        ├── batch_tools.py         # computer_batch (action orchestrator)
        ├── clipboard_tools.py     # read/write clipboard (NSPasteboard)
        ├── display_tools.py       # switch_display, zoom, list_displays
        ├── keyboard_tools.py      # key, hold_key, type, press, hotkey (Quartz)
        ├── mouse_tools.py         # 12 mouse tools (Quartz CGEvent)
        ├── screen_tools.py        # screen info, cursor position (Quartz)
        ├── screenshot_tools.py    # screenshot capture (mss + PIL)
        ├── system_tools.py        # open app, wait, shell command
        └── window_tools.py        # active window, list windows (Quartz + AppKit)

How It Works

  1. Transport: stdio (JSON-RPC 2.0 over stdin/stdout)
  2. Tool Registry: ComputerUseMCPServer collects tools from 10 category modules, maps tool names to handlers
  3. Input Simulation: macOS Quartz CGEvent API for mouse/keyboard events posted to kCGHIDEventTap
  4. Screenshots: mss library for fast capture, PIL for resizing, base64 encoding
  5. Coordinate System: All tools use logical screen coordinates (Retina-aware). The server handles physical-to-logical scaling automatically.
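Step 2 can be pictured as building a name-to-handler map from per-category getter/handler pairs, which is also the pattern the Contributing section describes. The sketch below is a simplified, synchronous illustration. get_clipboard_tools and handle_clipboard_tool here are hypothetical stand-ins, and the real ComputerUseMCPServer uses the MCP SDK with its own signatures.

```python
import json
from typing import Callable

ToolDef = dict  # {"name": ..., "description": ..., "inputSchema": ...}
Handler = Callable[[str, dict], str]


# Hypothetical stand-ins for one category module.
def get_clipboard_tools() -> list[ToolDef]:
    return [{"name": "read_clipboard", "description": "Read clipboard text",
             "inputSchema": {"type": "object", "properties": {}}}]


def handle_clipboard_tool(name: str, args: dict) -> str:
    return json.dumps({"tool": name, "text": "(clipboard contents)"})


def build_registry(sources: list[tuple[Callable[[], list[ToolDef]], Handler]]):
    """Collect tool definitions and map each tool name to its category handler."""
    tools: list[ToolDef] = []
    handlers: dict[str, Handler] = {}
    for get_tools, handle in sources:
        for tool in get_tools():
            if tool["name"] in handlers:
                raise ValueError(f"duplicate tool name {tool['name']!r}")
            tools.append(tool)
            handlers[tool["name"]] = handle
    return tools, handlers


def dispatch(handlers: dict[str, Handler], name: str, args: dict) -> str:
    """Route a tools/call request to the handler that registered the tool."""
    if name not in handlers:
        return json.dumps({"error": f"unknown tool {name!r}"})
    return handlers[name](name, args)
```

The tools list answers tools/list, and dispatch answers tools/call. Rejecting duplicate names at startup keeps two modules from silently shadowing each other.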

Coordinate Mapping

Screenshots include metadata for mapping image pixels to screen coordinates:

click_x = (pixel_x / image_width) * logical_screen_width
click_y = (pixel_y / image_height) * logical_screen_height

On Retina displays, logical coordinates differ from physical pixels. The server handles this transparently.
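The two formulas above translate directly into a helper. Rounding to whole points and the example numbers (a 1280x800 screenshot of a 1440x900 logical display at 2x scale) are assumptions of this sketch. The second function shows the logical-to-physical step that the server performs on Retina displays.

```python
def image_to_screen(pixel_x: float, pixel_y: float,
                    image_width: int, image_height: int,
                    logical_width: int, logical_height: int) -> tuple[int, int]:
    """Map a point in a (possibly downscaled) screenshot to logical screen coordinates."""
    click_x = pixel_x / image_width * logical_width
    click_y = pixel_y / image_height * logical_height
    return round(click_x), round(click_y)


def logical_to_physical(x: float, y: float, scale: float) -> tuple[int, int]:
    """Convert logical points to physical pixels using the display's backing scale."""
    return round(x * scale), round(y * scale)
```

For example, the center of a 1280x800 screenshot, image_to_screen(640, 400, 1280, 800, 1440, 900), maps to logical (720, 450), which is physical (1440, 900) at a scale of 2.0.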


Troubleshooting

"Accessibility permission not granted"

Go to System Settings > Privacy & Security > Accessibility and add your terminal/IDE app.

Server fails to start

Ensure you're using the venv Python (not system Python):

/path/to/computer-use-mcp/.venv/bin/python3 __main__.py

Mouse/keyboard tools return errors but screenshots work

Screenshot capture doesn't need accessibility permission, but input simulation does. Grant accessibility access to the process running the server.

"ModuleNotFoundError: No module named 'server'"

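The __main__.py adds its directory to sys.path automatically, so running it directly (python3 /path/to/computer-use-mcp/__main__.py) works from any directory. To run it as a module (python -m computer_use), the package directory must be named computer_use, because a hyphenated name such as computer-use-mcp is not a valid module name. The cwd must also be the parent directory of computer_use/.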

Multi-monitor: wrong screen captured

Use list_displays to see all monitors, then switch_display to select the correct one. Use switch_display("auto") to reset.


Contributing

Contributions are welcome! This server is designed to be extensible:

  1. Add new tools by creating a file in server/tools/
  2. Define get_*_tools() and handle_*_tool() functions
  3. Register in server/computer_use_server.py tool_sources list
  4. Update server/tools/__init__.py exports

Please ensure new tools follow the existing patterns for error handling and JSON response format.
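The following is a skeleton for a new tool module following steps 1 and 2. The file name, tool name, function signatures, and the {"success": ...} response shape are all hypothetical. Copy the exact signatures and return types (likely MCP SDK types) from an existing module such as server/tools/clipboard_tools.py. Plain dicts and a JSON string result keep this sketch dependency-free.

```python
"""server/tools/time_tools.py: hypothetical example module."""
import json
from datetime import datetime
from typing import Optional


def get_time_tools() -> list[dict]:
    """Tool definitions for this category (name, description, JSON Schema input)."""
    return [{
        "name": "get_local_time",
        "description": "Return the current local time, optionally with a strftime format",
        "inputSchema": {
            "type": "object",
            "properties": {"format": {"type": "string"}},
        },
    }]


def handle_time_tool(name: str, arguments: dict) -> Optional[str]:
    """Handle tools owned by this module; return None for names it does not own."""
    if name != "get_local_time":
        return None
    try:
        fmt = arguments.get("format", "%Y-%m-%dT%H:%M:%S")
        return json.dumps({"success": True, "time": datetime.now().strftime(fmt)})
    except Exception as exc:  # report failures as JSON instead of raising
        return json.dumps({"success": False, "error": str(exc)})
```

After creating the module, add it to the tool_sources list in server/computer_use_server.py the same way the existing modules are registered, and export its functions from server/tools/__init__.py.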


License

MIT License - see LICENSE for details.
