Computer Use MCP Server

A production-grade macOS Computer Use MCP Server that exposes 33 tools across 10 categories for full desktop automation via the Model Context Protocol. Control mouse, keyboard, screenshots, clipboard, windows, and more from any MCP-compatible AI client.

Works with Claude Code, Cursor, VS Code, Windsurf, LM Studio, Ollama, llama.cpp, MLX, and any MCP-compatible tool.

Features

33 Tools Across 10 Categories

Category	Tools	Description
Mouse (12)	`mouse_click`, `left_click`, `right_click`, `middle_click`, `double_click`, `triple_click`, `left_mouse_down`, `left_mouse_up`, `mouse_move`, `mouse_drag`, `scroll`, `mouse_scroll`	Full mouse control with coordinate-based clicking, dragging with 20-step interpolation, directional scrolling
Keyboard (5)	`key`, `hold_key`, `keyboard_type`, `keyboard_press`, `keyboard_hotkey`	Unified key combos (`cmd+c`), hold-for-duration, Unicode text typing, individual key press, modifier hotkeys
Screenshot (1)	`take_screenshot`	Full-screen or region capture with Retina scaling, coordinate metadata, and configurable resolution
Display (3)	`switch_display`, `zoom`, `list_displays`	Multi-monitor switching, high-res region zoom for reading small text, display enumeration
Clipboard (2)	`read_clipboard`, `write_clipboard`	Read/write system clipboard via NSPasteboard
Window (2)	`get_active_window`, `list_windows`	Frontmost window info, enumerate all visible windows with position/size
Screen (2)	`get_screen_info`, `get_cursor_position`	Display dimensions, Retina scale, accessibility status, cursor coordinates
System (3)	`open_application`, `wait`, `run_shell_command`	Launch apps by name, timed waits, shell command execution
Access (2)	`request_access`, `list_granted_applications`	App permission tracking for session-based access control
Batch (1)	`computer_batch`	Execute multiple actions in a single call - eliminates round-trip latency

Quick Start

# 1. Clone
git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp

# 2. Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt  # or: pip install mcp mss pillow pyobjc-framework-Quartz

# 3. Test
python3 __main__.py

The server communicates over stdio (stdin/stdout) using the MCP JSON-RPC protocol.

Installation

Prerequisites

macOS (uses Quartz framework for input simulation)
Python 3.10+
Accessibility permissions (System Settings > Privacy & Security > Accessibility)

Install Dependencies

git clone https://github.com/syedazharmbnr1/computer-use-mcp.git
cd computer-use-mcp
python3 -m venv .venv
source .venv/bin/activate
pip install "mcp>=1.26.0" mss pillow pyobjc-framework-Quartz

Verify Installation

python3 -c "
from server.computer_use_server import ComputerUseMCPServer
server = ComputerUseMCPServer()
tools = server._collect_all_tools()
print(f'Server OK - {len(tools)} tools registered')
"

Expected output: Server OK - 33 tools registered

Grant Accessibility Permission

The server needs macOS accessibility access to simulate mouse/keyboard input:

Open System Settings > Privacy & Security > Accessibility
Add your terminal app (Terminal, iTerm2, VS Code, etc.)
Toggle the permission ON

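Screenshot capture works without accessibility permission. Only mouse/keyboard tools require it. On macOS 10.15 and later, capturing the contents of other applications' windows may additionally require the Screen Recording permission (System Settings > Privacy & Security > Screen Recording).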

Configuration for AI Coding Tools

The server uses stdio transport - it reads from stdin and writes to stdout. Every MCP client connects the same way: spawn the Python process and pipe stdio.
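
Most of the configurations in this section share the same command/args entry and differ only in the file they live in and the key that wraps them. As a convenience, the sketch below builds that entry from the path of your clone. The `mcp_server_entry` helper is hypothetical (it is not shipped with the server), and the wrapping key shown is only an example.

```python
import json
import os
from pathlib import PurePosixPath


def mcp_server_entry(repo_dir: str) -> dict:
    """Build the command/args entry for a clone located at repo_dir."""
    repo = PurePosixPath(os.path.expanduser(repo_dir))
    return {
        # Use the venv interpreter so the server's dependencies are importable.
        "command": str(repo / ".venv" / "bin" / "python3"),
        "args": [str(repo / "__main__.py")],
    }


entry = mcp_server_entry("/path/to/computer-use-mcp")
# Wrap the entry under whichever top-level key your client expects.
print(json.dumps({"mcpServers": {"computer-use": entry}}, indent=2))
```

Replace `/path/to/computer-use-mcp` with the absolute path of your checkout before pasting the output into a client config.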

Claude Code

Add the server to ~/.claude.json (user scope) or to a .mcp.json file in your project root (project scope):

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"],
      "cwd": "/path/to/computer-use-mcp"
    }
  }
}

Then run /mcp in Claude Code to connect.

Cursor

Create .cursor/mcp.json in your project root (or ~/.cursor/mcp.json globally):

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

VS Code + GitHub Copilot

Create .vscode/mcp.json in your workspace:

{
  "servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

JetBrains IDEs

Add via Settings > Tools > MCP Servers, using the same command/args pattern.

Zed

Add to your Zed settings (~/.config/zed/settings.json):

{
  "context_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Cline / Continue.dev

Both support the standard MCP JSON config format. Add to their respective config files using the same command + args pattern shown above.

Recommended Models for Tool Calling (April 2026)

Per LM Arena rankings and real-world testing, these are the best models for MCP tool calling:

Top Open Source Models (LM Arena Elo)

Rank	Model	Provider	Parameters	Highlights
1	GLM-5	Zhipu AI	MoE	#1 open source (Elo 1451), 77.8% SWE-bench Verified
2	Kimi K2.5	Moonshot AI	MoE	HumanEval 99.0, stable across 200-300 sequential tool calls
3	GLM-4.7	Zhipu AI	MoE	HumanEval 94.2, AIME 2025 95.7, GPQA 85.7
4	GLM-5.1	Zhipu AI	744B MoE / 40B active	MIT license, 200K context, 8+ hour continuous agentic sessions
5	Qwen 3.6 Plus	Alibaba	Dense	1M context, native function calling, always-on CoT reasoning
6	Gemma 4 31B	Google	31B Dense	#3 Arena text, Apache 2.0, native tool calling, 256K context
7	Llama 4 Scout	Meta	17B active / 16 experts	10M context window, multimodal, beats Gemini 2.0 Flash-Lite
8	Llama 4 Maverick	Meta	17B active / 128 experts	Beats GPT-4o, best multimodal in class
9	Mistral Small 4	Mistral AI	119B MoE / 6B active	Unified instruct+reasoning+coding+vision, 256K context
10	Qwen 3.5	Alibaba	Multiple sizes	Most stable tool calling, rarely hallucinates calls

Best Models by Platform

Ollama (run locally via ollama pull <model>):

gemma4 (E2B / E4B / 26B MoE / 31B Dense) — native function calling, best sub-32B for agents
qwen3.5 / qwen3.6-plus — most stable tool calling, rarely drops parameters
llama4 (Scout / Maverick) — native multimodal + tools, 10M context
kimi-k2.5 — 200+ sequential tool calls without drift
glm-5.1 — long-horizon agentic coding (8+ hours continuous)
mistral-small4 — unified model, 6B active, fast
granite4 — enterprise-grade tool calling
phi-4-mini — compact with function calling support
deepseek-r1 — strong reasoning + tool use

llama.cpp (GGUF format):

bartowski/Gemma-4-31B-IT-GGUF — best open weight for agents
bartowski/Qwen3.5-32B-Instruct-GGUF — stable tool calling
bartowski/Llama-4-Scout-17B-GGUF — 10M context, multimodal
bartowski/GLM-5.1-40B-GGUF — top open source coding
Any model with Jinja chat template + function calling support

MLX (Apple Silicon via mlx-community):

mlx-community/Gemma-4-31B-IT-4bit — best performance/quality on Apple Silicon
mlx-community/Qwen3.5-32B-Instruct-4bit — stable tool calls
mlx-community/Llama-4-Scout-17B-4bit — multimodal + tools
mlx-community/Mistral-Small-4-6B-4bit — fast, 6B active

LM Studio: All of the above models are available through LM Studio's model browser with native MCP host support.

Configuration for Local Model Frameworks

LM Studio

LM Studio has supported MCP natively as a host since v0.3.17.

Open LM Studio > Settings > MCP
Add a new MCP server with:
- Command: /path/to/computer-use-mcp/.venv/bin/python3
- Args: ["/path/to/computer-use-mcp/__main__.py"]
Select a model with tool calling support:
- Top picks: Gemma 4 31B, Qwen 3.5/3.6, Llama 4 Scout, GLM-5.1, Mistral Small 4, Kimi K2.5
The tools will appear in the chat interface

llama.cpp (Native MCP - March 2026+)

llama.cpp merged native MCP client support in March 2026 (PR #18655), adding a full agentic loop with MCP server management in the WebUI.

Start llama-server with MCP:

# Start with a top function-calling model (pick one)
llama-server --jinja -fa -hf bartowski/Gemma-4-31B-IT-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa -hf bartowski/Qwen3.5-32B-Instruct-GGUF:Q4_K_M --port 8080
llama-server --jinja -fa -hf bartowski/Llama-4-Scout-17B-GGUF:Q4_K_M --port 8080

Then in the llama.cpp WebUI:

Go to MCP Server Settings
Add this server with command: /path/to/computer-use-mcp/.venv/bin/python3 /path/to/computer-use-mcp/__main__.py
The 33 tools will be available in the agentic loop

Via llama-mcp-server bridge:

npm install -g llama-mcp-server

Configure in claude_desktop_config.json:

{
  "mcpServers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Supported models for tool calling: Gemma 4, Qwen 3.5/3.6, Llama 4 Scout/Maverick, GLM-5.1, Kimi K2.5, Mistral Small 4, Llama 3.3, DeepSeek R1, Granite 4, Phi-4-mini, Hermes 3, Functionary v3.

Ollama

Ollama does not have native MCP support yet, but several bridge solutions work:

Option A: MCP-Bridge (recommended)

MCP-Bridge acts as middleware between Ollama's OpenAI-compatible API and MCP servers.

git clone https://github.com/SecretiveShell/MCP-Bridge.git
cd MCP-Bridge

Configure config.json:

{
  "inference_server": {
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama"
  },
  "mcp_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Option B: ollama-mcp-bridge

git clone https://github.com/patruff/ollama-mcp-bridge.git
cd ollama-mcp-bridge
npm install && npm run build

Add the computer-use server to the bridge config.

Recommended Ollama models (April 2026):

gemma4:31b — best sub-32B for agents, native function calling
qwen3.5:32b — most stable tool calling
llama4:scout — 10M context, multimodal + tools
kimi-k2.5 — 200+ sequential tool calls without drift
glm-5.1 — long-horizon agentic (8+ hours continuous)
mistral-small4 — fast, 6B active params
granite4 — enterprise tool calling

MLX / Apple Silicon

For Apple Silicon Macs, use vLLM-MLX for optimized local inference with MCP bridge:

Install vLLM-MLX:

pip install git+https://github.com/waybarrios/vllm-mlx.git

Start the inference server:

# Pick a model (top recommendations for tool calling)
vllm-mlx serve mlx-community/Gemma-4-31B-IT-4bit --port 8000
vllm-mlx serve mlx-community/Qwen3.5-32B-Instruct-4bit --port 8000
vllm-mlx serve mlx-community/Llama-4-Scout-17B-4bit --port 8000

Connect via MCP-Bridge:

{
  "inference_server": {
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed"
  },
  "mcp_servers": {
    "computer-use": {
      "command": "/path/to/computer-use-mcp/.venv/bin/python3",
      "args": ["/path/to/computer-use-mcp/__main__.py"]
    }
  }
}

Performance: M4 Max achieves ~402 tokens/sec on small models, ~1112 tokens/sec with continuous batching.

Alternative: oMLX provides a macOS menu bar app with MCP tool integration.

vLLM

vLLM has native MCP integration with GPU-optimized inference.

pip install vllm
vllm serve google/gemma-4-31b-it --port 8000       # or any tool-calling model
vllm serve Qwen/Qwen3.5-32B-Instruct --port 8000   # stable tool calling
vllm serve meta-llama/Llama-4-Scout-17B --port 8000 # multimodal + tools

Connect via MCP-Bridge using http://localhost:8000/v1 as the base URL.

Generic OpenAI-Compatible API

Any service exposing an OpenAI-compatible API (local or remote) can use this server through MCP-Bridge:

Start your inference server (Ollama, llama.cpp, vLLM, MLX, TGI, etc.)
Point MCP-Bridge at it with the base_url
Add this server to MCP-Bridge's mcp_servers config
MCP-Bridge intercepts API requests, enriches them with tool definitions, executes tool calls, and returns results
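
The "enriches them with tool definitions" step can be pictured as a format conversion. An MCP tool definition carries `name`, `description`, and `inputSchema`, while an OpenAI-style chat request expects each tool wrapped as a `function` with `parameters`. The sketch below illustrates that mapping only; it is not MCP-Bridge's actual code, and the example schema for `left_click` is an assumption rather than a copy of this server's schema.

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Convert one MCP tool definition into an OpenAI-style tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is JSON Schema, which is what "parameters" expects.
            "parameters": tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }


# Assumed shape of a tool definition as returned by tools/list.
left_click = {
    "name": "left_click",
    "description": "Left-click at coordinates",
    "inputSchema": {
        "type": "object",
        "properties": {
            "coordinate": {"type": "array", "items": {"type": "number"}}
        },
        "required": ["coordinate"],
    },
}

openai_tool = mcp_tool_to_openai(left_click)
print(openai_tool["function"]["name"])
```

When the model responds with a tool call, the bridge performs the reverse step: it forwards the function name and arguments to the MCP server as a `tools/call` request.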

Tool Reference

Batch Operations — `computer_batch`

Execute multiple actions in a single call, avoiding a separate client-server round trip for each action:

{
  "actions": [
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "Hello, world!"},
    {"action": "key", "text": "Return"},
    {"action": "wait", "duration": 1},
    {"action": "screenshot"}
  ]
}

Supported actions: key, type, mouse_move, left_click, left_click_drag, right_click, middle_click, double_click, triple_click, scroll, hold_key, screenshot, cursor_position, left_mouse_down, left_mouse_up, wait
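
When generating batch payloads programmatically, it can help to check action names on the client side before sending them. The sketch below does that against the list of supported actions above. The `build_batch` helper is hypothetical, and it validates names only, not each action's parameters.

```python
import json

# Action names copied from the "Supported actions" list above.
SUPPORTED_ACTIONS = {
    "key", "type", "mouse_move", "left_click", "left_click_drag",
    "right_click", "middle_click", "double_click", "triple_click",
    "scroll", "hold_key", "screenshot", "cursor_position",
    "left_mouse_down", "left_mouse_up", "wait",
}


def build_batch(actions: list[dict]) -> dict:
    """Return a computer_batch argument object, rejecting unknown action names."""
    for index, action in enumerate(actions):
        name = action.get("action")
        if name not in SUPPORTED_ACTIONS:
            raise ValueError(f"action #{index}: unsupported action {name!r}")
    return {"actions": actions}


payload = build_batch([
    {"action": "left_click", "coordinate": [100, 200]},
    {"action": "type", "text": "Hello, world!"},
    {"action": "screenshot"},
])
print(json.dumps(payload, indent=2))
```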

Mouse Tools

Tool	Parameters	Description
`left_click`	`coordinate: [x, y]`	Left-click at coordinates
`right_click`	`coordinate: [x, y]`	Right-click (context menu)
`middle_click`	`coordinate: [x, y]`	Middle-click (scroll wheel)
`double_click`	`coordinate: [x, y]`	Double-click (select word)
`triple_click`	`coordinate: [x, y]`	Triple-click (select line)
`mouse_click`	`x, y, button, click_count`	General click with full control
`mouse_move`	`x, y` or `coordinate: [x, y]`	Move cursor without clicking
`mouse_drag`	`start_coordinate, coordinate`	Drag with 20-step interpolation
`left_mouse_down`	(none)	Press and hold left button
`left_mouse_up`	(none)	Release left button
`scroll`	`coordinate, scroll_direction, scroll_amount`	Directional scroll (up/down/left/right)
`mouse_scroll`	`amount, x, y`	Scroll wheel (positive=up, negative=down)
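
As an illustration of what "20-step interpolation" means for `mouse_drag`, the sketch below computes evenly spaced points between the start and end coordinates. It assumes linear interpolation; the server's actual path, timing, and handling of the start point may differ. The `drag_path` function is hypothetical.

```python
def drag_path(
    start: tuple[float, float],
    end: tuple[float, float],
    steps: int = 20,
) -> list[tuple[float, float]]:
    """Return `steps` evenly spaced points from just after start up to end."""
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]


path = drag_path((0, 0), (100, 50))
print(len(path), path[0], path[-1])
```

Moving the cursor through intermediate points, rather than jumping straight to the destination, gives applications a stream of drag events similar to what a physical mouse produces.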

Keyboard Tools

Tool	Parameters	Description
`key`	`text: "cmd+c"`, `repeat`	Unified key press with modifiers joined by `+`
`hold_key`	`text: "shift"`, `duration`	Hold key for N seconds then release
`keyboard_type`	`text`	Type text character by character (Unicode)
`keyboard_press`	`key`	Press a single named key
`keyboard_hotkey`	`keys: ["cmd", "c"]`	Press key combination as array

Supported keys: return, tab, space, delete, escape, arrows (left, right, up, down), home, end, pageup, pagedown, f1-f12, a-z, 0-9, symbols.

Modifiers: cmd/command, shift, alt/option, ctrl/control, fn
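
To show how a combo string such as `cmd+c` breaks down, here is a small sketch that splits it on `+` and normalizes the modifier aliases listed above. It is an illustration, not the server's parser: `parse_combo` and the canonical names it returns are assumptions, and it does not handle `+` itself as the target key.

```python
# Aliases taken from the "Modifiers" line above; canonical names are an assumption.
MODIFIER_ALIASES = {
    "cmd": "cmd", "command": "cmd",
    "shift": "shift",
    "alt": "alt", "option": "alt",
    "ctrl": "ctrl", "control": "ctrl",
    "fn": "fn",
}


def parse_combo(text: str) -> tuple[list[str], str]:
    """Split 'cmd+shift+4' into (['cmd', 'shift'], '4')."""
    parts = [part.strip().lower() for part in text.split("+") if part.strip()]
    if not parts:
        raise ValueError("empty key combination")
    *modifiers, key = parts  # the last element is the key, the rest are modifiers
    unknown = [m for m in modifiers if m not in MODIFIER_ALIASES]
    if unknown:
        raise ValueError(f"unknown modifier(s): {', '.join(unknown)}")
    return [MODIFIER_ALIASES[m] for m in modifiers], key


print(parse_combo("cmd+c"))
print(parse_combo("Command+Shift+4"))
```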

Screenshot & Display Tools

Tool	Parameters	Description
`take_screenshot`	`region` (optional), `max_dimension`	Capture screen as base64 PNG with coordinate metadata
`zoom`	`region: [x0, y0, x1, y1]`	High-res crop of last screenshot (for reading small text)
`switch_display`	`display`	Switch active monitor for screenshots. Use `"auto"` for main.
`list_displays`	(none)	Enumerate all connected displays
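
Because `take_screenshot` returns the image as base64 PNG, a client can read the image dimensions without any imaging library: a PNG file starts with an 8-byte signature followed by the IHDR chunk, which stores width and height as big-endian 32-bit integers. The sketch below assumes you have already extracted the base64 string from the tool result (the name of that field is not documented here), and `png_size_from_base64` is a hypothetical helper.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_base64(data: str) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a base64 PNG."""
    raw = base64.b64decode(data)
    # Bytes 8-11 hold the chunk length and bytes 12-15 the chunk type.
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", raw[16:24])
    return width, height


# Demo on a hand-built PNG header (signature + IHDR for a 1280x800 image).
header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">II", 1280, 800)
    + b"\x08\x02\x00\x00\x00"
)
size = png_size_from_base64(base64.b64encode(header).decode("ascii"))
print(size)
```

The resulting width and height are the image dimensions needed when converting a pixel position in the screenshot into screen coordinates.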

Other Tools

Tool	Parameters	Description
`read_clipboard`	(none)	Read clipboard text
`write_clipboard`	`text`	Write text to clipboard
`get_active_window`	(none)	Frontmost window app, title, position, size
`list_windows`	(none)	All visible windows
`get_screen_info`	(none)	Screen dimensions, Retina scale, accessibility status
`get_cursor_position`	(none)	Current cursor coordinates
`open_application`	`name` or `app`	Launch macOS app by name
`wait`	`duration`	Pause for N seconds (0-100)
`run_shell_command`	`command`, `timeout`	Execute shell command
`request_access`	`apps[], reason`	Register apps for session access control
`list_granted_applications`	(none)	List currently granted apps

Architecture

computer-use-mcp/
├── __main__.py                    # Entry point (python -m or direct)
├── __init__.py                    # Package metadata
├── pyproject.toml                 # Dependencies & build config
├── .mcp.json                     # Universal MCP client config
└── server/
    ├── __init__.py                # Re-exports all tool modules
    ├── computer_use_server.py     # MCP Server class, tool registry, stdio transport
    └── tools/
        ├── __init__.py            # Exports all tool getters/handlers
        ├── access_tools.py        # request_access, list_granted_applications
        ├── batch_tools.py         # computer_batch (action orchestrator)
        ├── clipboard_tools.py     # read/write clipboard (NSPasteboard)
        ├── display_tools.py       # switch_display, zoom, list_displays
        ├── keyboard_tools.py      # key, hold_key, type, press, hotkey (Quartz)
        ├── mouse_tools.py         # 12 mouse tools (Quartz CGEvent)
        ├── screen_tools.py        # screen info, cursor position (Quartz)
        ├── screenshot_tools.py    # screenshot capture (mss + PIL)
        ├── system_tools.py        # open app, wait, shell command
        └── window_tools.py        # active window, list windows (Quartz + AppKit)

How It Works

Transport: stdio (JSON-RPC 2.0 over stdin/stdout)
Tool Registry: ComputerUseMCPServer collects tools from 10 category modules, maps tool names to handlers
Input Simulation: macOS Quartz CGEvent API for mouse/keyboard events posted to kCGHIDEventTap
Screenshots: mss library for fast capture, PIL for resizing, base64 encoding
Coordinate System: All tools use logical screen coordinates (Retina-aware). The server handles physical-to-logical scaling automatically.

Coordinate Mapping

Screenshots include metadata for mapping image pixels to screen coordinates:

click_x = (pixel_x / image_width) * logical_screen_width
click_y = (pixel_y / image_height) * logical_screen_height

On Retina displays, logical coordinates differ from physical pixels. The server handles this transparently.
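
The mapping above translates directly into a small helper. This sketch takes the image size and logical screen size as explicit arguments, since the exact names of the metadata fields are not listed here; `image_to_screen` is a hypothetical function, and rounding to integers is a choice made for the example.

```python
def image_to_screen(
    pixel_x: float,
    pixel_y: float,
    image_size: tuple[int, int],
    logical_screen_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a pixel in the screenshot image to logical screen coordinates."""
    image_width, image_height = image_size
    screen_width, screen_height = logical_screen_size
    click_x = (pixel_x / image_width) * screen_width
    click_y = (pixel_y / image_height) * screen_height
    return round(click_x), round(click_y)


# The center of a 1280x800 screenshot taken on a 1512x982 logical display.
target = image_to_screen(640, 400, (1280, 800), (1512, 982))
print(target)  # (756, 491)
```

The returned pair can be passed as the `coordinate` argument of a click tool.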

Troubleshooting

"Accessibility permission not granted"

Go to System Settings > Privacy & Security > Accessibility and add your terminal/IDE app.

Server fails to start

Ensure you're using the venv Python (not system Python):

/path/to/computer-use-mcp/.venv/bin/python3 __main__.py

Mouse/keyboard tools return errors but screenshots work

Screenshot capture doesn't need accessibility permission, but input simulation does. Grant accessibility access to the process running the server.

"ModuleNotFoundError: No module named 'server'"

The __main__.py adds its directory to sys.path automatically. If running as a module (python -m computer_use), set the cwd to the parent directory of computer_use/.

Multi-monitor: wrong screen captured

Use list_displays to see all monitors, then switch_display to select the correct one. Use switch_display("auto") to reset.

Contributing

Contributions are welcome! This server is designed to be extensible:

Add new tools by creating a file in server/tools/
Define get_*_tools() and handle_*_tool() functions
Register in server/computer_use_server.py tool_sources list
Update server/tools/__init__.py exports

Please ensure new tools follow the existing patterns for error handling and JSON response format.
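
As a starting point, a new tool module might look like the sketch below. Everything in it is hypothetical: the `echo_text` tool, the function signatures, the use of plain dictionaries, and the `success`/`error` response fields are assumptions made for illustration. Check an existing module in server/tools/ for the real signatures and response format before copying this.

```python
import json


def get_example_tools() -> list[dict]:
    """Describe the tools this module provides (name, description, JSON Schema)."""
    return [
        {
            "name": "echo_text",  # hypothetical tool, for illustration only
            "description": "Return the given text unchanged.",
            "inputSchema": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        }
    ]


def handle_example_tool(name: str, arguments: dict) -> str:
    """Run one tool call and return a JSON string, reporting errors as data."""
    try:
        if name != "echo_text":
            raise ValueError(f"unknown tool: {name}")
        return json.dumps({"success": True, "text": arguments["text"]})
    except (KeyError, ValueError) as exc:
        # Assumed convention: failures are returned as JSON instead of raised.
        return json.dumps({"success": False, "error": str(exc)})


print(handle_example_tool("echo_text", {"text": "hi"}))
```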

License

MIT License - see LICENSE for details.
