macbot-mcp

Enables AI agents to control macOS applications through the Accessibility API, AppleScript, and CGEvents, providing structured text output of UI elements and actions without needing screenshots.

An MCP server that gives AI agents hands on macOS — via the Accessibility API, AppleScript, and Quartz CGEvents.

The macOS counterpart to ahk-mcp. Same thesis: the accessibility tree already contains a machine-readable description of everything on screen. Screenshots throw that away and make the model re-derive it from pixels. Why?

Token cost per action: ~200-700 tokens (vs ~2000-3500 for screenshot-based).

Because the output is structured text, not images, any language model can drive it, including small open-source models with no vision capability. A 7B model can parse AXButton title="Save" @450,320 88x32 and call mac_click(x=494, y=336), the center of that 88x32 box. It cannot interpret a screenshot. The accessibility approach makes computer use available to models that were previously locked out of it entirely.

How it works

macbot-mcp exposes 14 tools over MCP's stdio transport:

  • Observation tools read the macOS accessibility tree, window properties, and browser URLs — returning structured text with element roles, names, values, and screen coordinates
  • Action tools click, drag, scroll, type, and send keystrokes via Quartz CGEvents and AppleScript
  • mac_run_applescript is the escape hatch — execute arbitrary AppleScript for anything the built-in tools don't cover

Every observation tool returns coordinates. Find a button with mac_ui_find, get its x,y position, and click it with mac_click — no screenshot needed.

Token cost comparison

| Approach | Tokens per action | What you get |
| --- | --- | --- |
| Screenshot-based (full screen PNG) | ~2000-3500 | Pixels. Model must OCR, locate elements, interpret layout. |
| macbot-mcp (structured text) | ~200-700 | Element roles, titles, values, and pixel coordinates. |

A 20-step workflow: ~50k tokens with screenshots, ~8k with macbot. The structured output is also more reliable — the model doesn't guess where the "Save" button is when the accessibility tree says AXButton title="Save" @1043,672 88x32.
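The 20-step figures can be sanity-checked with a few lines of arithmetic. This is only a sketch: it multiplies the per-action ranges quoted above by the step count and ignores prompt and tool-call overhead. The function name `workflow_cost` is made up for illustration.

```python
# Per-action token ranges quoted in the comparison above, as (low, high).
SCREENSHOT_TOKENS = (2000, 3500)
MACBOT_TOKENS = (200, 700)


def workflow_cost(per_action: tuple[int, int], steps: int) -> tuple[int, int]:
    """Return the (low, high) token total for a workflow of `steps` actions."""
    low, high = per_action
    return low * steps, high * steps


steps = 20
print("screenshots:", workflow_cost(SCREENSHOT_TOKENS, steps))  # (40000, 70000)
print("macbot:", workflow_cost(MACBOT_TOKENS, steps))           # (4000, 14000)
```

The quoted ~50k and ~8k both fall inside these ranges.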

Installation

Prerequisites

  • macOS 12+ (Monterey or later)
  • Python 3.10+

Setup

git clone https://github.com/anomalous3/macbot-mcp.git
cd macbot-mcp

# Create a virtual environment
python3 -m venv .venv
# or: uv venv .venv

# Activate
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

macOS Permissions

macbot needs two permissions, granted to your terminal app (Terminal, kitty, iTerm2, etc.):

  1. Accessibility — System Settings > Privacy & Security > Accessibility
  2. Screen Recording — System Settings > Privacy & Security > Screen Recording

Without Accessibility, observation and input tools won't work. Without Screen Recording, mac_screenshot will fail.

Claude Code MCP configuration

Add to your ~/.claude.json:

{
  "mcpServers": {
    "macbot": {
      "command": "/path/to/macbot-mcp/.venv/bin/python3",
      "args": ["/path/to/macbot-mcp/server.py"],
      "env": {
        "MACBOT_SCREENSHOT_DIR": "/tmp/macbot"
      }
    }
  }
}

After adding the config, restart Claude Code. The tools appear with the mcp__macbot__ prefix.
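If you would rather script the edit than paste JSON by hand, something like the following could work. It is a sketch, not part of macbot: `add_macbot_server` and `update_config_file` are hypothetical helpers, and it assumes ~/.claude.json is plain JSON with a top-level "mcpServers" object, as in the snippet above. Back up the file first.

```python
import json
from pathlib import Path


def add_macbot_server(config: dict, repo_dir: str,
                      screenshot_dir: str = "/tmp/macbot") -> dict:
    """Return a copy of `config` with a "macbot" entry under "mcpServers".

    Other servers already present are kept; the input dict is not modified.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["macbot"] = {
        "command": f"{repo_dir}/.venv/bin/python3",
        "args": [f"{repo_dir}/server.py"],
        "env": {"MACBOT_SCREENSHOT_DIR": screenshot_dir},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, repo_dir: str) -> None:
    """Read the JSON config at `path` (if it exists), add macbot, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_macbot_server(config, repo_dir), indent=2) + "\n")
```

A call such as update_config_file(Path.home() / ".claude.json", "/path/to/macbot-mcp") would then write the entry shown above, with the placeholder path replaced by your clone location.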

Tool reference

Observation

| Tool | Description |
| --- | --- |
| mac_ui_tree | Dump the accessibility tree of any app. Returns roles, titles, values, descriptions, and screen coordinates. Configurable depth and node limit. This is the primary observation tool; use it before reaching for screenshots. |
| mac_ui_find | Search for UI elements by name and/or role. Returns matches with coordinates. Find that "Submit" button without scanning the whole tree. |
| mac_ui_url | Get the current URL from the active browser's address bar (Firefox, Chrome, Safari). |
| mac_get_windows | List all visible windows with app name, title, position, and size. |
| mac_screenshot | Capture full screen, a specific window, or a region. Returns a PNG file path. The fallback when you genuinely need pixels. |

Action

| Tool | Description |
| --- | --- |
| mac_click | Click at screen coordinates. Uses Quartz CGEvents (falls back to cliclick if installed). |
| mac_drag | Click and drag between two points. Configurable duration for smooth drags. Works for rotating 3D plots, selecting text, moving windows. |
| mac_scroll | Scroll at a screen position. Vertical and horizontal. |
| mac_type_text | Type text into the frontmost app. Uses clipboard paste by default (fast, Unicode-safe). Optional keystroke mode for modifier-sensitive fields. |
| mac_key_press | Send a key press with modifiers. Supports named keys (return, tab, escape, arrows, F-keys) and characters with command/option/control/shift. |
| mac_focus_app | Bring an application to the front. |
| mac_get_clipboard | Read the system clipboard. |
| mac_set_clipboard | Set the system clipboard. |

Escape hatch

| Tool | Description |
| --- | --- |
| mac_run_applescript | Execute arbitrary AppleScript. Full access to System Events, app scripting dictionaries, and everything else AppleScript can do. |

Browser automation via the Accessibility API

Modern browsers expose their full UI through the macOS Accessibility API. macbot reads this directly — no browser extension, no WebDriver, no Playwright needed.

Read the tab bar with element coordinates:

> mac_ui_tree app="Firefox" max_depth=5 max_nodes=50

AXApplication title="Firefox"
  AXWindow title="GitHub - anomalous3/macbot-mcp" @36,30 1752x957
    AXGroup desc="GitHub - anomalous3/macbot-mcp" @36,30 1752x957
      AXToolbar desc="Browser tabs" @36,30 1752x44
        AXTabGroup (tab group) @190,30 1518x44
          AXRadioButton title="GitHub - anomalous3/macbot-mcp" value="True" @193,30 210x44
          AXRadioButton title="New Tab" @403,30 210x44

Find a specific element and get its click coordinates:

> mac_ui_find app="Firefox" name="Submit" role="AXButton"

[{"role": "AXButton", "title": "Submit", "x": 450, "y": 320, "width": 80, "height": 32}]

Get the URL without screenshots or clipboard tricks:

> mac_ui_url app="Firefox"

{"url": "https://github.com/anomalous3/macbot-mcp", "field": "Search with Google or enter address"}

Use Firefox. It exposes the richest accessibility tree of the major browsers — more element detail, better labeling, and more consistent structure than Chrome or Safari.

The coordinate system

All coordinates are absolute screen positions in macOS logical coordinates (points), not Retina physical pixels. The origin (0,0) is the top-left corner of the primary display.

The @x,y WxH format in mac_ui_tree output gives position and size directly. To click the center of @450,320 80x32, click at (490, 336).
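The center calculation is just x + width/2 and y + height/2. A minimal sketch, assuming the geometry token looks exactly like the examples in this README (`click_point` is a hypothetical helper, not a macbot tool):

```python
import re

# Matches the @x,y WxH token used in mac_ui_tree output.
GEOMETRY = re.compile(r"@(-?\d+),(-?\d+) (\d+)x(\d+)")


def click_point(geometry: str) -> tuple[int, int]:
    """Return the center (x, y) of an '@x,y WxH' box, using integer division."""
    match = GEOMETRY.search(geometry)
    if match is None:
        raise ValueError(f"no @x,y WxH geometry found in {geometry!r}")
    x, y, width, height = map(int, match.groups())
    return x + width // 2, y + height // 2


print(click_point("@450,320 80x32"))   # (490, 336)
print(click_point("@1043,672 88x32"))  # (1087, 688)
```

The resulting pair is what you would pass as x and y to mac_click.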

Works with everything

macbot works with any macOS application that implements the Accessibility API (which is most of them):

  • Browsers — Firefox, Chrome, Safari (Firefox recommended for richest tree)
  • Terminals — kitty, Terminal.app, iTerm2 (can read content, type commands)
  • Editors — VS Code, Sublime Text, TextEdit
  • System apps — Finder, System Settings, Activity Monitor
  • Any app — if it has windows and controls, macbot can probably read and drive it

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| MACBOT_SCREENSHOT_DIR | /tmp/macbot | Directory for screenshot PNGs |

Platform

macOS only. For Windows, see ahk-mcp. The approach is the same — read the accessibility tree, act via synthetic input — just different platform APIs.

License

MIT
