macos-sys-assist
A secure, constraint-based macOS OS-level automation MCP server for AI assistants.
A focused macOS automation MCP server for reliable input simulation, window management, and window-specific screenshots — the three things AppleScript and bash do poorly.
What Is This?
macos-sys-assist is a Python-based MCP server that fills the gaps where bash and Chrome DevTools fall short. It uses pyobjc (the Python bridge to native macOS APIs) and Core Graphics for low-level input simulation, which is more reliable than AppleScript's keystroke.
What It Does That bash/CDP Can't
| Capability | Why Not bash/CDP |
|---|---|
| Core Graphics click/type/key | AppleScript keystroke misses keys or fails silently. This uses CGEventPost, the Core Graphics function that posts low-level input events to the system. |
| Window-specific screenshots | bash screencapture captures the full screen by default; its -l option targets a single window but needs a window ID that bash cannot easily look up. This captures just the window you want. |
| Precise window geometry | osascript returns position inconsistently. This uses the Accessibility API for accurate pixel-level data. |
| Multi-app window layouts | Arrange 3+ apps at specific positions in one command. bash needs multiple chained osascript calls. |
What It Does NOT Do (Use bash Instead)
| Tool | Why Use bash |
|---|---|
| Finding files | find / mdfind are simpler |
| Reading files | cat / python3 -c |
| Opening files | open command |
| App queries | osascript -e 'tell app "System Events"...' |
| Clipboard | pbpaste / pbcopy |
| Screen resolution | system_profiler SPDisplaysDataType |
Quick Start
Installation
git clone https://github.com/YOUR_USERNAME/macos-sys-assist.git
cd macos-sys-assist
./setup.sh
Grant Permissions
- Accessibility — System Settings → Privacy & Security → Accessibility → Add Terminal/Python
- Screen Recording (for screenshots) — System Settings → Privacy & Security → Screen Recording → Add Terminal/Python
Configure Apps
Edit allowed_apps.json to control which apps can be automated.
Usage
Standalone Mode
./run.sh
OpenCode Integration
Add to opencode.jsonc:
"mcp": {
  "macos-sys-assist": {
    "type": "local",
    "command": ["/path/to/macos-sys-assist/run.sh"],
    "enabled": true
  }
}
Direct Python Usage (via bash)
.venv/bin/python3 -c "
import sys
sys.path.insert(0, '.')
from macos.input import InputSimulator
InputSimulator().click_at(100, 200, 'left')
"
Tool Reference
Input Simulation (Core Graphics)
| Tool | Description | Security |
|---|---|---|
| `click_at(x, y, button, double)` | Click at screen coordinates | ⚠️ Confirmation |
| `type_string(text)` | Type text character by character | ⚠️ Confirmation, max 500 chars |
| `press_key(combination)` | Press key combo (e.g., `cmd+tab`) | ⚠️ Blocked combos enforced |
More reliable than AppleScript — uses CGEventPost instead of keystroke.
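As a rough illustration of what posting a click through Core Graphics can look like, here is a minimal sketch using the Quartz bindings from pyobjc. It is not the project's `InputSimulator`: `mouse_event_plan` and `post_click` are hypothetical names, and handling double-clicks through `kCGMouseEventClickState` is an assumption about one reasonable approach.

```python
def mouse_event_plan(button="left", double=False):
    """Return the ordered (event type, button, click state) steps for a click.

    The strings are Quartz constant names. They are resolved only when the
    events are posted, so this planning step runs on any platform.
    """
    constants = {
        "left": ("kCGEventLeftMouseDown", "kCGEventLeftMouseUp", "kCGMouseButtonLeft"),
        "right": ("kCGEventRightMouseDown", "kCGEventRightMouseUp", "kCGMouseButtonRight"),
    }
    if button not in constants:
        raise ValueError(f"unsupported button: {button!r}")
    down, up, button_name = constants[button]
    steps = [(down, button_name, 1), (up, button_name, 1)]
    if double:
        # The second press/release pair carries click state 2.
        steps += [(down, button_name, 2), (up, button_name, 2)]
    return steps


def post_click(x, y, button="left", double=False):
    """Post the planned events (macOS only; needs Accessibility permission)."""
    import Quartz  # from pyobjc-framework-Quartz; imported lazily on purpose

    for type_name, button_name, click_state in mouse_event_plan(button, double):
        event = Quartz.CGEventCreateMouseEvent(
            None, getattr(Quartz, type_name), (x, y), getattr(Quartz, button_name)
        )
        Quartz.CGEventSetIntegerValueField(
            event, Quartz.kCGMouseEventClickState, click_state
        )
        Quartz.CGEventPost(Quartz.kCGHIDEventTap, event)
```

Separating the plan from the posting keeps the sequencing logic testable without touching the real pointer.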
Window Management
| Tool | Description | Security |
|---|---|---|
| `move_window(x, y)` | Move active window to coords | ⚠️ Confirmation |
| `resize_window(width, height)` | Resize active window | ⚠️ Confirmation |
| `get_window_geometry(pid)` | Get window position/size (accurate) | Read-only |
Uses Accessibility API for pixel-level accuracy. More reliable than osascript.
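For multi-app layouts, the geometry itself is plain arithmetic. The sketch below computes side-by-side frames for a given screen size; `tile_columns` and its `top_inset` parameter (room for the menu bar) are hypothetical helpers for illustration, not part of this server.

```python
def tile_columns(screen_width, screen_height, count, top_inset=0):
    """Split the screen into `count` equal-width columns.

    Returns a list of (x, y, width, height) tuples, left to right.
    Leftover pixels from integer division go to the last column.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    column_width = screen_width // count
    height = screen_height - top_inset
    frames = []
    for index in range(count):
        x = index * column_width
        is_last = index == count - 1
        width = screen_width - x if is_last else column_width
        frames.append((x, top_inset, width, height))
    return frames


if __name__ == "__main__":
    for frame in tile_columns(1920, 1080, 3):
        print(frame)
```

Each frame could then be applied with `move_window` and `resize_window` once the target app's window is active.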
Screenshots (Requires Screen Recording Permission)
| Tool | Description |
|---|---|
| `screenshot(filepath, display_id)` | Capture full screen |
| `screenshot_window(pid, filepath)` | Capture specific window only, no cropping needed |
| `screenshot_region(x, y, w, h, filepath)` | Capture a screen region |
| `get_displays()` | Get all connected displays and resolutions |
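Capturing a window by PID implies mapping the PID to a window first. One way this might work, sketched under the assumption that window info comes from `CGWindowListCopyWindowInfo`, is shown below. `frontmost_window_for_pid` and `list_on_screen_windows` are hypothetical names, and the server's own lookup may differ.

```python
def frontmost_window_for_pid(window_infos, pid):
    """Pick the first normal-layer window owned by `pid`.

    `window_infos` is expected in front-to-back order, which is how
    CGWindowListCopyWindowInfo returns on-screen windows.
    Returns (window_id, (x, y, width, height)) or None.
    """
    for info in window_infos:
        if info.get("kCGWindowOwnerPID") != pid:
            continue
        if info.get("kCGWindowLayer", 0) != 0:
            continue  # skip menus, overlays and other non-standard layers
        bounds = info.get("kCGWindowBounds", {})
        rect = (
            bounds.get("X", 0),
            bounds.get("Y", 0),
            bounds.get("Width", 0),
            bounds.get("Height", 0),
        )
        return info.get("kCGWindowNumber"), rect
    return None


def list_on_screen_windows():
    """Fetch live window info (macOS only; needs pyobjc)."""
    import Quartz  # imported lazily so the selection logic runs anywhere

    return Quartz.CGWindowListCopyWindowInfo(
        Quartz.kCGWindowListOptionOnScreenOnly, Quartz.kCGNullWindowID
    )
```

On macOS, `frontmost_window_for_pid(list_on_screen_windows(), pid)` would yield the window ID and bounds to capture.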
When to Use This vs bash
✅ Use macos-sys-assist when:
- AppleScript `keystroke` or `click` fails silently
- You need a screenshot of just one window without browser chrome
- You're arranging 3+ app windows at specific positions for a workspace
- The task requires pixel-level coordinate accuracy
❌ Use bash when:
- Finding files (`find`, `mdfind`, `ls`)
- Reading files (`cat`, `python3 -c`)
- Opening files (`open`)
- Basic clipboard (`pbpaste`, `pbcopy`)
- Checking what app is frontmost (`osascript`)
- Launching apps (`open -a`)
🔄 Use Chrome DevTools when:
- Interacting with web pages (clicking buttons, filling forms)
- Uploading files to websites (base64 injection into `<input type="file">`)
- Reading page content
- Navigating multi-page web flows
Configuration
allowed_apps.json
Controls which apps can be automated:
{
  "allowed_apps": [
    {
      "bundle_id": "com.brave.Browser",
      "name": "Brave Browser",
      "allow_actions": true
    }
  ],
  "global_settings": {
    "require_confirmation_for_click": true,
    "require_confirmation_for_type": true,
    "max_string_length": 500,
    "blocked_key_combinations": [
      "cmd+q",
      "cmd+delete",
      "ctrl+alt+delete"
    ]
  }
}
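To make the shape of this file concrete, here is a minimal sketch of how a config like the one above could be checked. The function names are hypothetical, and normalizing combos (lowercasing and sorting modifiers) is an assumption; the server's `security.py` may apply different rules.

```python
import json

# A trimmed copy of the example config, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "allowed_apps": [
    {"bundle_id": "com.brave.Browser", "name": "Brave Browser", "allow_actions": true}
  ],
  "global_settings": {
    "blocked_key_combinations": ["cmd+q", "cmd+delete", "ctrl+alt+delete"]
  }
}
"""


def is_app_allowed(config, bundle_id):
    """True if the bundle ID is listed and actions are enabled for it."""
    return any(
        app.get("bundle_id") == bundle_id and app.get("allow_actions", False)
        for app in config.get("allowed_apps", [])
    )


def normalize_combo(combination):
    """Lowercase the combo and sort its modifiers; the final key stays last."""
    parts = [part.strip().lower() for part in combination.split("+")]
    return "+".join(sorted(parts[:-1]) + parts[-1:])


def is_combo_blocked(config, combination):
    """True if the combo matches a blocked entry after normalization."""
    blocked = config.get("global_settings", {}).get("blocked_key_combinations", [])
    return normalize_combo(combination) in {normalize_combo(item) for item in blocked}


config = json.loads(CONFIG_TEXT)
```

Normalizing both sides means `Cmd+Q` and `alt+ctrl+delete` match their blocked entries regardless of case or modifier order.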
Project Structure
macos-sys-assist/
├── server.py # Main MCP server entry point
├── config.py # Configuration management
├── security.py # Security validation layer
├── allowed_apps.json # Application allow-list
├── requirements.txt # Python dependencies
├── setup.sh # Installation script
├── run.sh # Wrapper script
├── macos/ # Native macOS API wrappers
│ ├── accessibility.py # App queries, PID lookup
│ ├── window.py # Window move/resize/geometry
│ ├── input.py # Core Graphics click/type/key
│ ├── screenshot.py # Screen capture (full/window/region)
│ └── task_engine.py # Multi-step task execution
└── tools/ # MCP tool definitions
    ├── information.py # get_window_geometry
    ├── actions.py # click_at, type_string, press_key, move/resize
    └── screenshot.py # screenshot, screenshot_window, screenshot_region, get_displays
Roadmap
Completed ✅
- [x] Core Graphics input simulation (click, type, key)
- [x] Window management (move, resize, geometry)
- [x] Window-specific screenshots (no cropping)
- [x] Security layer (allow-list, blocked keys, confirmations)
Planned 📋
- [ ] Folder Watcher — Detect new files in Downloads, auto-organize by project
- [ ] System State — Battery, WiFi, disk space checks before long automations
- [ ] Window Layout Presets — Save/restore multi-app workspaces
- [ ] Calendar Integration — Meeting-aware automation scheduling
Security Model
Design Principles
- No Shell Access — All operations use native macOS APIs
- Explicit Allow-List — Only pre-approved apps can be controlled
- Human-in-the-Loop — Invasive actions require user confirmation
- Input Validation — Text length limits, key combo blocking
What's Blocked
| Threat | Mitigation |
|---|---|
| Unauthorized app control | Application allow-list |
| Destructive key combos | Blocked combinations list |
| Excessive text input | Maximum string length (500) |
| Unconfirmed actions | Confirmation prompts |
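The length limit and confirmation prompt can be pictured as a gate that runs before any text is typed. The sketch below is one possible shape for such a gate, not the server's actual code: `ActionRejected`, `guard_type_string`, and the `confirm` callback are hypothetical, while the settings keys mirror `global_settings` in `allowed_apps.json`.

```python
class ActionRejected(Exception):
    """Raised when an action fails validation or the user declines it."""


def guard_type_string(text, settings, confirm):
    """Validate `text` against the settings, then ask for confirmation.

    `confirm` is any callable taking a prompt string and returning a bool.
    Returns the text unchanged if it may be typed.
    """
    limit = settings.get("max_string_length", 500)
    if len(text) > limit:
        raise ActionRejected(f"text is {len(text)} chars; limit is {limit}")
    if settings.get("require_confirmation_for_type", True):
        if not confirm(f"Type {len(text)} characters?"):
            raise ActionRejected("user declined")
    return text


if __name__ == "__main__":
    settings = {"max_string_length": 500, "require_confirmation_for_type": True}
    print(guard_type_string("hello", settings, confirm=lambda prompt: True))
```

Checking the length before prompting avoids asking the user to approve input that would be rejected anyway.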
Troubleshooting
"Accessibility permission not granted"
- System Settings → Privacy & Security → Accessibility
- Add Terminal.app or `.venv/bin/python3`
- Ensure toggle is ON
- Restart the server
"Screen Recording permission required"
- System Settings → Privacy & Security → Screen Recording
- Add Terminal.app or `.venv/bin/python3`
- Ensure toggle is ON
- Restart the server
"App not in allow-list"
- Find the app's bundle ID: `osascript -e 'id of app "AppName"'`
- Add it to `allowed_apps.json`
- Restart the server
License
MIT License — see LICENSE
Acknowledgments
Built for the OpenCode AI assistant framework.
Uses the Model Context Protocol for tool integration.
Powered by pyobjc for native macOS API access.