MCP Servers

Screen Agent

Enables AI assistants to capture screenshots and control desktop input (mouse, keyboard) to see and interact with your screen. Features user-first safety controls including automatic pause on user activity and app allowlists to restrict interactions to approved applications only.

README

Screen Agent

Give AI agents eyes and hands on the desktop.

An MCP server that lets AI tools (Claude Code, Cursor, etc.) see your screen and interact with any application — with a multi-backend input system that works where others fail.

Why?

AI coding assistants are powerful but blind. Screen Agent fixes that with:

Multi-Backend Input Chain — three input methods (Accessibility API → CGEvent → pyautogui) tried in priority order with automatic fallback. Works with native apps, Electron apps, and game engines.
Input Guardian — real-time safety system that pauses all agent actions when you touch your mouse or keyboard. No other tool provides this.
Apple Vision OCR — zero-dependency text recognition on macOS (no 2GB PaddleOCR install needed).
Retina-Aware Coordinates — unified logical coordinate system that handles display scaling correctly.

Architecture

┌──────────────────────────────────┐
│          MCP Layer               │  19 tools via Model Context Protocol
├──────────────────────────────────┤
│          Engine Layer            │  InputChain (fallback) + Guardian (safety)
├──────────────────────────────────┤
│        Platform Layer            │  Protocol-based backends
│  AX → CGEvent → pyautogui       │  macOS / Linux
└──────────────────────────────────┘

Input Backend Chain

The core design challenge: pyautogui works for ~80% of apps but fails for game engines and many Electron apps. Screen Agent solves this with a Chain of Responsibility pattern:

Priority	Backend	Method	Best For
1	AX	`AXPerformAction`	Native macOS apps — semantic, no coordinates needed
2	CGEvent	`CGEventPost`	Games, Electron — native OS event injection
3	pyautogui	Python wrapper	Cross-platform fallback

Each backend implements the same InputBackend protocol. If one fails, the chain automatically tries the next. All attempts are logged with telemetry for observability.

Install

pip install screen-agent

# Recommended: install macOS native backends
pip install screen-agent[macos]

Quick Start

With Claude Code

claude mcp add screen -- screen-agent serve

With Cursor / other MCP clients

Add to your MCP config:

{
  "mcpServers": {
    "screen": {
      "command": "screen-agent",
      "args": ["serve"]
    }
  }
}

Check system capabilities

screen-agent check

Tools

Perception

Tool	Description
`capture_screen`	Screenshot (full or region), returns image for vision analysis
`list_windows`	List all visible windows with positions
`get_active_window`	Currently focused window
`get_cursor_position`	Current mouse position

Input (all support `verify: true` for post-action screenshots)

Tool	Description
`click`	Click at coordinates (left/right/middle, multi-click)
`type_text`	Type text at cursor (Unicode via clipboard on macOS)
`press_key`	Key press with modifiers (e.g., Cmd+C)
`scroll`	Scroll wheel at optional position
`move_mouse`	Move cursor without clicking
`drag`	Click-drag between two points
`focus_window`	Bring window to front by partial title match

OCR (requires macOS with Vision framework)

Tool	Description
`ocr`	Extract all text with bounding boxes
`find_text`	Find text and return location
`click_text`	Find text and click its center

Safety (Input Guardian)

Tool	Description
`add_app`	Add app to allowlist — agent can ONLY interact with listed apps
`remove_app`	Remove from allowlist
`set_region`	Restrict to pixel region
`clear_scope`	Remove all restrictions
`get_agent_status`	Guardian state, backend stats, scope info

Input Guardian

Screen Agent's unique safety system with two guarantees:

User Priority — any keyboard/mouse activity instantly pauses the agent. It resumes only after you've been idle for 1.5s (configurable).
Scope Lock — restrict the agent to specific apps and/or screen regions.

# Agent can only interact with Chrome and Figma
add_app("Chrome")
add_app("Figma")

# Or restrict to a region
set_region(x=0, y=0, width=800, height=600)

Configuration

All parameters are configurable via environment variables:

Variable	Default	Description
`SCREEN_AGENT_COOLDOWN`	1.5	Guardian cooldown seconds
`SCREEN_AGENT_GUARDIAN_DISABLED`	0	Set to "1" to disable
`SCREEN_AGENT_INPUT_BACKENDS`	ax,cgevent,pyautogui	Backend priority order
`SCREEN_AGENT_MAX_DIMENSION`	2000	Max screenshot dimension
`SCREEN_AGENT_LOG_LEVEL`	INFO	Logging level

Platform Support

Feature	macOS	Linux
Screenshot	✅ mss	✅ mss
AX Input	✅	—
CGEvent Input	✅	—
pyautogui Input	✅	✅
Window Management	✅ AppleScript	✅ wmctrl
OCR	✅ Vision Framework	—
Retina Scaling	✅	—

Development

git clone https://github.com/chriswu727/screen-agent
cd screen-agent
pip install -e ".[dev,macos]"
pytest tests/unit/ -v
ruff check src/ tests/

License

MIT

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured