mac-vision-mcp
An MCP server that enables AI coding agents to capture screenshots of macOS windows and displays on demand, providing tools for window discovery and screen capture.
Why
LLMs are amazing at using images for context. You can feed image files to an LLM and it can do things like analyze a design or read text. I constantly want to "show" LLMs what I'm looking at, but it was cumbersome to take a screenshot, find the file, and give the path to the LLM. I also ended up with thousands of screenshots over time that I needed to manage. So I thought, why can't the LLM just do this itself? That question led to this project.
Features
- Window Discovery - List all open windows with metadata (title, app, bounds, display)
- Window Capture - Capture screenshots of specific windows by ID
- Display Capture - Capture entire displays (single or all)
- Smart Filtering - Automatically filters out system overlays and utility windows
- Natural Integration - Works seamlessly with any MCP-compatible AI agent
- Privacy First - Runs entirely locally on your Mac
- Professional Logging - Structured logging with timestamps for debugging
System Requirements
- macOS: 12.0+ (Monterey or later)
- Architecture: Intel (x64) or Apple Silicon (arm64)
- Node.js: 16.0.0 or higher
- Permissions: Screen Recording permission required
Installation
Global Installation (Recommended)
npm install -g mac-vision-mcp
Using with npx (No Installation)
npx -y mac-vision-mcp
Quick Start
1. Grant Screen Recording Permission
On first run, macOS will prompt you to grant Screen Recording permission:
- Open System Settings (System Preferences on macOS 12)
- Go to Privacy & Security > Screen Recording (on macOS 12: Security & Privacy > Privacy > Screen Recording)
- Enable permission for the application running the MCP server
- Restart the MCP server
2. Configure Your MCP Client
For Claude Code
Add to .mcp.json in your project:
{
  "mcpServers": {
    "mac-vision": {
      "command": "npx",
      "args": ["-y", "mac-vision-mcp"]
    }
  }
}
For Cursor
Add to ~/.cursor/mcp.json:
{
  "mcpServers": {
    "mac-vision": {
      "command": "npx",
      "args": ["-y", "mac-vision-mcp"]
    }
  }
}
For Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "mac-vision": {
      "command": "npx",
      "args": ["-y", "mac-vision-mcp"]
    }
  }
}
3. Use with Your AI Agent
Once configured, your AI agent can use natural language to capture screenshots:
User: "Show me my Chrome window with the error"
Agent: [calls list_windows]
Agent: [calls capture_window with the Chrome window ID]
Agent: "I can see the 404 error in your browser..."
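The exchange above amounts to two tool calls in sequence. Here is a minimal sketch of that flow, assuming a hypothetical `callTool` function that wraps whatever MCP client you use and returns the tool's JSON result already parsed. `captureFirstWindowOfApp` is an illustrative name, not part of this server.

```typescript
// Hypothetical wrapper signature around an MCP client's tool call.
// Assumption: it resolves to the already-parsed JSON result of the tool.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface ListedWindow {
  id: string;
  title: string;
  app: string;
}

// List windows, pick the first one whose app name matches, and capture it.
// Resolves to the screenshot path, or null when no window matches
// or the capture reports failure.
async function captureFirstWindowOfApp(
  callTool: CallTool,
  appName: string
): Promise<string | null> {
  const listed = (await callTool("list_windows", {})) as { windows: ListedWindow[] };
  const target = listed.windows.find((w) => w.app === appName);
  if (target === undefined) return null;

  const captured = (await callTool("capture_window", {
    window_id: target.id,
  })) as { success: boolean; file_path: string };
  return captured.success ? captured.file_path : null;
}
```

In practice the agent does this matching itself from the window titles and app names, so a helper like this is mainly useful for scripted testing.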
MCP Tools
list_windows
Get all open windows with metadata.
Parameters: None
Returns:
{
  "windows": [
    {
      "id": "12345",
      "title": "Chrome - Documentation",
      "app": "Google Chrome",
      "bounds": {
        "x": 0,
        "y": 23,
        "width": 1920,
        "height": 1057
      },
      "display": 0
    }
  ]
}
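If you consume this result from your own script, it can help to validate the shape before using it. The sketch below mirrors the fields in the example response. The type names and `parseListWindows` are illustrative, and it assumes you already have the result as a JSON string.

```typescript
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface WindowInfo {
  id: string;
  title: string;
  app: string;
  bounds: Bounds;
  display: number;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function isBounds(v: unknown): v is Bounds {
  if (!isRecord(v)) return false;
  return (
    typeof v["x"] === "number" &&
    typeof v["y"] === "number" &&
    typeof v["width"] === "number" &&
    typeof v["height"] === "number"
  );
}

function isWindowInfo(v: unknown): v is WindowInfo {
  if (!isRecord(v)) return false;
  return (
    typeof v["id"] === "string" &&
    typeof v["title"] === "string" &&
    typeof v["app"] === "string" &&
    typeof v["display"] === "number" &&
    isBounds(v["bounds"])
  );
}

// Parse a list_windows result, keeping only entries that match the
// documented shape. Throws when the top-level "windows" array is missing.
function parseListWindows(jsonText: string): WindowInfo[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (!isRecord(parsed)) throw new Error("Expected a JSON object");
  const windows = parsed["windows"];
  if (!Array.isArray(windows)) throw new Error("Expected a 'windows' array");
  return windows.filter(isWindowInfo);
}
```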
capture_window
Capture a screenshot of a specific window.
Parameters:
- window_id (required, string) - Window ID from list_windows
- mode (optional, string) - Capture mode: "full" or "content" (default: "full")
- output_path (optional, string) - Custom output path (must end with .png)
Returns:
{
  "success": true,
  "file_path": "/tmp/screenshot_12345.png",
  "window": {
    "id": "12345",
    "title": "Chrome - Documentation",
    "app": "Google Chrome"
  }
}
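The parameter rules above can be checked before the call is made. Here is a minimal sketch, assuming a hypothetical helper `buildCaptureWindowArgs`. It treats the `.png` check as case-sensitive, which is an assumption, since the server's exact rule is not stated here.

```typescript
type CaptureMode = "full" | "content";

interface CaptureWindowArgs {
  window_id: string;
  mode?: CaptureMode;
  output_path?: string;
}

// Build capture_window arguments, rejecting values the parameter list rules out.
function buildCaptureWindowArgs(
  windowId: string,
  mode?: string,
  outputPath?: string
): CaptureWindowArgs {
  if (windowId.length === 0) throw new Error("window_id is required");
  const args: CaptureWindowArgs = { window_id: windowId };

  if (mode !== undefined) {
    if (mode !== "full" && mode !== "content") {
      throw new Error('mode must be "full" or "content"');
    }
    args.mode = mode;
  }

  if (outputPath !== undefined) {
    // Assumption: the extension check is case-sensitive.
    if (!outputPath.endsWith(".png")) {
      throw new Error("Output path must end with .png");
    }
    args.output_path = outputPath;
  }
  return args;
}
```

Omitting `mode` leaves the server's default ("full") in effect rather than sending it explicitly.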
capture_windows
Capture screenshots of multiple windows at once. Useful when you need to see several windows simultaneously.
Parameters:
- window_ids (required, string[]) - Array of window IDs from list_windows
- mode (optional, string) - Capture mode: "full" or "content" (default: "full")
- output_dir (optional, string) - Custom output directory (default: temp directory)
Returns:
{
  "success": true,
  "captures": [
    {
      "window_id": "12345",
      "success": true,
      "file_path": "/tmp/screenshot_12345.png",
      "window": {
        "id": "12345",
        "title": "Chrome - Documentation",
        "app": "Google Chrome"
      }
    },
    {
      "window_id": "67890",
      "success": true,
      "file_path": "/tmp/screenshot_67890.png",
      "window": {
        "id": "67890",
        "title": "VS Code",
        "app": "Code"
      }
    }
  ]
}
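Because each entry carries its own `success` flag, a caller may want to separate saved screenshots from windows that could not be captured. The sketch below assumes that a failed entry has `success: false` and may lack `file_path`. That shape is a guess, since only successful entries are shown above, and `summarizeCaptures` is an illustrative name.

```typescript
interface WindowCapture {
  window_id: string;
  success: boolean;
  file_path?: string; // Assumption: absent when the capture failed.
}

interface CaptureWindowsResult {
  success: boolean;
  captures: WindowCapture[];
}

interface CaptureSummary {
  saved: Record<string, string>; // window_id -> file path
  failed: string[]; // window_ids without a usable screenshot
}

// Split a capture_windows result into saved files and failed window IDs.
function summarizeCaptures(result: CaptureWindowsResult): CaptureSummary {
  const summary: CaptureSummary = { saved: {}, failed: [] };
  for (const capture of result.captures) {
    if (capture.success && capture.file_path !== undefined) {
      summary.saved[capture.window_id] = capture.file_path;
    } else {
      summary.failed.push(capture.window_id);
    }
  }
  return summary;
}
```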
capture_display
Capture entire display(s).
Parameters:
- display_id (optional, number) - Specific display number (0-indexed); omit to capture all displays
Single Display Returns:
{
  "success": true,
  "file_path": "/tmp/display_0.png",
  "display": 0
}
All Displays Returns:
{
  "success": true,
  "captures": [
    {
      "display": 0,
      "file_path": "/tmp/display_0.png"
    },
    {
      "display": 1,
      "file_path": "/tmp/display_1.png"
    }
  ]
}
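Since the single-display and all-displays responses have different shapes, a caller has to handle both. One way is to normalize them into a single list, as in this sketch. The type names and `displayCaptures` are illustrative, and the two shapes are taken from the examples above.

```typescript
interface DisplayCapture {
  display: number;
  file_path: string;
}

interface SingleDisplayResult {
  success: boolean;
  file_path: string;
  display: number;
}

interface AllDisplaysResult {
  success: boolean;
  captures: DisplayCapture[];
}

type CaptureDisplayResult = SingleDisplayResult | AllDisplaysResult;

// Normalize either response shape into a flat list of captures.
// Returns an empty list when the result reports failure.
function displayCaptures(result: CaptureDisplayResult): DisplayCapture[] {
  if (!result.success) return [];
  if ("captures" in result) return result.captures;
  return [{ display: result.display, file_path: result.file_path }];
}
```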
Troubleshooting
Permission Denied Errors
Error: Screen Recording permission required
Solution:
- Open System Settings > Privacy & Security > Screen Recording (on macOS 12: System Preferences > Security & Privacy > Privacy > Screen Recording)
- Enable permission for your terminal or application
- Restart the MCP server
Window Not Found
Error: Window {id} not found. It may have been closed.
Cause: The window was closed between listing and capturing.
Solution: Call list_windows again to get current window IDs.
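The recovery described above can be automated: on a "not found" failure, list the windows again and retry with a fresh ID. This sketch makes several assumptions. `callTool` is a hypothetical wrapper that rejects with an `Error` carrying the message shown above, and a replacement window is matched by identical app and title, which will miss windows whose title has changed.

```typescript
// Hypothetical wrapper around an MCP client's tool call (parsed JSON result).
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface KnownWindow {
  id: string;
  app: string;
  title: string;
}

async function captureById(callTool: CallTool, id: string): Promise<string> {
  const result = (await callTool("capture_window", { window_id: id })) as {
    file_path: string;
  };
  return result.file_path;
}

// Try the known ID first; if the window is gone, refresh the window list
// and retry once with a window that has the same app and title.
async function captureWithRefresh(
  callTool: CallTool,
  known: KnownWindow
): Promise<string> {
  try {
    return await captureById(callTool, known.id);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Assumption: the failure message contains "not found".
    if (!message.includes("not found")) throw err;
  }

  const listed = (await callTool("list_windows", {})) as { windows: KnownWindow[] };
  const replacement = listed.windows.find(
    (w) => w.app === known.app && w.title === known.title
  );
  if (replacement === undefined) {
    throw new Error(`No open window matches "${known.title}" (${known.app})`);
  }
  return captureById(callTool, replacement.id);
}
```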
Invalid Output Path
Error: Output path must end with .png
Solution: Ensure custom output paths have a .png extension.
Native Module Issues
Error: Native module compilation errors
Solution:
- Ensure you're on macOS 12.0+
- Verify Node.js version is 16.0.0+
- Try reinstalling:
npm install -g mac-vision-mcp --force
No Windows Listed
Issue: list_windows returns an empty array, or expected windows are missing from the result
Cause: Screen Recording permission not granted or windows filtered out
Solution:
- Verify Screen Recording permission is enabled
- Note: System windows and gesture overlays are automatically filtered
- Windows smaller than 50x50 pixels are excluded
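To check whether a window you expected is likely being dropped by the size rule, you can apply a similar test to its bounds. The exact rule the server applies is not spelled out here. This sketch assumes a window is excluded when either dimension is under 50 pixels, and `isCapturableSize` is an illustrative name.

```typescript
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Threshold taken from the note above.
const MIN_WINDOW_SIZE = 50;

// Assumption: a window is kept only when both width and height
// reach the minimum size.
function isCapturableSize(bounds: Bounds): boolean {
  return bounds.width >= MIN_WINDOW_SIZE && bounds.height >= MIN_WINDOW_SIZE;
}
```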
Architecture
- Language: TypeScript/Node.js with ESM modules
- MCP SDK: @modelcontextprotocol/sdk (v1.22.0)
- Screenshot Library: node-screenshots (v0.2.4) with native N-API bindings
- Window Metadata: get-windows (v9.2.3)
- Permissions: mac-screen-capture-permissions (v2.1.0)
- Validation: Zod (v3.25.0)
Development
Local Setup
# Clone repository
git clone https://github.com/jasich/mac-vision-mcp.git
cd mac-vision-mcp
# Install dependencies
npm install
# Build
npm run build
# Run locally
node dist/index.js
Using Local Build in Another Project
To test your local development build with Claude Code or another MCP client:
- Build the project (if not already done):
cd /path/to/mac-vision-mcp
npm run build
- Configure your other project's .claude.json with the absolute path:
{
  "mcpServers": {
    "mac-vision": {
      "command": "node",
      "args": ["/path/to/mac-vision-mcp/dist/index.js"]
    }
  }
}
- Restart Claude Code to load the local build
- Make changes and rebuild as needed:
npm run build # Rebuild after code changes
Note: Replace /path/to/mac-vision-mcp with your actual absolute path to the project.
Testing with MCP Inspector
# Run with MCP Inspector for debugging
npx @modelcontextprotocol/inspector node ./dist/index.js
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
MIT License - see LICENSE file for details.
Acknowledgments
- Built on the Model Context Protocol
- Uses node-screenshots for native screenshot capture
- Uses get-windows by Sindre Sorhus for window metadata
Support
- Issues: Report bugs or request features via GitHub Issues
- Documentation: Model Context Protocol Docs
- MCP Inspector: Use for testing and debugging MCP tools