Screen Vision MCP Server
Enables screen capture, OCR text extraction, and automated clicking on macOS through MCP. Supports fullscreen, window, and region capture with optional text recognition and monitoring.
A Model Context Protocol (MCP) server that provides screen capture, OCR text extraction, and basic screen automation (clicking and change monitoring) for macOS.
Features
- capture_fullscreen: Capture the entire screen
- capture_window: Capture specific application windows
- capture_region: Capture defined screen regions
- extract_text_from_screen: OCR text extraction from screenshots
- find_text_on_screen: Locate text on screen and return coordinates
- get_window_list: List all open windows with details
- get_screen_info: Get display and screen information
- click_at_position: Automated clicking at specific coordinates
- monitor_screen_region: Monitor regions for changes over time
- Screenshot resource management and retrieval
Installation
Quick Install
npm install -g screen-vision-mcp
From Source
- Clone the repository:
  git clone https://github.com/TIMBOTGPT/screen-vision-mcp.git
  cd screen-vision-mcp
- Install dependencies:
  npm install
- Test the server:
  npm start
Usage with Claude Desktop
Add this server to your Claude Desktop MCP configuration (claude_desktop_config.json):
{
  "mcpServers": {
    "screen-vision": {
      "command": "npx",
      "args": ["-y", "screen-vision-mcp"],
      "description": "Screen capture and vision analysis"
    }
  }
}
Or if installed locally:
{
  "mcpServers": {
    "screen-vision": {
      "command": "node",
      "args": ["/path/to/screen-vision-mcp/index.js"],
      "description": "Screen capture and vision analysis"
    }
  }
}
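If claude_desktop_config.json (on macOS typically found at ~/Library/Application Support/Claude/claude_desktop_config.json) already lists other servers, the entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows; `addScreenVision` is a hypothetical helper name, not part of this package, and the entry is the npx variant shown above.

```javascript
// Hypothetical helper (not part of screen-vision-mcp): returns a copy of an
// existing Claude Desktop config with the screen-vision entry added.
function addScreenVision(config) {
  const entry = {
    command: "npx",
    args: ["-y", "screen-vision-mcp"],
    description: "Screen capture and vision analysis",
  };
  return {
    ...config,
    // Keep any servers that are already configured.
    mcpServers: { ...(config.mcpServers || {}), "screen-vision": entry },
  };
}

// Example: a config that already has another (made-up) server entry.
const existing = { mcpServers: { other: { command: "node", args: ["other.js"] } } };
console.log(JSON.stringify(addScreenVision(existing), null, 2));
```

The helper returns a new object instead of mutating its input, so the original config can still be written back unchanged if something goes wrong.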
Available Tools
capture_fullscreen
Capture the entire screen.
Parameters:
- save_path (optional): Custom save path for the screenshot
Example:
{
  "name": "capture_fullscreen",
  "arguments": {
    "save_path": "/path/to/save/screenshot.png"
  }
}
capture_window
Capture a specific application window.
Parameters:
- app_name (required): Name of the application (e.g., "Safari", "Terminal")
- save_path (optional): Custom save path
Example:
{
  "name": "capture_window",
  "arguments": {
    "app_name": "Safari",
    "save_path": "/path/to/save/window.png"
  }
}
capture_region
Capture a specific region of the screen.
Parameters:
- x (required): X coordinate
- y (required): Y coordinate
- width (required): Width of region
- height (required): Height of region
- save_path (optional): Custom save path
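The README gives no example for this tool. The sketch below builds a call in the same name/arguments shape as the examples above; `captureRegionCall` is a hypothetical helper, and the numeric check is an illustration rather than the server's own validation.

```javascript
// Hypothetical helper: builds a capture_region call in the same
// { name, arguments } shape used by the other examples.
function captureRegionCall(x, y, width, height, savePath) {
  // All four region values are required by the tool.
  for (const [key, value] of Object.entries({ x, y, width, height })) {
    if (!Number.isFinite(value)) {
      throw new TypeError(`${key} is required and must be a number`);
    }
  }
  const args = { x, y, width, height };
  if (savePath !== undefined) args.save_path = savePath; // optional
  return { name: "capture_region", arguments: args };
}

console.log(JSON.stringify(captureRegionCall(100, 200, 640, 480), null, 2));
```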
extract_text_from_screen
Capture screen and extract text using OCR.
Parameters:
- region (optional): Specific region to capture
  - x, y, width, height: Region coordinates
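The parameter list suggests that `region` is a nested object rather than four top-level arguments. A sketch under that assumption follows; `extractTextCall` is a hypothetical helper, and what the server does when `region` is omitted (presumably OCR over the whole screen) is not stated in the README.

```javascript
// Hypothetical helper: builds an extract_text_from_screen call. Assumes
// `region` is a nested object holding x, y, width and height.
function extractTextCall(region) {
  const args = {};
  if (region) {
    const { x, y, width, height } = region;
    args.region = { x, y, width, height };
  }
  return { name: "extract_text_from_screen", arguments: args };
}

// With a region, and without one.
console.log(JSON.stringify(extractTextCall({ x: 0, y: 0, width: 800, height: 200 })));
console.log(JSON.stringify(extractTextCall()));
```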
find_text_on_screen
Find text on screen and return its location.
Parameters:
- text (required): Text to search for
- case_sensitive (optional): Whether the search should be case sensitive (default: false)
get_window_list
Get list of all open windows with their positions.
get_screen_info
Get information about available screens/displays.
click_at_position
Click at a specific screen position.
Parameters:
- x (required): X coordinate
- y (required): Y coordinate
- button (optional): Mouse button ('left', 'right', 'middle'; default: 'left')
- double_click (optional): Whether to double-click (default: false)
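find_text_on_screen returns coordinates and click_at_position consumes them, so the two tools chain naturally. The README does not document the result format; the sketch below assumes a bounding box of the form { x, y, width, height } and aims the click at its center. Both the box shape and the `clickCenterCall` helper are assumptions.

```javascript
// Hypothetical helper: turns a bounding box into a click_at_position call
// aimed at the center of the box. The { x, y, width, height } shape is an
// assumption; the README does not say what find_text_on_screen returns.
function clickCenterCall(box, button = "left", doubleClick = false) {
  if (!["left", "right", "middle"].includes(button)) {
    throw new RangeError(`unsupported button: ${button}`);
  }
  return {
    name: "click_at_position",
    arguments: {
      x: Math.round(box.x + box.width / 2),
      y: Math.round(box.y + box.height / 2),
      button,
      double_click: doubleClick,
    },
  };
}

console.log(JSON.stringify(clickCenterCall({ x: 100, y: 40, width: 80, height: 20 })));
```

Clicking the center rather than the top-left corner leaves some margin if the detected box is slightly off.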
monitor_screen_region
Monitor a screen region for changes over time.
Parameters:
- x, y, width, height (required): Region to monitor
- duration_seconds (optional): How long to monitor (max 30 seconds, default: 5)
- interval_ms (optional): Check interval in milliseconds (default: 1000)
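How many checks a monitoring run performs follows from the two timing parameters. The sketch below works through that arithmetic using the documented defaults and the 30-second cap. Whether the server clamps or rejects longer durations, and whether it samples at time zero, is not stated, so the clamping and the count here are assumptions; `estimateChecks` is a hypothetical helper.

```javascript
// Hypothetical helper: estimates how many checks a monitor_screen_region run
// makes. Defaults (5 s, 1000 ms) and the 30 s cap come from the README;
// clamping to the cap and counting one check per full interval are assumptions.
function estimateChecks({ duration_seconds = 5, interval_ms = 1000 } = {}) {
  if (interval_ms <= 0) throw new RangeError("interval_ms must be positive");
  const seconds = Math.min(duration_seconds, 30);
  return Math.floor((seconds * 1000) / interval_ms);
}

console.log(estimateChecks()); // documented defaults
console.log(estimateChecks({ duration_seconds: 60, interval_ms: 500 })); // capped at 30 s
```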
Requirements
- macOS (uses native screencapture command)
- Node.js 16+
- Claude Desktop with MCP support
- Screen recording permissions for automation features
Permissions
On first use, macOS may request permissions for:
- Screen recording
- Accessibility (for clicking automation)
- File system access (for saving screenshots)
Grant these permissions in System Preferences > Security & Privacy (System Settings > Privacy & Security on macOS 13 and later).
Screenshots Storage
Screenshots are automatically saved to a screenshots/ directory within the server folder. You can:
- Access screenshots via the resource URI system
- Specify custom save paths for individual captures
- View saved screenshots through Claude's resource system
Development
# Install dependencies
npm install
# Start development server
npm run dev
# Run tests
npm test
Advanced Features
OCR Integration
The server includes hooks for integrating the macOS Vision framework. Full OCR requires additional setup with the native Vision APIs.
Automation
The clicking and monitoring features enable automation workflows when combined with other MCP servers.
Security
- All screen captures require explicit permission
- File system access is controlled by macOS permissions
- No network access required for core functionality
License
MIT License - see LICENSE file for details
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Support
For issues and questions, please use the GitHub Issues page.