Screen Vision MCP Server

Enables Claude to capture screenshots, watch your screen in real-time, read text via OCR, and analyze video files, all running locally as an MCP server.

Give Claude Code the ability to see your screen

Screen Vision lets Claude capture screenshots, watch your screen in real-time with audio transcription, analyze video files, and read text via OCR. It runs locally as an MCP server — Claude sees what you see, when you ask.

Quick Start

pip install "screen-vision[ocr]"   # quotes keep zsh from treating [ocr] as a glob

Then add it to your Claude Code MCP config (.mcp.json):

{
  "mcpServers": {
    "screen-vision": {
      "command": "screen-vision-mcp"
    }
  }
}

System dependencies for OCR and video (macOS, via Homebrew):

brew install tesseract   # Required for OCR (read_screen_text)
brew install ffmpeg      # Required for video analysis (analyze_video)
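
To confirm that both binaries are visible before relying on the OCR or video tools, a quick PATH check can help. This is a minimal sketch and not part of Screen Vision: the helper name `missing_system_deps` is hypothetical, and the binary-to-tool mapping is taken from the install comments above.

```python
import shutil

# Binary name -> tool that needs it, per the install comments above.
SYSTEM_DEPS = {
    "tesseract": "read_screen_text",
    "ffmpeg": "analyze_video",
}


def missing_system_deps(which=shutil.which):
    """Return {binary: tool} for every required binary not found on PATH.

    `which` is injectable so the check can be exercised without touching PATH.
    """
    return {name: tool for name, tool in SYSTEM_DEPS.items() if which(name) is None}


if __name__ == "__main__":
    for name, tool in missing_system_deps().items():
        print(f"{name} not found on PATH; {tool} will not work until it is installed")
```

An empty result means both binaries were found.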

What You Can Say

"Take a screenshot of my screen"          → capture_screen
"Capture the Chrome window"               → capture_window
"Watch my screen for 1 minute"            → watch_screen (with audio transcription)
"Analyze the video at ~/Downloads/demo.mp4" → analyze_video
"Read the text on my screen"              → read_screen_text
"What window am I in?"                    → get_active_context (no screenshot)
"What's on my screen right now?"          → understand_screen (AI analysis)
"Analyze this photo I AirDropped"         → analyze_image

Tools (14)

Tool               | What it does                                     | Needs
-------------------|--------------------------------------------------|------------------
capture_screen     | Full screen capture with delay + multi-monitor   |
capture_region     | Capture a specific rectangular area              |
capture_window     | Capture a window by title                        |
list_monitors      | List displays with resolutions                   |
get_active_context | Window/cursor/monitor info (no image)            |
read_screen_text   | OCR text extraction from screen                  | tesseract
understand_screen  | AI-powered screen analysis                       | Anthropic API key
analyze_image      | Analyze a dropped/AirDropped image file          |
watch_screen       | Watch screen with frame sampling + audio         | ffmpeg (audio)
analyze_video      | Extract keyframes from video files               | ffmpeg
capture_camera     | Grab latest frame from phone camera              |
watch_camera       | Stream phone camera with scene detection + audio |
show_pairing_qr    | Show QR code to connect phone camera             |
phone_status       | Check phone camera connection status             |

Security

Screen Vision includes security controls for corporate environments:

  • PII/PCI scanning — Detects credit card numbers, SSNs, phone numbers, email addresses in OCR text
  • App deny-list — Blocks captures of Slack, Teams, Zoom, banking apps, password managers
  • Call detection — Blocks captures during active audio calls
  • Rate limits — 200 captures/session, 2s minimum interval, 5min max watch duration
  • Audit logs — All captures logged to ~/.screen-vision/audit.log
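
As an illustration of the PII/PCI scanning idea, the sketch below runs a few regular expressions over OCR text and uses a Luhn checksum to discard digit runs that cannot be card numbers. It is not Screen Vision's implementation: the patterns (US-style formats), `PATTERNS`, `luhn_ok`, and `scan_pii` are all assumptions made for this example, and the project's actual rules may differ.

```python
import re

# Illustrative patterns only (US-style formats); hypothetical, not the project's rules.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_pii(text: str) -> dict[str, list[str]]:
    """Return the matches found in `text`, grouped by category."""
    found = {}
    for kind, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if kind == "card":
            # Keep only candidates whose digits pass the Luhn check.
            hits = [h for h in hits if luhn_ok(re.sub(r"\D", "", h))]
        if hits:
            found[kind] = hits
    return found
```

A capture pipeline could run a scan like this on OCR output and redact or block the result when any category is non-empty.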

Set SCREEN_VISION_MODE=work to enable all security controls. Default mode is personal (no restrictions).
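
One way to set the variable is through the server entry in .mcp.json, assuming your MCP client passes an `env` map to the server process (Claude Code's config format does). The sketch below is a hypothetical helper, `enable_work_mode`, that adds the variable to an existing config or creates a minimal one; it is an illustration, not a tool shipped with Screen Vision.

```python
import json
from pathlib import Path


def enable_work_mode(config_path: Path) -> dict:
    """Set SCREEN_VISION_MODE=work on the screen-vision entry of an MCP config file."""
    # Start from the existing config if there is one, so other servers are preserved.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    server = config.setdefault("mcpServers", {}).setdefault(
        "screen-vision", {"command": "screen-vision-mcp"}
    )
    # Assumption: the client exports this `env` map when launching the server.
    server.setdefault("env", {})["SCREEN_VISION_MODE"] = "work"
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    print(json.dumps(enable_work_mode(Path(".mcp.json")), indent=2))
```

Restart the MCP client afterwards so the server is launched with the new environment.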

Dependencies

Core (always installed): mcp[cli], mss, Pillow, numpy, httpx

Extras (mix and match):

Extra    | Install                             | What you get
---------|-------------------------------------|--------------------------------------------------------------------------
[ocr]    | pip install "screen-vision[ocr]"    | pytesseract: OCR via tesseract (~5MB, needs brew install tesseract)
[paddle] | pip install "screen-vision[paddle]" | paddleocr + opencv-python-headless: higher-accuracy OCR (~1GB, self-contained)
[audio]  | pip install "screen-vision[audio]"  | faster-whisper + sounddevice: audio transcription for watch_screen
[full]   | pip install "screen-vision[full]"   | All of the above

Python 3.11+ required.

Development

pip install -e ".[ocr,test]"
pytest tests/ -v
ruff check src/

Author

Alex Vicuna (github.com/avicuna)

Contributing

Issues and PRs welcome: https://github.com/avicuna/screen-vision
