Screen MCP
Enables LLMs to capture screenshots and screen recordings through MCP with chunked session-based transfers for reliable image consumption. Supports multi-monitor selection, timeline capture, and compatibility with both vision and non-vision language models.
screen-mcp
A FastMCP server that runs on the client machine and exposes screenshot tools to a host MCP. It supports both direct screenshot capture and session-based chunked transfers so an LLM can consume images reliably.
Official FastMCP documentation: gofastmcp.com/getting-started/welcome
Exposed tools
- list_monitors: returns detected monitors (index and dimensions)
- capture_screenshot: captures a screen image with hybrid mode (base64 for non-vision, native MCP image for vision)
- capture_timeline: captures a timed screen sequence (ordered frames with timestamps)
- start_timeline_capture: starts a timeline session and returns a timeline_id
- get_timeline_manifest: returns chunked timeline metadata
- get_timeline_chunk: retrieves a timeline JSON chunk
- release_timeline_capture: explicitly releases a timeline session
- start_screenshot_capture: starts a screenshot session and returns a capture_id
- get_screenshot_manifest: returns metadata plus ASCII preview for non-vision LLMs
- get_screenshot_chunk: returns a chunk of base64 image data
- release_screenshot_capture: releases the screenshot session and frees memory
Quick tool guidance
- Need available monitor info: list_monitors
- Need a fast single screenshot with moderate payload: capture_screenshot
- Need a more robust single screenshot with chunking: start_screenshot_capture -> get_screenshot_manifest -> get_screenshot_chunk (0..N-1) -> release_screenshot_capture
- Need a short timeline in one call: capture_timeline
- Need a robust timeline for large payloads: start_timeline_capture -> get_timeline_manifest -> get_timeline_chunk (0..N-1) -> release_timeline_capture
Best practices:
- Always concatenate chunks in ascending chunk_index order.
- Always call release_* after reading session data to free memory.
- For non-vision models, consume preview_text from the manifest before loading full payload.
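The chunk-ordering rule above can be sketched as follows. This is a minimal illustration, not the project's client code: the helper name `reassemble_chunks` and the `data` field on each chunk are assumptions, and only `chunk_index` comes from the text above.

```python
import base64


def reassemble_chunks(chunks):
    """Join base64 chunks in ascending chunk_index order and decode them.

    Each chunk is assumed to be a dict with `chunk_index` (int) and
    `data` (a slice of the base64 string); `data` is a hypothetical name.
    """
    ordered = sorted(chunks, key=lambda c: c["chunk_index"])
    indices = [c["chunk_index"] for c in ordered]
    if indices != list(range(len(ordered))):
        raise ValueError(f"missing or duplicate chunks: {indices}")
    # Concatenate the text first and decode once: a slice of a base64
    # string is not guaranteed to be decodable on its own.
    return base64.b64decode("".join(c["data"] for c in ordered))
```

The index check catches gaps and duplicates, but not a missing final chunk; comparing the number of chunks against the count reported by the manifest would cover that case.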
Prerequisites
- Linux with an active graphical session (X11/Wayland capture support)
- DISPLAY environment variable available to the server process (mss requires it on Linux)
- Python 3.10+
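A quick preflight check for the prerequisites above might look like the sketch below. The function `check_prerequisites` is hypothetical and not part of the project; it only tests the Python version and the presence of `DISPLAY`, and it does not verify that a graphical session is actually reachable.

```python
import os
import sys


def check_prerequisites(environ=None, version=None):
    """Return a list of human-readable problems; an empty list means OK."""
    environ = os.environ if environ is None else environ
    version = sys.version_info[:2] if version is None else version
    problems = []
    if tuple(version) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    if not environ.get("DISPLAY"):
        problems.append("DISPLAY is not set for this process")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and environment.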
Local installation
uv sync
Or via Taskfile:
task setup
Run the MCP server (stdio)
task server
This task starts the server using mcpm run screen-mcp through uvx.
It also registers or updates the local MCP server automatically when needed.
Display-related environment variables are propagated during registration: DISPLAY, WAYLAND_DISPLAY, XAUTHORITY, XDG_RUNTIME_DIR.
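How the task performs this propagation is not shown here. As a rough illustration only, selecting the same four variables from an environment could be written like this; `display_env` is a hypothetical helper, and the result could be passed as the `env` of a child process.

```python
import os

# Variable names taken from the list above.
DISPLAY_VARS = ("DISPLAY", "WAYLAND_DISPLAY", "XAUTHORITY", "XDG_RUNTIME_DIR")


def display_env(environ=None):
    """Return only the display-related variables that are set and non-empty."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in DISPLAY_VARS if environ.get(name)}
```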
MCP-compatible smoke-test client
task client
The smoke-test script is located in scripts/smoke_client.py and exercises:
- list_monitors
- start_screenshot_capture
- get_screenshot_manifest
- get_screenshot_chunk
- release_screenshot_capture
It writes a verification image to artifacts/smoke_capture.jpg.
You can also run a specific action via --action:
uv run python scripts/smoke_client.py --action list-monitors
uv run python scripts/smoke_client.py --action capture-screenshot --monitor-index 0 --output artifacts/capture.jpg
uv run python scripts/smoke_client.py --action capture-timeline --duration-seconds 6 --output artifacts/timeline.json
uv run python scripts/smoke_client.py --action capture-timeline-session --duration-seconds 6 --chunk-size 120000 --output artifacts/timeline_session.json
Debugging and real-time inspection
task inspector
This launches the MCP Inspector against the mcpm run screen-mcp server.
Using the server in VS Code
- Open this project folder in VS Code.
- Add a servers configuration.
- Create a .vscode/mcp.json file and add one of the examples below.
Recommended local example for a cloned repo (unpublished package):
{
  "servers": {
    "screen-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--project", "/absolute/path/to/screen-mcp", "screen-mcp"]
    }
  }
}
Example for running directly from a Git repo without global installation:
{
  "servers": {
    "screen-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "git+https://github.com/<owner>/screen-mcp.git", "screen-mcp"]
    }
  }
}
Alternative via MCPM:
{
  "servers": {
    "screen-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcpm", "run", "screen-mcp"]
    }
  }
}
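If the configuration needs to be generated rather than written by hand, the local cloned-repo example above can be produced with a few lines of Python. This is only a convenience sketch: `build_mcp_config` is a hypothetical helper, and the project path is supplied by the caller.

```python
import json


def build_mcp_config(project_path):
    """Build the stdio server entry used in the local cloned-repo example."""
    return {
        "servers": {
            "screen-mcp": {
                "type": "stdio",
                "command": "uv",
                "args": ["run", "--project", project_path, "screen-mcp"],
            }
        }
    }


def render_mcp_config(project_path):
    """Return the configuration as indented JSON text, ready to save."""
    return json.dumps(build_mcp_config(project_path), indent=2)
```

Writing the result of `render_mcp_config(...)` to .vscode/mcp.json gives the same content as the first example.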
Example tool calls
- list_monitors()
- capture_screenshot(monitor_index=0, image_format="jpeg", max_width=1600, quality=80)
- capture_screenshot(monitor_index=0, image_format="jpeg", max_width=1600, quality=80, response_mode="image")
- capture_timeline(duration_seconds=10, monitor_index=0, image_format="jpeg", max_width=900, quality=70)
- start_timeline_capture(duration_seconds=10, monitor_index=0, image_format="jpeg", max_width=900, quality=70, chunk_size=120000)
- get_timeline_manifest(timeline_id)
- get_timeline_chunk(timeline_id, chunk_index)
- release_timeline_capture(timeline_id)
Timeline behavior in capture_timeline:
- fixed cadence: TIMELINE_FPS (default 2 images/s, configurable in source)
- maximum duration: TIMELINE_MAX_DURATION_SECONDS (default 30s, configurable in source)
- each frame includes: frame_index, t_offset_ms, captured_at, preview_text, image_sha256, image_size_bytes
- temporal_hint makes chronological order explicit for an LLM
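The per-frame metadata listed above allows a client to check each frame after transfer. The sketch below shows one way to do that, under assumptions that are not confirmed here: that each frame carries its image as base64 in a field named `image_base64`, that `image_sha256` is the hex digest of the decoded bytes, and that `image_size_bytes` is their length. The helper name `verify_frames` is hypothetical.

```python
import base64
import hashlib


def verify_frames(frames):
    """Return frames sorted by frame_index after checking size and SHA-256.

    Assumes each frame holds its image as base64 under `image_base64`
    (a hypothetical field name) alongside the documented metadata.
    """
    ordered = sorted(frames, key=lambda f: f["frame_index"])
    for frame in ordered:
        raw = base64.b64decode(frame["image_base64"])
        if len(raw) != frame["image_size_bytes"]:
            raise ValueError(f"frame {frame['frame_index']}: size mismatch")
        if hashlib.sha256(raw).hexdigest() != frame["image_sha256"]:
            raise ValueError(f"frame {frame['frame_index']}: hash mismatch")
    return ordered
```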
Robust flow recommendation:
- start_screenshot_capture(...) -> obtain capture_id
- get_screenshot_manifest(capture_id) -> metadata + preview_text
- get_screenshot_chunk(capture_id, chunk_index) -> reassemble chunks
- release_screenshot_capture(capture_id)
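The four steps above can be sketched as a single client-side routine. To stay independent of any particular MCP client library, the sketch takes a `call_tool(name, arguments)` async callable, which would be a thin wrapper around whatever client is in use. The result field names `chunk_count` and `data` are assumptions for illustration; `capture_id` and `chunk_index` come from the tool descriptions.

```python
import base64


async def fetch_screenshot(call_tool, **capture_args):
    """Run start -> manifest -> chunks -> release; return (manifest, image bytes).

    `call_tool(name, arguments)` is any async callable returning a dict.
    The `chunk_count` and `data` field names are hypothetical.
    """
    started = await call_tool("start_screenshot_capture", capture_args)
    capture_id = started["capture_id"]
    try:
        manifest = await call_tool("get_screenshot_manifest", {"capture_id": capture_id})
        parts = []
        for index in range(manifest["chunk_count"]):
            chunk = await call_tool(
                "get_screenshot_chunk",
                {"capture_id": capture_id, "chunk_index": index},
            )
            parts.append(chunk["data"])
        return manifest, base64.b64decode("".join(parts))
    finally:
        # Release even if a step above fails, so the session does not linger.
        await call_tool("release_screenshot_capture", {"capture_id": capture_id})
```

Putting the release call in `finally` means the session is freed on both the success path and the error path.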
Base64 notes
- For multi-client MCP, base64 is the most interoperable format: simple, JSON-friendly, compatible with vision and non-vision clients.
- Tradeoff: larger payload (~33%) and risk of single-block truncation.
- This project uses session-based chunked base64 transfer (capture_id) to make large exchanges reliable.
- For non-vision LLMs, prefer get_screenshot_manifest (metadata + ASCII preview) before downloading the full image.
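The ~33% figure follows from base64 encoding every 3 bytes as 4 characters. The short sketch below works through the arithmetic and estimates how many chunks a payload needs. It assumes that a chunk size is counted in base64 characters, which is not confirmed here; both helper names are hypothetical.

```python
def base64_length(raw_bytes):
    """Length in characters of standard padded base64 for `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def chunk_count(raw_bytes, chunk_size):
    """Chunks needed if the base64 text is split every `chunk_size` characters."""
    encoded = base64_length(raw_bytes)
    return (encoded + chunk_size - 1) // chunk_size
```

For example, a 300,000-byte JPEG becomes 400,000 base64 characters, and with a chunk size of 120,000 that is 4 chunks.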
Hybrid mode in capture_screenshot:
- response_mode="base64" (default): legacy behavior, JSON output with image_base64.
- response_mode="image": native MCP image output for vision models, with metadata in structured_content.
- response_mode="auto": reads SCREEN_MCP_CAPTURE_RESPONSE_MODE (base64 or image) and chooses automatically based on the client/host.
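One plausible reading of the mode selection above is sketched below. It is not the server's actual logic: it models only the environment-variable part of "auto", ignores any client/host detection, and the fallback to "base64" when the variable is unset or invalid is an assumption. The function name `resolve_response_mode` is hypothetical.

```python
import os

VALID_MODES = ("base64", "image")


def resolve_response_mode(requested="base64", environ=None):
    """Map a requested response_mode to a concrete one ("base64" or "image")."""
    environ = os.environ if environ is None else environ
    if requested in VALID_MODES:
        return requested
    if requested != "auto":
        raise ValueError(f"unsupported response_mode: {requested!r}")
    configured = environ.get("SCREEN_MCP_CAPTURE_RESPONSE_MODE", "").strip().lower()
    # Assumption: fall back to the documented default when unset or invalid.
    return configured if configured in VALID_MODES else "base64"
```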
Security and privacy
Screen captures may contain sensitive data. Add an explicit client-side policy for production use (consent, masking, window whitelisting, etc.).