MCP Scrcpy Vision
Provides AI agents with real-time vision and control over Android devices through screen streaming, UI automation, and fast input control via scrcpy protocol.
mcp-scrcpy-vision
An MCP server that gives AI agents complete vision and control over Android devices.
Features:
- Real-time Vision: Continuous screen streaming via scrcpy H.264 + ffmpeg
- Fast Input Control: When streaming, input uses scrcpy control protocol (~5-10ms latency vs ~100-300ms with adb shell)
- UI Automation: Element detection via uiautomator with tap coordinates
- Full Input Control: Tap, swipe, long press, pinch, drag-drop, text, keycodes
- System Access: Shell commands, file transfer, clipboard, notifications
- Multi-device: Control multiple Android devices simultaneously
- WiFi ADB: Connect wirelessly for untethered automation
Quick Start
1. Prerequisites
Required:
- Node.js 18+
- ADB (Android Platform Tools) in PATH
- Android device with USB debugging enabled
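For streaming (recommended for fast input):
- scrcpy-server file from a scrcpy release (see SCRCPY_SERVER_PATH below)
- ffmpeg in PATH (or set FFMPEG_PATH)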
2. Install
git clone https://github.com/anthropics/mcp-scrcpy-vision.git
cd mcp-scrcpy-vision
npm install
npm run build
3. Configure
Create .env file:
# Required for streaming + fast input
SCRCPY_SERVER_PATH="C:\scrcpy-win64-v3.2\scrcpy-server"
SCRCPY_SERVER_VERSION="3.2"
# Optional (defaults shown)
ADB_PATH="adb"
FFMPEG_PATH="ffmpeg"
DEFAULT_MAX_SIZE="1024"
DEFAULT_MAX_FPS="30"
DEFAULT_FRAME_FPS="2"
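As a rough illustration of how these variables and their defaults fit together, here is a minimal sketch of a config loader. The `ServerConfig` shape and the `loadConfig`, `positiveIntOr`, and `streamingConfigured` names are hypothetical and not taken from this project's source; only the variable names and default values come from the list above.

```typescript
// Hypothetical config shape; field names are illustrative, not the server's own.
interface ServerConfig {
  scrcpyServerPath: string | undefined;
  scrcpyServerVersion: string | undefined;
  adbPath: string;
  ffmpegPath: string;
  defaultMaxSize: number;
  defaultMaxFps: number;
  defaultFrameFps: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back when the value is missing or invalid.
function positiveIntOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Build the config from an environment-like object, applying the documented defaults.
function loadConfig(env: Env): ServerConfig {
  return {
    scrcpyServerPath: env["SCRCPY_SERVER_PATH"],
    scrcpyServerVersion: env["SCRCPY_SERVER_VERSION"],
    adbPath: env["ADB_PATH"] ?? "adb",
    ffmpegPath: env["FFMPEG_PATH"] ?? "ffmpeg",
    defaultMaxSize: positiveIntOr(env["DEFAULT_MAX_SIZE"], 1024),
    defaultMaxFps: positiveIntOr(env["DEFAULT_MAX_FPS"], 30),
    defaultFrameFps: positiveIntOr(env["DEFAULT_FRAME_FPS"], 2),
  };
}

// Streaming needs both scrcpy settings; snapshot mode works without them.
function streamingConfigured(config: ServerConfig): boolean {
  return Boolean(config.scrcpyServerPath && config.scrcpyServerVersion);
}
```

In a Node.js process you would typically pass `process.env` as the argument. Taking the environment as a parameter keeps the sketch easy to test.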
4. Add to MCP Client
Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json on Windows):
{
"mcpServers": {
"android": {
"command": "node",
"args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
"env": {
"SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
"SCRCPY_SERVER_VERSION": "3.2"
}
}
}
}
Cursor (Settings > MCP):
{
"android": {
"command": "node",
"args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
"env": {
"SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
"SCRCPY_SERVER_VERSION": "3.2"
}
}
}
5. Connect Device
- Enable USB debugging on Android device (Settings > Developer Options > USB Debugging)
- Connect via USB
- Accept RSA fingerprint prompt on device
- Verify: adb devices should show your device
How It Works
Two Modes of Operation
1. Snapshot Mode (No streaming required)
- Uses android.vision.snapshot for screenshots
- Input uses ADB shell commands (~100-300ms per action)
- Works without scrcpy/ffmpeg
- Best for simple automation or when streaming isn't available
2. Streaming Mode (Recommended)
- Start with android.vision.startStream
- Continuous JPEG frames available via resource URI
- Input uses scrcpy control protocol (~5-10ms per action)
- Input is roughly 10-20x faster than in snapshot mode
- Best for real-time control and rapid interactions
Performance Comparison
| Operation | Snapshot Mode | Streaming Mode |
|---|---|---|
| Tap | ~100-300ms | ~5-10ms |
| Swipe | ~300-500ms | ~50-100ms |
| Type text | ~50ms/char | ~5ms total |
| Screenshot | ~500ms | ~33ms (30fps) |
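To see what these per-action figures mean for a whole script, the sketch below totals the tap and swipe ranges from the table for a given number of actions. The `LATENCY_MS` constant and `estimateTotalMs` function are hypothetical helpers for illustration; the numbers are the approximate ranges quoted above, not measurements.

```typescript
type Mode = "snapshot" | "streaming";
type Action = "tap" | "swipe";
type Range = readonly [number, number]; // [min, max] in milliseconds

// Approximate per-action latency, copied from the comparison table.
const LATENCY_MS: Record<Mode, Record<Action, Range>> = {
  snapshot: { tap: [100, 300], swipe: [300, 500] },
  streaming: { tap: [5, 10], swipe: [50, 100] },
};

// Sum the best-case and worst-case input latency for a batch of actions.
function estimateTotalMs(mode: Mode, counts: Record<Action, number>): Range {
  let min = 0;
  let max = 0;
  for (const action of ["tap", "swipe"] as const) {
    const [lo, hi] = LATENCY_MS[mode][action];
    min += lo * counts[action];
    max += hi * counts[action];
  }
  return [min, max];
}
```

For example, 10 taps and 2 swipes come to roughly 1600-4000 ms in snapshot mode and 150-300 ms in streaming mode under these figures.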
Tools Reference (33 tools)
Device Management
| Tool | Parameters | Description |
|---|---|---|
| android.devices.list | - | List connected devices |
| android.devices.info | serial | Get device info (model, SDK, etc.) |
| android.adb.enableTcpip | serial, port? | Enable WiFi debugging |
| android.adb.getDeviceIp | serial | Get device WiFi IP |
| android.adb.connectWifi | ipAddress, port? | Connect via WiFi |
| android.adb.disconnectWifi | ipAddress? | Disconnect WiFi |
Vision
| Tool | Parameters | Description |
|---|---|---|
| android.vision.startStream | serial, maxSize?, maxFps?, frameFps? | Start continuous stream (enables fast input) |
| android.vision.stopStream | serial | Stop stream |
| android.vision.snapshot | serial | Take PNG screenshot (works without streaming) |
| android.ui.dump | serial | Get UI hierarchy XML |
| android.ui.findElement | serial, text?, resourceId?, className?, contentDesc? | Find elements with tap coords |
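A uiautomator dump reports each node's position as a bounds attribute of the form [left,top][right,bottom]. The sketch below shows one way tap coordinates could be derived from such a value by taking the center of the rectangle. The `centerOfBounds` helper is hypothetical, and whether android.ui.findElement computes its coordinates exactly this way is an assumption.

```typescript
interface Point {
  x: number;
  y: number;
}

// Convert a uiautomator bounds string such as "[420,1150][660,1250]"
// into the center point of the rectangle, or null if it does not parse.
function centerOfBounds(bounds: string): Point | null {
  const match = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds.trim());
  if (match === null) return null;
  const left = Number(match[1]);
  const top = Number(match[2]);
  const right = Number(match[3]);
  const bottom = Number(match[4]);
  return {
    x: Math.floor((left + right) / 2),
    y: Math.floor((top + bottom) / 2),
  };
}
```

The resulting point can be passed as x and y to android.input.tap.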
Input Control
Note: These automatically use fast scrcpy control when streaming, otherwise fall back to ADB.
| Tool | Parameters | Description |
|---|---|---|
| android.input.tap | serial, x, y | Tap at coordinates |
| android.input.swipe | serial, x1, y1, x2, y2, durationMs? | Swipe gesture |
| android.input.longPress | serial, x, y, durationMs? | Long press |
| android.input.pinch | serial, centerX, centerY, startDistance, endDistance, durationMs? | Pinch zoom |
| android.input.dragDrop | serial, startX, startY, endX, endY, durationMs? | Drag and drop |
| android.input.text | serial, text | Type text |
| android.input.keyevent | serial, keycode | Send keycode |
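The fallback described in the note can be pictured as a small dispatch step: if a stream is active for the device, use the scrcpy control channel, otherwise build an adb shell input command. The sketch below covers only the ADB side and the selection logic. The `InputAction` type and the `pickBackend` and `adbInputArgs` names are hypothetical; the `adb -s <serial> shell input tap|swipe|keyevent` command forms are standard ADB usage.

```typescript
type InputAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs?: number }
  | { kind: "keyevent"; keycode: number };

type Backend = "scrcpy-control" | "adb-shell";

// Use the fast path only for devices that currently have an active stream.
function pickBackend(streamingSerials: ReadonlySet<string>, serial: string): Backend {
  return streamingSerials.has(serial) ? "scrcpy-control" : "adb-shell";
}

// Build the argument list for the adb executable (the ADB fallback path).
function adbInputArgs(serial: string, action: InputAction): string[] {
  const base = ["-s", serial, "shell", "input"];
  switch (action.kind) {
    case "tap":
      return [...base, "tap", String(action.x), String(action.y)];
    case "swipe": {
      const args = [
        ...base, "swipe",
        String(action.x1), String(action.y1),
        String(action.x2), String(action.y2),
      ];
      // The duration is optional for `input swipe`.
      if (action.durationMs !== undefined) args.push(String(action.durationMs));
      return args;
    }
    case "keyevent":
      return [...base, "keyevent", String(action.keycode)];
  }
}
```

Text input is left out of this sketch on purpose, because `input text` needs shell escaping that depends on the characters being typed.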
App Control
| Tool | Parameters | Description |
|---|---|---|
| android.app.start | serial, packageName, activity? | Launch app |
| android.app.stop | serial, packageName | Force-stop app |
| android.apps.list | serial, system? | List installed apps |
| android.activity.current | serial | Get foreground activity |
System
| Tool | Parameters | Description |
|---|---|---|
| android.shell.exec | serial, command | Execute shell command |
| android.file.push | serial, localPath, remotePath | Push file to device |
| android.file.pull | serial, remotePath, localPath | Pull file from device |
| android.file.list | serial, path | List directory |
| android.clipboard.get | serial | Get clipboard |
| android.clipboard.set | serial, text | Set clipboard |
| android.notifications.get | serial | Get notifications |
Screen Control
| Tool | Parameters | Description |
|---|---|---|
| android.screen.wake | serial | Wake screen |
| android.screen.sleep | serial | Sleep screen |
| android.screen.isOn | serial | Check if screen is on |
| android.screen.unlock | serial | Unlock (unsecured only) |
Resources
The server exposes these MCP resources:
- android://devices - JSON list of connected devices
- android://device/<serial>/frame/latest.jpg - Latest JPEG frame (when streaming)
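A client needs to build the frame URI from a device serial before it can read the resource. The sketch below shows one way to do that and to recover the serial from a URI. The `frameUri` and `serialFromFrameUri` helpers are hypothetical, and the sketch assumes the serial is placed in the URI verbatim, without percent-encoding; check the server's behavior before relying on that for WiFi serials that contain a colon.

```typescript
const DEVICES_URI = "android://devices";

// Build the latest-frame resource URI for a device (serial inserted verbatim).
function frameUri(serial: string): string {
  return `android://device/${serial}/frame/latest.jpg`;
}

// Recover the serial from a frame URI, or null if the URI has another shape.
function serialFromFrameUri(uri: string): string | null {
  const match = /^android:\/\/device\/(.+)\/frame\/latest\.jpg$/.exec(uri);
  return match?.[1] ?? null;
}
```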
Usage Examples
Basic Automation Loop (Streaming Mode)
1. Start stream: android.vision.startStream { serial: "ABC123" }
2. Read resource: android://device/ABC123/frame/latest.jpg
3. AI analyzes image, decides to tap "Login" button
4. Find element: android.ui.findElement { serial: "ABC123", text: "Login" }
5. Tap at returned coordinates: android.input.tap { serial: "ABC123", x: 540, y: 1200 }
6. Wait 500ms, read resource again, repeat
7. When done: android.vision.stopStream { serial: "ABC123" }
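The loop above could be driven from client code along these lines. This is a minimal sketch, not the project's implementation: the `DeviceClient` interface, its `callTool` and `readResource` methods, and the `decide` callback are hypothetical stand-ins for whatever MCP client and vision model you use. The `decide` callback covers steps 3 and 4 (analyze the frame and resolve tap coordinates).

```typescript
// Hypothetical client surface; wire these to your MCP client's tool and resource calls.
interface DeviceClient {
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
  readResource(uri: string): Promise<Uint8Array>;
}

interface TapTarget {
  x: number;
  y: number;
}

// Returns the next point to tap, or null when the task is finished.
type Decide = (frame: Uint8Array) => Promise<TapTarget | null>;

// Runs the stream / read / decide / tap cycle and returns the number of taps sent.
async function automationLoop(
  client: DeviceClient,
  serial: string,
  decide: Decide,
  maxSteps = 20,
  waitMs = 500,
): Promise<number> {
  const uri = `android://device/${serial}/frame/latest.jpg`;
  let taps = 0;
  await client.callTool("android.vision.startStream", { serial });
  try {
    for (let step = 0; step < maxSteps; step++) {
      const frame = await client.readResource(uri);
      const target = await decide(frame);
      if (target === null) break;
      await client.callTool("android.input.tap", { serial, x: target.x, y: target.y });
      taps++;
      // Give the UI time to settle before reading the next frame.
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  } finally {
    // Always release the stream, even if a step throws.
    await client.callTool("android.vision.stopStream", { serial });
  }
  return taps;
}
```

The `maxSteps` cap is a design choice of this sketch so that a confused agent cannot loop forever. The `finally` block keeps the one-stream-per-device slot from leaking on errors.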
Simple Screenshot Mode
1. Take screenshot: android.vision.snapshot { serial: "ABC123" }
2. AI analyzes image
3. Find and tap: android.ui.findElement + android.input.tap
4. Take another screenshot to verify
WiFi Connection Workflow
1. Connect device via USB
2. android.adb.enableTcpip { serial: "ABC123" }
3. android.adb.getDeviceIp { serial: "ABC123" } → "192.168.1.50"
4. Disconnect USB cable
5. android.adb.connectWifi { ipAddress: "192.168.1.50" }
6. Now use "192.168.1.50:5555" as serial for all commands
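Step 6 relies on the WiFi serial having the form ip:port. The sketch below builds that serial from the IP returned in step 3, using 5555 as the default port as in the example above. The `wifiSerial` helper and its validation rules are hypothetical, and it handles IPv4 addresses only.

```typescript
const DEFAULT_ADB_TCP_PORT = 5555;

// Build the "ip:port" serial used after android.adb.connectWifi succeeds.
function wifiSerial(ipAddress: string, port: number = DEFAULT_ADB_TCP_PORT): string {
  const octets = ipAddress.split(".");
  const validIp =
    octets.length === 4 &&
    octets.every((octet) => /^\d{1,3}$/.test(octet) && Number(octet) <= 255);
  if (!validIp) {
    throw new Error(`Not an IPv4 address: ${ipAddress}`);
  }
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid TCP port: ${port}`);
  }
  return `${ipAddress}:${port}`;
}
```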
App Testing Example
1. android.app.start { serial: "ABC123", packageName: "com.example.app" }
2. android.vision.startStream { serial: "ABC123" }
3. Wait for app to load, read frame
4. android.ui.findElement { serial: "ABC123", resourceId: "username_field" }
5. android.input.tap { serial: "ABC123", x: 540, y: 300 }
6. android.input.text { serial: "ABC123", text: "testuser@example.com" }
7. android.input.keyevent { serial: "ABC123", keycode: 66 } // Enter
8. Read frame, verify login succeeded
9. android.vision.stopStream { serial: "ABC123" }
Common Keycodes
| Key | Code | Key | Code |
|---|---|---|---|
| HOME | 3 | BACK | 4 |
| VOLUME_UP | 24 | VOLUME_DOWN | 25 |
| POWER | 26 | ENTER | 66 |
| DELETE | 67 | TAB | 61 |
| MENU | 82 | APP_SWITCH | 187 |
| WAKEUP | 224 | SLEEP | 223 |
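If you call android.input.keyevent from code, a named lookup can be easier to read than bare numbers. The sketch below mirrors the table as a constant. The `KEYCODES` object and `keyeventArgs` helper are hypothetical conveniences, not part of the server.

```typescript
// Values copied from the table above (Android KeyEvent constants).
const KEYCODES = {
  HOME: 3,
  BACK: 4,
  VOLUME_UP: 24,
  VOLUME_DOWN: 25,
  POWER: 26,
  TAB: 61,
  ENTER: 66,
  DELETE: 67, // KEYCODE_DEL, which behaves as backspace
  MENU: 82,
  APP_SWITCH: 187,
  SLEEP: 223,
  WAKEUP: 224,
} as const;

type KeyName = keyof typeof KEYCODES;

// Build the arguments object for the android.input.keyevent tool.
function keyeventArgs(serial: string, key: KeyName): { serial: string; keycode: number } {
  return { serial, keycode: KEYCODES[key] };
}
```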
Troubleshooting
No devices found
adb kill-server
adb start-server
adb devices
Ensure USB debugging is enabled and RSA fingerprint accepted.
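When diagnosing this from a script, it helps to look at the state column of the adb devices output: a device listed as unauthorized has not yet accepted the RSA prompt, and offline usually points to a connection problem. The `parseAdbDevices` function below is a hypothetical helper that parses the plain `adb devices` output into serial and state pairs.

```typescript
interface AdbDevice {
  serial: string;
  state: string; // e.g. "device", "unauthorized", "offline"
}

// Parse the text printed by `adb devices` into a list of devices.
function parseAdbDevices(output: string): AdbDevice[] {
  const devices: AdbDevice[] = [];
  for (const rawLine of output.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blanks, the header line, and adb daemon status messages.
    if (line === "" || line.startsWith("List of devices") || line.startsWith("*")) {
      continue;
    }
    const [serial, state] = line.split(/\s+/);
    if (serial && state) devices.push({ serial, state });
  }
  return devices;
}
```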
Scrcpy version mismatch
SCRCPY_SERVER_VERSION must exactly match your scrcpy-server file. Check the scrcpy release version you downloaded.
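If the scrcpy-server file still sits in its original release folder (for example scrcpy-win64-v3.2), the folder name gives a quick sanity check against the configured version. The sketch below is a heuristic only: `versionFromPath` and `looksMismatched` are hypothetical helpers, and they can say nothing when the file has been moved or the folder renamed.

```typescript
// Extract a version such as "3.2" from a path containing "scrcpy-...-v3.2".
function versionFromPath(serverPath: string): string | null {
  const match = /scrcpy-[^\\\/]*?v(\d+(?:\.\d+)*)/.exec(serverPath);
  return match?.[1] ?? null;
}

// True only when the path names a version that differs from the configured one.
function looksMismatched(serverPath: string, configuredVersion: string): boolean {
  const fromPath = versionFromPath(serverPath);
  return fromPath !== null && fromPath !== configuredVersion.trim();
}
```

A false result does not prove the versions match. It only means the path gave no evidence of a mismatch.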
ffmpeg not found
- Windows: Download from https://ffmpeg.org/download.html, extract, add bin folder to PATH
- macOS: brew install ffmpeg
- Linux: apt install ffmpeg or yum install ffmpeg
Or set FFMPEG_PATH in .env to the full path.
uiautomator dump fails
Some devices need screen on. Try android.screen.wake first.
Clipboard not working (Android 10+)
Android 10+ restricts clipboard access. Use UI automation to paste instead.
Stream won't start
- Check scrcpy-server path is correct
- Verify version numbers match
- Try running scrcpy standalone first to verify it works
Notes & Limitations
- Fast input when streaming: When a stream is active, tap/swipe/text/keyevent use the scrcpy control protocol (~5-10ms latency). Without streaming, input falls back to adb shell input (~100-300ms).
- One stream per device at a time
- Snapshot works without scrcpy - useful fallback when streaming is not needed
- Clipboard has platform limitations on Android 10+
- Notifications may require permissions on newer Android
- Pinch gesture is currently simulated with a single finger; true multi-touch requires an active streaming session
Security Warning
This MCP server provides full control over connected Android devices:
- Execute arbitrary shell commands
- Read/write files on device
- Control UI and input
- Access clipboard and notifications
Only connect devices you own, and only when you trust the AI agent controlling them.
Development
npm run dev # Development with tsx
npm run build # Compile TypeScript
npm start # Run production build
See claude.md for developer documentation. See agents.md for AI agent integration guide.
License
MIT