MCP Scrcpy Vision

Provides AI agents with real-time vision and control over Android devices through screen streaming, UI automation, and fast input control via the scrcpy protocol.

mcp-scrcpy-vision

An MCP server that gives AI agents complete vision and control over Android devices.

Features:

  • Real-time Vision: Continuous screen streaming via scrcpy H.264 + ffmpeg
  • Fast Input Control: When streaming, input uses scrcpy control protocol (~5-10ms latency vs ~100-300ms with adb shell)
  • UI Automation: Element detection via uiautomator with tap coordinates
  • Full Input Control: Tap, swipe, long press, pinch, drag-drop, text, keycodes
  • System Access: Shell commands, file transfer, clipboard, notifications
  • Multi-device: Control multiple Android devices simultaneously
  • WiFi ADB: Connect wirelessly for untethered automation

Quick Start

1. Prerequisites

Required:

  • Node.js 18+
  • ADB (Android Platform Tools) in PATH
  • Android device with USB debugging enabled

For streaming (recommended for fast input):

  • scrcpy - download a release and extract the scrcpy-server file
  • ffmpeg - install it and add it to PATH

2. Install

git clone https://github.com/anthropics/mcp-scrcpy-vision.git
cd mcp-scrcpy-vision
npm install
npm run build

3. Configure

Create a .env file:

# Required for streaming + fast input
SCRCPY_SERVER_PATH="C:\scrcpy-win64-v3.2\scrcpy-server"
SCRCPY_SERVER_VERSION="3.2"

# Optional (defaults shown)
ADB_PATH="adb"
FFMPEG_PATH="ffmpeg"
DEFAULT_MAX_SIZE="1024"
DEFAULT_MAX_FPS="30"
DEFAULT_FRAME_FPS="2"
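The README does not say how these values are parsed. As a minimal sketch, assuming each variable is read as a plain string and the numeric ones fall back to the defaults above when unset or invalid, a hypothetical `loadConfig` helper (not this server's actual code) could look like this:

```typescript
// Hypothetical config loader mirroring the variables and defaults listed above.
interface ServerConfig {
  scrcpyServerPath?: string; // needed only for streaming + fast input
  scrcpyServerVersion?: string;
  adbPath: string;
  ffmpegPath: string;
  defaultMaxSize: number;
  defaultMaxFps: number;
  defaultFrameFps: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back to the default when unset or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

function loadConfig(env: Env): ServerConfig {
  return {
    scrcpyServerPath: env.SCRCPY_SERVER_PATH,
    scrcpyServerVersion: env.SCRCPY_SERVER_VERSION,
    adbPath: env.ADB_PATH ?? "adb",
    ffmpegPath: env.FFMPEG_PATH ?? "ffmpeg",
    defaultMaxSize: intOr(env.DEFAULT_MAX_SIZE, 1024),
    defaultMaxFps: intOr(env.DEFAULT_MAX_FPS, 30),
    defaultFrameFps: intOr(env.DEFAULT_FRAME_FPS, 2),
  };
}

// Streaming (and therefore fast input) needs both scrcpy settings.
function streamingConfigured(cfg: ServerConfig): boolean {
  return Boolean(cfg.scrcpyServerPath && cfg.scrcpyServerVersion);
}
```

In Node you would call `loadConfig(process.env)`; taking the environment as a parameter keeps the helper easy to test.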

4. Add to MCP Client

Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "android": {
      "command": "node",
      "args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
      "env": {
        "SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
        "SCRCPY_SERVER_VERSION": "3.2"
      }
    }
  }
}

Cursor (Settings > MCP):

{
  "mcpServers": {
    "android": {
      "command": "node",
      "args": ["C:/path/to/mcp-scrcpy-vision/dist/index.js"],
      "env": {
        "SCRCPY_SERVER_PATH": "C:/scrcpy/scrcpy-server",
        "SCRCPY_SERVER_VERSION": "3.2"
      }
    }
  }
}

5. Connect Device

  1. Enable USB debugging on Android device (Settings > Developer Options > USB Debugging)
  2. Connect via USB
  3. Accept RSA fingerprint prompt on device
  4. Verify: adb devices should show your device
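The plain `adb devices` output is a header line followed by one `serial<TAB>state` line per device. A minimal sketch of how a client might parse it (the function names are hypothetical, not this server's API):

```typescript
// Device states reported by `adb devices` (e.g. "device", "unauthorized", "offline").
interface AdbDevice {
  serial: string;
  state: string;
}

// Parse the plain output of `adb devices` (without -l).
function parseAdbDevices(output: string): AdbDevice[] {
  const devices: AdbDevice[] = [];
  for (const rawLine of output.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip the header, blank lines, and daemon start-up messages ("* daemon ...").
    if (line === "" || line.startsWith("List of devices") || line.startsWith("*")) continue;
    const [serial, state] = line.split(/\s+/);
    if (serial && state) devices.push({ serial, state });
  }
  return devices;
}

// Only devices in the "device" state accept commands; "unauthorized" means
// the RSA fingerprint prompt has not been accepted yet.
function readyDevices(output: string): string[] {
  return parseAdbDevices(output)
    .filter((d) => d.state === "device")
    .map((d) => d.serial);
}
```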

How It Works

Two Modes of Operation

1. Snapshot Mode (No streaming required)

  • Uses android.vision.snapshot for screenshots
  • Input uses ADB shell commands (~100-300ms per action)
  • Works without scrcpy/ffmpeg
  • Best for simple automation or when streaming isn't available

2. Streaming Mode (Recommended)

  • Start with android.vision.startStream
  • Continuous JPEG frames available via resource URI
  • Input uses scrcpy control protocol (~5-10ms per action)
  • Input roughly 10-60x faster than snapshot mode (~5-10ms vs ~100-300ms per tap)
  • Best for real-time control and rapid interactions
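Streaming mode turns scrcpy's H.264 stream into JPEG frames with ffmpeg. The exact ffmpeg invocation this server uses is not documented; a minimal sketch, assuming a raw H.264 stream on stdin and JPEGs written to stdout with `image2pipe`, plus a hypothetical splitter that cuts the output into individual frames:

```typescript
// Hypothetical ffmpeg arguments: decode raw H.264 from stdin and emit JPEG
// frames on stdout at `frameFps` frames per second.
function ffmpegFrameArgs(frameFps: number): string[] {
  return [
    "-loglevel", "error",
    "-fflags", "nobuffer", // reduce input buffering latency
    "-f", "h264", "-i", "pipe:0",
    "-vf", `fps=${frameFps}`, // downsample to the rate agents actually read
    "-f", "image2pipe", "-c:v", "mjpeg", "-q:v", "5",
    "pipe:1",
  ];
}

// image2pipe output is a concatenation of JPEG files. Split it on the SOI (FF D8)
// and EOI (FF D9) markers; `rest` is any partial data to prepend to the next chunk.
// Naive: assumes FF D9 only appears as the end-of-image marker.
function splitJpegFrames(buf: Uint8Array): { frames: Uint8Array[]; rest: Uint8Array } {
  const frames: Uint8Array[] = [];
  let start = -1;
  let consumed = 0;
  for (let i = 0; i + 1 < buf.length; i++) {
    if (buf[i] !== 0xff) continue;
    if (start < 0 && buf[i + 1] === 0xd8) {
      start = i;
    } else if (start >= 0 && buf[i + 1] === 0xd9) {
      frames.push(buf.slice(start, i + 2));
      consumed = i + 2;
      start = -1;
      i++; // skip the marker's second byte
    }
  }
  return { frames, rest: buf.slice(start >= 0 ? start : consumed) };
}
```

Keeping only the last complete frame from each chunk is enough to back a "latest frame" resource.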

Performance Comparison

| Operation | Snapshot Mode | Streaming Mode |
|---|---|---|
| Tap | ~100-300ms | ~5-10ms |
| Swipe | ~300-500ms | ~50-100ms |
| Type text | ~50ms/char | ~5ms total |
| Screenshot | ~500ms | ~33ms (30fps) |

Tools Reference (33 tools)

Device Management

| Tool | Parameters | Description |
|---|---|---|
| android.devices.list | - | List connected devices |
| android.devices.info | serial | Get device info (model, SDK, etc.) |
| android.adb.enableTcpip | serial, port? | Enable WiFi debugging |
| android.adb.getDeviceIp | serial | Get device WiFi IP |
| android.adb.connectWifi | ipAddress, port? | Connect via WiFi |
| android.adb.disconnectWifi | ipAddress? | Disconnect WiFi |

Vision

| Tool | Parameters | Description |
|---|---|---|
| android.vision.startStream | serial, maxSize?, maxFps?, frameFps? | Start continuous stream (enables fast input) |
| android.vision.stopStream | serial | Stop stream |
| android.vision.snapshot | serial | Take PNG screenshot (works without streaming) |
| android.ui.dump | serial | Get UI hierarchy XML |
| android.ui.findElement | serial, text?, resourceId?, className?, contentDesc? | Find elements with tap coords |
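uiautomator reports each node's position as a `bounds="[left,top][right,bottom]"` attribute. A minimal sketch of turning a dump into tap coordinates, assuming the tap point is the centre of the bounds; `findByText` and its result shape are hypothetical, and the regex-based parsing is deliberately naive (no XML entity decoding):

```typescript
interface UiMatch {
  text: string;
  resourceId: string;
  bounds: [number, number, number, number]; // left, top, right, bottom
  tapX: number;
  tapY: number;
}

// Read one attribute value from a <node ...> tag.
function attr(tag: string, name: string): string {
  const m = tag.match(new RegExp(`\\s${name}="([^"]*)"`));
  return m?.[1] ?? "";
}

function findByText(xml: string, text: string): UiMatch[] {
  const matches: UiMatch[] = [];
  for (const tag of xml.match(/<node\b[^>]*>/g) ?? []) {
    if (attr(tag, "text") !== text) continue;
    // uiautomator encodes bounds as "[left,top][right,bottom]".
    const b = attr(tag, "bounds").match(/^\[(\d+),(\d+)\]\[(\d+),(\d+)\]$/);
    if (!b) continue;
    const [l, t, r, btm] = b.slice(1).map(Number) as [number, number, number, number];
    matches.push({
      text,
      resourceId: attr(tag, "resource-id"),
      bounds: [l, t, r, btm],
      tapX: Math.round((l + r) / 2), // tap the centre of the element
      tapY: Math.round((t + btm) / 2),
    });
  }
  return matches;
}
```

For a "Login" button with bounds `[40,1150][1040,1250]`, this yields the tap point (540, 1200) used in the examples below.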

Input Control

Note: These automatically use fast scrcpy control when streaming, otherwise fall back to ADB.

| Tool | Parameters | Description |
|---|---|---|
| android.input.tap | serial, x, y | Tap at coordinates |
| android.input.swipe | serial, x1, y1, x2, y2, durationMs? | Swipe gesture |
| android.input.longPress | serial, x, y, durationMs? | Long press |
| android.input.pinch | serial, centerX, centerY, startDistance, endDistance, durationMs? | Pinch zoom |
| android.input.dragDrop | serial, startX, startY, endX, endY, durationMs? | Drag and drop |
| android.input.text | serial, text | Type text |
| android.input.keyevent | serial, keycode | Send keycode |
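Without a stream, these tools fall back to ADB. A minimal sketch of what that fallback could look like using Android's `input` shell command; `adbInputArgs` is hypothetical, the default durations are assumptions, and emulating a long press as a zero-distance swipe is a common adb technique rather than something this README confirms:

```typescript
type InputAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs?: number }
  | { kind: "longPress"; x: number; y: number; durationMs?: number }
  | { kind: "text"; text: string }
  | { kind: "keyevent"; keycode: number };

// Build arguments for `adb -s <serial> shell input ...`.
function adbInputArgs(serial: string, a: InputAction): string[] {
  const base = ["-s", serial, "shell", "input"];
  switch (a.kind) {
    case "tap":
      return [...base, "tap", String(a.x), String(a.y)];
    case "swipe":
      return [...base, "swipe", ...[a.x1, a.y1, a.x2, a.y2, a.durationMs ?? 300].map(String)];
    case "longPress":
      // A swipe that starts and ends at the same point acts as a long press.
      return [...base, "swipe", ...[a.x, a.y, a.x, a.y, a.durationMs ?? 800].map(String)];
    case "text":
      // `input text` treats spaces as separators; "%s" encodes a space.
      // adb joins shell args into one device command line, so real code must
      // also quote shell metacharacters such as & ; ' ( ).
      return [...base, "text", a.text.replace(/ /g, "%s")];
    case "keyevent":
      return [...base, "keyevent", String(a.keycode)];
  }
}
```

Each call spawns a new adb shell on the device, which is why this path costs ~100-300ms per action.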

App Control

| Tool | Parameters | Description |
|---|---|---|
| android.app.start | serial, packageName, activity? | Launch app |
| android.app.stop | serial, packageName | Force-stop app |
| android.apps.list | serial, system? | List installed apps |
| android.activity.current | serial | Get foreground activity |

System

| Tool | Parameters | Description |
|---|---|---|
| android.shell.exec | serial, command | Execute shell command |
| android.file.push | serial, localPath, remotePath | Push file to device |
| android.file.pull | serial, remotePath, localPath | Pull file from device |
| android.file.list | serial, path | List directory |
| android.clipboard.get | serial | Get clipboard |
| android.clipboard.set | serial, text | Set clipboard |
| android.notifications.get | serial | Get notifications |

Screen Control

| Tool | Parameters | Description |
|---|---|---|
| android.screen.wake | serial | Wake screen |
| android.screen.sleep | serial | Sleep screen |
| android.screen.isOn | serial | Check if screen is on |
| android.screen.unlock | serial | Unlock (unsecured only) |

Resources

The server exposes these MCP resources:

  • android://devices - JSON list of connected devices
  • android://device/<serial>/frame/latest.jpg - Latest JPEG frame (when streaming)

Usage Examples

Basic Automation Loop (Streaming Mode)

1. Start stream: android.vision.startStream { serial: "ABC123" }
2. Read resource: android://device/ABC123/frame/latest.jpg
3. AI analyzes image, decides to tap "Login" button
4. Find element: android.ui.findElement { serial: "ABC123", text: "Login" }
5. Tap at returned coordinates: android.input.tap { serial: "ABC123", x: 540, y: 1200 }
6. Wait 500ms, read resource again, repeat
7. When done: android.vision.stopStream { serial: "ABC123" }
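The loop above can be sketched from the client side. This assumes a hypothetical `callTool` function (however your MCP client invokes a tool and returns its parsed result) and an assumed `findElement` result shape with `tapX`/`tapY`; check the real response format. Frame reads and image analysis are omitted:

```typescript
// Hypothetical transport: invoke an MCP tool and return its parsed result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// Assumed result shape for android.ui.findElement.
interface FoundElement {
  tapX: number;
  tapY: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function tapByText(callTool: CallTool, serial: string, text: string): Promise<boolean> {
  await callTool("android.vision.startStream", { serial });
  try {
    const found = (await callTool("android.ui.findElement", { serial, text })) as FoundElement[];
    const first = found[0];
    if (!first) return false;
    await callTool("android.input.tap", { serial, x: first.tapX, y: first.tapY });
    await sleep(500); // give the UI time to react before the next frame is read
    return true;
  } finally {
    // Always release the stream, even if a step throws.
    await callTool("android.vision.stopStream", { serial });
  }
}
```

Stopping the stream in `finally` matters because only one stream per device can be active at a time.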

Simple Screenshot Mode

1. Take screenshot: android.vision.snapshot { serial: "ABC123" }
2. AI analyzes image
3. Find and tap: android.ui.findElement + android.input.tap
4. Take another screenshot to verify

WiFi Connection Workflow

1. Connect device via USB
2. android.adb.enableTcpip { serial: "ABC123" }
3. android.adb.getDeviceIp { serial: "ABC123" } → "192.168.1.50"
4. Disconnect USB cable
5. android.adb.connectWifi { ipAddress: "192.168.1.50" }
6. Now use "192.168.1.50:5555" as serial for all commands
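A minimal sketch of steps 3 and 6, assuming the device IP is read with `adb shell ip -f inet addr show wlan0` (the interface name `wlan0` is an assumption; it varies on some devices) and that the default port is 5555. Both helper names are hypothetical:

```typescript
// Extract the IPv4 address from `ip -f inet addr show wlan0` output.
function parseWlanIp(output: string): string | null {
  const m = output.match(/\binet (\d{1,3}(?:\.\d{1,3}){3})\//);
  return m?.[1] ?? null;
}

// After `adb connect`, the device is addressed by "ip:port".
function wifiSerial(ip: string, port = 5555): string {
  return `${ip}:${port}`;
}
```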

App Testing Example

1. android.app.start { serial: "ABC123", packageName: "com.example.app" }
2. android.vision.startStream { serial: "ABC123" }
3. Wait for app to load, read frame
4. android.ui.findElement { serial: "ABC123", resourceId: "username_field" }
5. android.input.tap { serial: "ABC123", x: 540, y: 300 }  // coordinates returned in step 4
6. android.input.text { serial: "ABC123", text: "testuser@example.com" }
7. android.input.keyevent { serial: "ABC123", keycode: 66 }  // Enter
8. Read frame, verify login succeeded
9. android.vision.stopStream { serial: "ABC123" }

Common Keycodes

| Key | Code | Key | Code |
|---|---|---|---|
| HOME | 3 | BACK | 4 |
| VOLUME_UP | 24 | VOLUME_DOWN | 25 |
| POWER | 26 | ENTER | 66 |
| DELETE | 67 | TAB | 61 |
| MENU | 82 | APP_SWITCH | 187 |
| WAKEUP | 224 | SLEEP | 223 |
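On the client side, naming these codes avoids magic numbers in android.input.keyevent calls. A small sketch; the `KEYCODE` map and `keyeventArgs` helper are hypothetical, the values are Android's standard `KeyEvent` codes from the table above:

```typescript
// Android KeyEvent codes from the table above.
const KEYCODE = {
  HOME: 3,
  BACK: 4,
  VOLUME_UP: 24,
  VOLUME_DOWN: 25,
  POWER: 26,
  TAB: 61,
  ENTER: 66,
  DELETE: 67,
  MENU: 82,
  APP_SWITCH: 187,
  SLEEP: 223,
  WAKEUP: 224,
} as const;

type KeyName = keyof typeof KEYCODE;

// Build the argument object for an android.input.keyevent call.
function keyeventArgs(serial: string, key: KeyName): { serial: string; keycode: number } {
  return { serial, keycode: KEYCODE[key] };
}
```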

Troubleshooting

No devices found

adb kill-server
adb start-server
adb devices

Ensure USB debugging is enabled and RSA fingerprint accepted.

Scrcpy version mismatch

SCRCPY_SERVER_VERSION must exactly match your scrcpy-server file. Check the scrcpy release version you downloaded.
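The version matters because the scrcpy client passes its version as the first argument when launching the server on the device, and the server refuses to start if it differs from its own. A minimal sketch of the push-and-launch arguments, modeled on the scrcpy developer documentation; the option set is an assumption, not necessarily what this server passes, and socket setup (adb forward) is omitted:

```typescript
const DEVICE_SERVER_PATH = "/data/local/tmp/scrcpy-server.jar";

interface StreamOptions {
  version: string; // must equal the version of the scrcpy-server file being pushed
  maxSize: number;
  maxFps: number;
}

// Hypothetical launch sequence: arguments for two `adb` invocations.
function scrcpyLaunchArgs(serial: string, localServerPath: string, o: StreamOptions): string[][] {
  return [
    ["-s", serial, "push", localServerPath, DEVICE_SERVER_PATH],
    [
      "-s", serial, "shell",
      `CLASSPATH=${DEVICE_SERVER_PATH}`, "app_process", "/", "com.genymobile.scrcpy.Server",
      o.version, // the server exits with an error if this differs from its own version
      "tunnel_forward=true",
      "audio=false",
      "control=true",
      "video_codec=h264",
      `max_size=${o.maxSize}`,
      `max_fps=${o.maxFps}`,
      "raw_stream=true", // assumed: plain H.264 without scrcpy's frame headers
    ],
  ];
}
```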

ffmpeg not found

  • Windows: Download from https://ffmpeg.org/download.html, extract, add bin folder to PATH
  • macOS: brew install ffmpeg
  • Linux: apt install ffmpeg or yum install ffmpeg

Or set FFMPEG_PATH in .env to the full path.

uiautomator dump fails

Some devices require the screen to be on. Try android.screen.wake first.

Clipboard not working (Android 10+)

Android 10+ restricts clipboard access. Use UI automation to paste instead.

Stream won't start

  1. Check scrcpy-server path is correct
  2. Verify version numbers match
  3. Try running scrcpy standalone first to verify it works

Notes & Limitations

  • Fast input when streaming: When a stream is active, tap/swipe/text/keyevent use the scrcpy control protocol (~5-10ms latency). Without a stream, they fall back to adb shell input (~100-300ms).
  • One stream per device at a time
  • Snapshot works without scrcpy - useful fallback when streaming is not needed
  • Clipboard has platform limitations on Android 10+
  • Notifications may require permissions on newer Android
  • The pinch gesture is currently simulated with a single finger; true multi-touch requires an active streaming session
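For a sense of why streamed input is cheap: each scrcpy control message is a small binary packet written to an already-open socket. A minimal sketch of a touch event, assuming the 32-byte big-endian layout used by scrcpy 2.x/3.x (type, action, 64-bit pointer id, x, y, frame width and height, 16-bit pressure, action button, buttons); verify it against scrcpy's control_msg.c before relying on it. The coordinates refer to the streamed frame size, and scrcpy ignores events whose width/height do not match the current video size. Function names are hypothetical:

```typescript
const TYPE_INJECT_TOUCH_EVENT = 2; // scrcpy control message type
const ACTION_DOWN = 0; // Android MotionEvent actions
const ACTION_UP = 1;

// Serialize one touch event as a 32-byte big-endian scrcpy control message.
function touchEvent(action: number, x: number, y: number, w: number, h: number, pointerId = 0): Uint8Array {
  const buf = new Uint8Array(32);
  const view = new DataView(buf.buffer); // DataView writes big-endian by default
  view.setUint8(0, TYPE_INJECT_TOUCH_EVENT);
  view.setUint8(1, action);
  view.setUint32(2, 0); // pointer id, high 32 bits
  view.setUint32(6, pointerId); // pointer id, low 32 bits
  view.setInt32(10, x);
  view.setInt32(14, y);
  view.setUint16(18, w); // frame size the coordinates refer to
  view.setUint16(20, h);
  view.setUint16(22, action === ACTION_UP ? 0 : 0xffff); // pressure, 16-bit fixed point
  view.setUint32(24, 0); // action button (none for a finger)
  view.setUint32(28, 0); // buttons
  return buf;
}

// A tap is a DOWN followed by an UP at the same point.
function tapMessages(x: number, y: number, w: number, h: number): Uint8Array[] {
  return [touchEvent(ACTION_DOWN, x, y, w, h), touchEvent(ACTION_UP, x, y, w, h)];
}
```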

Security Warning

This MCP server provides full control over connected Android devices:

  • Execute arbitrary shell commands
  • Read/write files on device
  • Control UI and input
  • Access clipboard and notifications

Only connect devices you own, and only use this server with an AI agent you trust.


Development

npm run dev     # Development with tsx
npm run build   # Compile TypeScript
npm start       # Run production build

See claude.md for developer documentation and agents.md for the AI agent integration guide.


License

MIT
