Mobile E2E MCP

AI-safe mobile device control via MCP: a policy-guarded, session-oriented mobile automation harness for AI agents with 66 MCP tools and an Explorer for automatic page traversal.

Mobile E2E MCP (2026)

CI (build + typecheck + unit + smoke) | Platform Smoke (iOS sim + Android emulator) | Real Device Acceptance (self-hosted)

AI-safe mobile device control via MCP: a policy-guarded, session-oriented mobile automation harness for AI agents, with deterministic-first Android/iOS execution, bounded visual fallback, and evidence-rich outcomes.

This repository is a pnpm monorepo that combines MCP tooling, adapter execution, and architecture docs for AI agents that need to inspect, act on, and debug mobile apps without turning raw device commands into ungoverned side effects.

Quick Start

Add the server to your MCP client configuration:

{
  "mcpServers": {
    "mobile-e2e-mcp": {
      "command": "npx",
      "args": ["-y", "@shenyuexin/mobile-e2e-mcp@latest"]
    }
  }
}

Once installed, you get 66 MCP tools for governed mobile automation, plus a built-in Explorer for automatic page traversal.

Primary Product Surface: Explorer

Explorer is the main outward-facing product capability of this repository. It is not a standalone crawler bolted onto the side; it is the clearest product surface for the harness because broad app exploration requires the same core MCP capabilities a mobile AI agent needs in practice: device discovery, auditable sessions, UI inspection, bounded UI actions, rule-based risk gates, interruption/recovery handling, and structured evidence.

Use Explorer when you want to answer product questions such as:

  • Which screens are reachable from this app entry point?
  • Which flows are blocked by policy, external-app boundaries, risk gates, or repeated failures?
  • What changed between two exploration runs?
  • Which discovered paths should be promoted into deterministic replay or PR review evidence?

Explorer writes fixed, reviewable artifacts such as tree.txt, report.md, summary.json, config.json, and failure-review JSON/Markdown. Current local and tracked evidence includes large Settings explorations, some exceeding 100 pages, each with rule decisions, interruption/failure context, and machine-consumable page metadata.

Core Wedge: Governed Agent Control

The strongest use case for this project is not "replace every mobile E2E framework." It is: give an AI agent a safer control plane for mobile devices.

Compared with a thin adb or platform-command wrapper, this harness adds:

  • Policy boundaries: actions are checked against policy profiles before execution.
  • Auditable sessions: actions run inside session, lease, audit, and evidence context.
  • Capability disclosure: agents can query supported platforms and boundaries before acting.
  • Structured outcomes: failures, denials, and evidence are machine-consumable instead of log-only.

Reproduce the dry-run proof:

pnpm run proof:governed-agent-mobile-control

The proof writes a timestamped bundle under output/showcase/governed-agent-mobile-control/<run-id>/ and verifies that a read-only session blocks an interactive action with structured POLICY_DENIED. See Governed Agent Mobile Control Proof.
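The behavior the proof checks can be pictured with a small sketch. This is not the repository's policy engine: the profile names follow the read-only vs interactive/full-control wording used in this README, while `ToolClass`, `TOOL_CLASS`, `ALLOWED`, and `checkPolicy` are hypothetical names introduced only for illustration. Only the `POLICY_DENIED` reason code comes from the proof description.

```typescript
// Illustrative only: these types and tables are assumptions, not the real policy schema.
type PolicyProfile = "read-only" | "interactive" | "full-control";
type ToolClass = "read" | "interact" | "admin";

interface PolicyDecision {
  status: "allowed" | "denied";
  reasonCode?: "POLICY_DENIED";
  tool: string;
  profile: PolicyProfile;
}

// Assumed classification of a few tools from the catalog.
const TOOL_CLASS: Record<string, ToolClass> = {
  inspect_ui: "read",
  take_screenshot: "read",
  tap: "interact",
  type_text: "interact",
  install_app: "admin",
};

// Assumed mapping from profile to the tool classes it may run.
const ALLOWED: Record<PolicyProfile, ToolClass[]> = {
  "read-only": ["read"],
  "interactive": ["read", "interact"],
  "full-control": ["read", "interact", "admin"],
};

function checkPolicy(profile: PolicyProfile, tool: string): PolicyDecision {
  const known = Object.prototype.hasOwnProperty.call(TOOL_CLASS, tool);
  // Unknown tools are denied rather than silently allowed.
  if (!known || ALLOWED[profile].indexOf(TOOL_CLASS[tool]) === -1) {
    return { status: "denied", reasonCode: "POLICY_DENIED", tool, profile };
  }
  return { status: "allowed", tool, profile };
}

// A read-only session may inspect the UI but not tap.
const inspectDecision = checkPolicy("read-only", "inspect_ui");
const tapDecision = checkPolicy("read-only", "tap");
console.log(inspectDecision.status, tapDecision.status, tapDecision.reasonCode);
```

The point of the structured decision is that an agent can branch on `status` and `reasonCode` instead of parsing log text.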

Explorer: Automatic Page Traversal

Explorer is a DFS-based automatic page traversal engine built into the MCP server. It systematically navigates through your app's screens, builds a state graph, and produces structured coverage reports without requiring manual flow definitions.

npx -y @shenyuexin/mobile-e2e-mcp@latest explore \
  --app-id com.example.app \
  --platform android \
  --output ./explore-report

Key features:

  • DFS-based traversal: systematically explores every reachable screen from a starting point
  • State graph tracking: records visited states and detects cycles to avoid infinite loops
  • Circuit breaker: automatically stops when exploration hits diminishing returns or configured limits
  • Structured coverage reports: outputs machine-consumable reports showing which screens and elements were discovered
  • Rule-based gating: respects skip-page, skip-element, sampling, and risk-gating rules for safe exploration
  • Interruption-aware evidence: records blocked, interrupted, skipped, and failed traversal decisions with reasons so a run can be reviewed instead of treated as a raw pass/fail crawl
  • Experimental horizontal fallback: after vertical segments are exhausted, Explorer can probe horizontally scrollable content with bounded page-identity checks
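As a rough mental model of the first three features, the sketch below runs a depth-first traversal over an in-memory page graph, tracks visited pages so cycles terminate, and stops when a page budget is exhausted. It is an assumption-laden toy: `PageGraph`, `ExploreLimits`, and `explore` are hypothetical names, and the real Explorer drives a device and applies rules rather than walking a prebuilt graph.

```typescript
// Hypothetical model: each page id maps to the page ids reachable from it.
type PageGraph = Record<string, string[]>;

interface ExploreLimits {
  maxPages: number; // circuit breaker: stop once this many pages were visited
}

interface ExploreResult {
  visited: string[];          // pages in discovery order
  maxDepthReached: number;    // deepest level at which a page was discovered
  stoppedBy: "exhausted" | "max-pages";
}

function explore(graph: PageGraph, start: string, limits: ExploreLimits): ExploreResult {
  const seen: Record<string, boolean> = {};
  const visited: string[] = [];
  const state = { tripped: false, maxDepth: 0 };

  function visit(page: string, depth: number): void {
    // Cycle detection comes first so revisits never trip the breaker.
    if (state.tripped || seen[page] === true) return;
    if (visited.length >= limits.maxPages) {
      state.tripped = true;
      return;
    }
    seen[page] = true;
    visited.push(page);
    if (depth > state.maxDepth) state.maxDepth = depth;
    for (const next of graph[page] || []) {
      visit(next, depth + 1);
    }
  }

  visit(start, 0);
  return {
    visited,
    maxDepthReached: state.maxDepth,
    stoppedBy: state.tripped ? "max-pages" : "exhausted",
  };
}

// Example graph with a cycle (settings -> home) and a shared child (profile -> settings).
const demoGraph: PageGraph = {
  home: ["settings", "profile"],
  settings: ["wifi", "home"],
  wifi: [],
  profile: ["settings"],
};
console.log(explore(demoGraph, "home", { maxPages: 10 }));
```

Checking the visited set before the page budget means a graph with exactly `maxPages` pages still reports `exhausted` rather than a spurious breaker trip.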

Output

Explorer produces a directory of structured artifacts:

| File | Description |
| --- | --- |
| tree.txt | ASCII tree of all discovered pages and navigation paths |
| report.md | Human-readable coverage report with module breakdown |
| failure-review.md | Human-readable failure triage with grouped patterns and suggested next actions |
| failure-review.json | Machine-consumable failure triage summary |
| summary.json | Machine-consumable metrics and page metadata |
| config.json | Runtime configuration and rule settings used for the run |

One real run against iOS Settings covered 181 pages with a max depth of 5.

Local real-device runs are written under output/evidence/explorer/. These outputs are intentionally structured so they can be curated into public showcase evidence or consumed by follow-on tooling such as coverage diffing, PR summaries, and replay path extraction.

What This Repository Actually Is

This repo contains both:

  1. Executable implementation (MCP server, adapters, contracts, core orchestration), and
  2. Architecture and delivery knowledge base (design principles, capability model, phased rollout docs).

If you only remember one thing: this project is designed as an Explorer-led, governed mobile control layer for AI agents, not a single-framework test runner.

Mobile E2E Harness Positioning

This project is an AI mobile E2E harness: a policy-aware, session-oriented, deterministic-first execution harness for mobile automation where an AI agent needs controlled action, evidence, and support-boundary clarity.

If you're searching for terms like mobile test harness, real-device Android test harness, AI automation harness, or mobile CI harness, this repository is built for that exact workflow.

Why teams use this harness

  • Deterministic-first harness: stable selectors and structured retries before OCR/CV fallback
  • Failure-intelligence harness: reason codes, evidence artifacts, and remediation suggestions
  • Governance-aware harness: policy profiles, auditable sessions, and controlled tool surfaces
  • Explorer harness: the primary product surface, combining traversal, tool orchestration, risk gating, interruption handling, recovery, and structured coverage/failure evidence (available via CLI)
  • Real-device evidence: Explorer/probe artifacts plus historical videos for happy path and interruption recovery

FAQ

What is a mobile E2E harness for AI agents?

It is an execution layer that lets AI agents run mobile test actions safely and reproducibly. This harness adds session control, policy boundaries, deterministic action routing, and structured evidence beyond basic command execution.

Can this harness run on real Android devices?

Yes. Current real-device evidence is centered on Explorer/probe artifacts, with historical showcase scripts under scripts/legacy/dev/* and recordings under docs/showcase/*. The latest Android Vivo probe verified the core Settings UI action path and resume_interrupted_action with native_android; a few diagnostic/negative-path checks remain intentionally non-green unless their prerequisites, such as Metro, are present.

How does interruption recovery work in this harness?

It detects interruption signals, classifies the likely interruption type, and applies bounded recovery actions (for example recover_to_known_state) before continuing the flow.
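The "bounded" part is the important design point, and a minimal sketch makes it concrete. The code below assumes two caller-supplied callbacks standing in for the detection and recovery tools; `recoverBounded`, `RecoveryStep`, and the status strings are hypothetical and do not reflect the actual tool signatures.

```typescript
// Hypothetical types: the real tools are MCP calls whose signatures are not shown here.
interface RecoveryStep {
  attempt: number;
  interruption: string; // label of the interruption handled in this attempt
  cleared: boolean;     // whether the screen was clear after the attempt
}

interface RecoveryResult {
  status: "no_interruption" | "recovered" | "gave_up";
  steps: RecoveryStep[];
}

function recoverBounded(
  detect: () => string | null,            // interruption label, or null when the screen is clear
  recover: (interruption: string) => void, // one recovery action for the given interruption
  maxAttempts: number,
): RecoveryResult {
  const steps: RecoveryStep[] = [];
  const first = detect();
  if (first === null) {
    return { status: "no_interruption", steps };
  }
  let current: string = first;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    recover(current);
    const after = detect();
    steps.push({ attempt, interruption: current, cleared: after === null });
    if (after === null) {
      return { status: "recovered", steps };
    }
    current = after; // a different interruption may now be on top
  }
  // The budget is spent: report the failure with its history instead of retrying forever.
  return { status: "gave_up", steps };
}

// Simulated device: two stacked interruptions, each dismissed by one recovery action.
const pending: string[] = ["permission-prompt", "system-dialog"];
const outcome = recoverBounded(
  () => (pending.length > 0 ? pending[0] : null),
  () => { pending.shift(); },
  3,
);
console.log(outcome.status, outcome.steps.length);
```

Returning the per-attempt steps keeps the recovery reviewable, which matches the evidence-first stance described elsewhere in this README.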

Is this a replacement for Appium or Maestro?

Not necessarily. It is better understood as an orchestration harness that can coexist with existing execution ecosystems while adding AI-oriented governance and diagnostics.

Which scenarios are the best fit?

AI agents that need safe mobile device access, release-gate mobile regression, flaky-flow triage, AI-driven exploratory checks, and real-device CI workflows that require auditable, evidence-rich outcomes.

What is the Explorer and when should I use it?

Explorer automatically traverses your app's screens without predefined flows. Use it when you need broad coverage discovery, want to map an unfamiliar app's navigation structure, or need to identify all reachable screens before writing targeted test flows. It is available via the explore CLI command.

Android physical-device Explorer evidence is tracked under docs/showcase/evidence/android-explorer-full-2026-04-28T03-38-20/: a full Settings traversal completed in 33m 50s with 45 pages, max depth 4, and 0 failures.

Validate that evidence offline with pnpm run validate:explorer-android-evidence -- --min-pages 45 --min-depth 4.

Appium / Maestro vs This Harness

| Dimension | Appium / Maestro | Mobile E2E MCP Harness |
| --- | --- | --- |
| Core role | Automation framework / flow runner | AI-facing orchestration harness |
| Execution strategy | Action execution centric | Deterministic-first + policy/session governance |
| Failure handling | Assertion/command failure outputs | Structured diagnostics + ranked causes + remediation hints |
| AI integration | Possible but not primary abstraction | Primary design target (tools for AI agents) |
| Evidence model | Varies by setup | Built-in evidence-first action outcomes |
| Helper app dependency | Required for iOS/Android replay | Android: owned-adb primary (no helper app needed for common commands); iOS simulator: axe CLI; iOS physical: WDA (one-time setup, see External Tools Guide) |

Official AI Mobile Tools vs This Harness

Android CLI/Journeys, Android Studio Journeys, and the Dart/Flutter MCP server are complementary upstream tools, not replacements for this harness. Treat them as source-native journey execution, authoring, or framework-context providers; feed their outputs into mobile-e2e evidence intake only when proof boundaries are explicit.

The machine-readable bridge contract is generated by:

pnpm run validate:official-tool-bridge

See docs/showcase/evidence/official-tool-bridge/bridge.md for the current relationship matrix.

How It Works (End-to-End)

Typical runtime path:

  1. Agent/client invokes an MCP tool via stdio or dev CLI.
  2. MCP server validates input and applies policy checks.
  3. Session context is resolved (or created), with lease/scheduling guardrails.
  4. Adapter router selects deterministic execution path first.
  5. Action executes and returns a structured result envelope.
  6. Artifacts/evidence (screens, logs, summaries) are attached for audit/debug.
  7. If deterministic resolution fails and policy allows it, bounded OCR/CV fallback is attempted.
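The ordering of those stages can be sketched as a single handler. Everything here is a simplified assumption: `ToolCall`, `Envelope`, `Deps`, `handleToolCall`, and the `UNKNOWN_TOOL` reason code are hypothetical, and the adapter is reduced to one injected `execute` function. Only `POLICY_DENIED` and the status/reasonCode/artifacts envelope fields are taken from this README.

```typescript
// Hypothetical shapes for illustration; the enforced contracts live in the repository schemas.
interface ToolCall {
  tool: string;
  sessionId?: string;
  args: Record<string, unknown>;
}

interface Envelope {
  status: "success" | "failed" | "denied" | "invalid";
  reasonCode: string | null;
  sessionId: string | null;
  artifacts: string[];
}

interface Deps {
  knownTools: string[];
  isAllowed: (tool: string) => boolean;
  execute: (call: ToolCall) => { ok: boolean; reasonCode: string | null; artifacts: string[] };
}

let sessionCounter = 0;

function handleToolCall(call: ToolCall, deps: Deps): Envelope {
  // Validate input, then apply policy, before anything touches a device.
  if (deps.knownTools.indexOf(call.tool) === -1) {
    return { status: "invalid", reasonCode: "UNKNOWN_TOOL", sessionId: null, artifacts: [] };
  }
  if (!deps.isAllowed(call.tool)) {
    return { status: "denied", reasonCode: "POLICY_DENIED", sessionId: null, artifacts: [] };
  }
  // Resolve the session, creating one when the caller did not supply it.
  sessionCounter += 1;
  const sessionId = call.sessionId !== undefined ? call.sessionId : "session-" + sessionCounter;
  // Execute through the adapter and wrap the outcome, with evidence, in one envelope.
  const outcome = deps.execute(call);
  return {
    status: outcome.ok ? "success" : "failed",
    reasonCode: outcome.reasonCode,
    sessionId,
    artifacts: outcome.artifacts,
  };
}
```

Denials and validation errors return the same envelope shape as successes, so a caller never has to distinguish "no result" from "refused".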

This is why the project emphasizes session + policy + evidence, not only UI actions.

High-Level Architecture

Reference split:

  • Control plane: tool contracts, policy checks, session orchestration, audit/evidence indexing
  • Execution plane: platform actions, UI resolution, retries, interruption handling, visual fallback

Source-of-truth note:

  • Architecture docs describe both current baseline and target-state design.
  • If a doc statement conflicts with strict validation behavior, prefer packages/contracts/*.schema.json and configs/policies/*.yaml for current enforced behavior.

Capability Map (Current Scope)

  • Environment & device control: discovery, lease/isolation, environment shaping
  • App lifecycle: install/launch/terminate/reset/deep-link entry
  • Perception & interaction: inspect/query UI, tap/type/wait, flow execution
  • Diagnostics & evidence: logs, crash signals, performance, screenshot/timeline artifacts
  • Reliability & remediation: reason-coded failures, bounded retries, remediation helpers

Tool registry/signature dispatch live in packages/mcp-server/src/server.ts, while descriptor metadata and wrapper composition live in packages/mcp-server/src/index.ts.

Complete MCP Tool Catalog (Current)

The server currently exposes 66 tools. For AI agents, this is the current tool surface.

1) Session & lifecycle

start_session, request_manual_handoff, end_session, run_flow, reset_app_state

2) Task orchestration & flow capture

execute_intent, complete_task, start_record_session, get_record_session_status, end_record_session, cancel_record_session, export_session_flow, record_task_flow, validate_flow

3) Device & app control

list_devices, install_app, launch_app, terminate_app, describe_capabilities, doctor

4) UI perception, targeting, and interaction

inspect_ui, query_ui, resolve_ui_target, scroll_only, scroll_and_resolve_ui_target, wait_for_ui, wait_for_ui_stable, tap, tap_element, scroll_and_tap_element, type_text, type_into_element, navigate_back

5) Evidence, observability, and diagnostics

take_screenshot, record_screen, get_logs, get_crash_signals, collect_diagnostics, collect_debug_evidence, get_screen_summary, get_session_state, get_page_context, capture_js_console_logs, capture_js_network_events, list_js_debug_targets, capture_element_screenshot, compare_visual_baseline

6) Interruption handling

detect_interruption, classify_interruption, resolve_interruption, resume_interrupted_action

7) Failure analysis, recovery, and remediation

perform_action_with_evidence, get_action_outcome, explain_last_failure, rank_failure_candidates, find_similar_failures, compare_against_baseline, recover_to_known_state, replay_last_stable_path, suggest_known_remediation, replay_checkpoint_chain

8) Performance profiling

measure_android_performance, measure_ios_performance

9) Network diagnostics

probe_network_readiness, diagnose_network_failure, inspect_network_policy

probe_network_readiness checks runtime connectivity, DNS, latency, and optional backend reachability. diagnose_network_failure starts from an observed failed request and attributes likely Android cleartext or iOS ATS release-policy blockers. inspect_network_policy remains the lower-level static checker for plain HTTP endpoints using decoded manifest, network-security-config, Info.plist, or readable APK/IPA ZIP artifact evidence. These tools do not proxy traffic or mutate app configuration.

For exact signatures and supported inputs/outputs, use packages/mcp-server/src/server.ts (the tool registry source of truth).

Deterministic Ladder and Fallback Policy

Action resolution order is intentional and strict:

  1. Stable ID/resource-id/testID/accessibility identifier
  2. Semantic tree match (text/label/role)
  3. OCR text-region fallback (bounded)
  4. CV/template fallback (bounded)
  5. Fail with reason code + artifacts

Prohibited behavior:

  • OCR/CV as the default first path
  • Unbounded retries without state-change evidence
  • Silent downgrade from deterministic to probabilistic execution
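A compact way to read the ladder and its prohibitions is as an ordered loop in which fallback strategies run only when policy allows, and the result always states which strategy produced it. The sketch below is illustrative: `Strategy`, `Resolver`, `resolveTarget`, and the `TARGET_NOT_RESOLVED` reason code are hypothetical names, and each resolver is a stub rather than a real UI query.

```typescript
// Hypothetical names; resolvers here are stubs standing in for real UI lookups.
type Strategy = "stable-id" | "semantic-tree" | "ocr" | "cv";

interface Resolver {
  strategy: Strategy;
  resolve: (target: string) => string | null; // element reference, or null when not found
}

interface Resolution {
  status: "resolved" | "failed";
  strategy: Strategy | null;
  elementRef: string | null;
  reasonCode: string | null;
  attempted: Strategy[];
  usedFallback: boolean; // reported explicitly so a downgrade is never silent
}

const LADDER: Strategy[] = ["stable-id", "semantic-tree", "ocr", "cv"];
const PROBABILISTIC: Strategy[] = ["ocr", "cv"];

function resolveTarget(target: string, resolvers: Resolver[], allowFallback: boolean): Resolution {
  const attempted: Strategy[] = [];
  let fallbackTried = false;
  for (const strategy of LADDER) {
    const isFallback = PROBABILISTIC.indexOf(strategy) !== -1;
    if (isFallback && !allowFallback) break; // policy forbids OCR/CV: stop at the deterministic rungs
    const resolver = resolvers.filter((r) => r.strategy === strategy)[0];
    if (resolver === undefined) continue;
    attempted.push(strategy);
    if (isFallback) fallbackTried = true;
    const ref = resolver.resolve(target); // each rung is tried once, which keeps fallback bounded
    if (ref !== null) {
      return { status: "resolved", strategy, elementRef: ref, reasonCode: null, attempted, usedFallback: isFallback };
    }
  }
  return {
    status: "failed",
    strategy: null,
    elementRef: null,
    reasonCode: "TARGET_NOT_RESOLVED",
    attempted,
    usedFallback: fallbackTried,
  };
}
```

Because the ladder order is fixed in one place and resolvers are looked up by strategy, passing the resolvers in a different order cannot make OCR or CV run first.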

Repository-Wide Principles

  • Deterministic-first: use stable IDs/tree/native capabilities first; OCR/CV is bounded fallback.
  • Structured tool contracts: return machine-consumable result envelopes (status, reasonCode, artifacts).
  • Session-oriented execution: actions run in auditable sessions with explicit policy profiles.
  • Evidence-rich failures: failures should carry enough context for explain/replay/remediation.

Session, Policy, and Governance Model

  • Sessions are auditable execution units with timeline and artifact references.
  • Policy profiles can restrict tool classes (for example read-only vs interactive/full-control).
  • Lease/scheduler constraints prevent unsafe concurrent execution on the same target.
  • Redaction/governance paths exist to keep evidence useful while respecting data boundaries.
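The lease constraint in particular is easy to picture as a small registry that grants a device to one session at a time. This is a sketch under assumptions, not the repository's scheduler: `LeaseRegistry`, `LeaseResult`, and the `granted`/`busy` statuses are hypothetical, and a real implementation would also need expiry and persistence.

```typescript
// Hypothetical in-memory registry; no expiry, persistence, or cross-process locking.
interface Lease {
  deviceId: string;
  sessionId: string;
}

type LeaseResult =
  | { status: "granted"; lease: Lease }
  | { status: "busy"; heldBy: string };

class LeaseRegistry {
  private holders: Record<string, string> = {};

  private holderOf(deviceId: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.holders, deviceId)
      ? this.holders[deviceId]
      : null;
  }

  acquire(deviceId: string, sessionId: string): LeaseResult {
    const holder = this.holderOf(deviceId);
    // A different session already holds the device: refuse instead of interleaving actions.
    if (holder !== null && holder !== sessionId) {
      return { status: "busy", heldBy: holder };
    }
    this.holders[deviceId] = sessionId;
    return { status: "granted", lease: { deviceId, sessionId } };
  }

  release(deviceId: string, sessionId: string): boolean {
    // Only the holding session may release its lease.
    if (this.holderOf(deviceId) !== sessionId) return false;
    delete this.holders[deviceId];
    return true;
  }
}

const registry = new LeaseRegistry();
console.log(registry.acquire("emulator-5554", "session-a").status);
console.log(registry.acquire("emulator-5554", "session-b").status);
```

Re-acquiring with the same session id is treated as idempotent here, which is one possible design choice rather than documented behavior.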

Non-Goals (Important for Correct Expectations)

  • This is not a replacement for the internals of every mobile framework.
  • This is not OCR-first automation.
  • This does not imply separate full React Native (RN) or Flutter backends, or immediate parity across all native/RN/Flutter edge cases.
  • This is not a single abstraction that erases all platform differences.

Roadmap Snapshot (Short)

  • Near term: harden deterministic session/action reliability and evidence model.
  • Mid term: broaden framework/profile maturity and real-run coverage.
  • Long term: stronger agentic remediation/governance and enterprise controls.

Detailed public planning references are maintained in docs/delivery/roadmap.md and docs/architecture/*.

Positioning

This project is not another isolated test framework. It is an AI-facing orchestration layer that routes mobile E2E actions through shared platform adapters and framework profiles, with deterministic-first behavior and strict governance boundaries.

Support This Project

If this project helps your team, you can support it by:

  1. Starring and sharing the repository
  2. Opening issues/PRs with reproducible evidence
  3. Sponsoring the project

Donation note:

  • Donate via PayPal
