Mobile E2E MCP
AI-safe mobile device control via MCP: a policy-guarded, session-oriented mobile automation harness for AI agents with 66 MCP tools and an Explorer for automatic page traversal.
Mobile E2E MCP (2026)
AI-safe mobile device control via MCP: a policy-guarded, session-oriented mobile automation harness for AI agents, with deterministic-first Android/iOS execution, bounded visual fallback, and evidence-rich outcomes.
This repository is a pnpm monorepo that combines MCP tooling, adapter execution, and architecture docs for AI agents that need to inspect, act on, and debug mobile apps without turning raw device commands into ungoverned side effects.
Quick Start
{
  "mcpServers": {
    "mobile-e2e-mcp": {
      "command": "npx",
      "args": ["-y", "@shenyuexin/mobile-e2e-mcp@latest"]
    }
  }
}
Once installed, you get 66 MCP tools for governed mobile automation, plus a built-in Explorer for automatic page traversal.
Primary Product Surface: Explorer
Explorer is the main outward-facing product capability of this repository. It is not a standalone crawler bolted onto the side; it is the clearest product surface for the harness because broad app exploration requires the same core MCP capabilities a mobile AI agent needs in practice: device discovery, auditable sessions, UI inspection, bounded UI actions, rule-based risk gates, interruption/recovery handling, and structured evidence.
Use Explorer when you want to answer product questions such as:
- Which screens are reachable from this app entry point?
- Which flows are blocked by policy, external-app boundaries, risk gates, or repeated failures?
- What changed between two exploration runs?
- Which discovered paths should be promoted into deterministic replay or PR review evidence?
Explorer writes fixed, reviewable artifacts such as tree.txt, report.md, summary.json, config.json, and failure-review JSON/Markdown. Current local and tracked evidence covers large Settings explorations, including runs of 100+ pages, with rule decisions, interruption/failure context, and machine-consumable page metadata.
Core Wedge: Governed Agent Control
The strongest use case for this project is not "replace every mobile E2E framework." It is: give an AI agent a safer control plane for mobile devices.
Compared with a thin adb or platform-command wrapper, this harness adds:
- Policy boundaries: actions are checked against policy profiles before execution.
- Auditable sessions: actions run inside session, lease, audit, and evidence context.
- Capability disclosure: agents can query supported platforms and boundaries before acting.
- Structured outcomes: failures, denials, and evidence are machine-consumable instead of log-only.
Reproduce the dry-run proof:
pnpm run proof:governed-agent-mobile-control
The proof writes a timestamped bundle under output/showcase/governed-agent-mobile-control/<run-id>/ and verifies that a read-only session blocks an interactive action with structured POLICY_DENIED. See Governed Agent Mobile Control Proof.
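The denial that the proof verifies can be pictured with a minimal sketch. Everything except the `POLICY_DENIED` reason code is an assumption made for illustration: the profile names, the `ToolClass` type, and the `checkPolicy` function are hypothetical and are not the harness's actual API.

```typescript
// Hypothetical sketch: types and names are illustrative, not the real harness API.
type ToolClass = "read" | "interactive";
type PolicyProfile = "read-only" | "interactive";

interface PolicyDecision {
  status: "allowed" | "denied";
  reasonCode?: "POLICY_DENIED";
  detail: string;
}

// Assumed mapping from each profile to the tool classes it permits.
const allowedClasses: Record<PolicyProfile, ToolClass[]> = {
  "read-only": ["read"],
  "interactive": ["read", "interactive"],
};

// Check a tool call against the session's profile before anything executes.
function checkPolicy(profile: PolicyProfile, tool: string, toolClass: ToolClass): PolicyDecision {
  if (allowedClasses[profile].includes(toolClass)) {
    return { status: "allowed", detail: `${tool} permitted under ${profile}` };
  }
  // The denial is a structured value, not just a log line.
  return {
    status: "denied",
    reasonCode: "POLICY_DENIED",
    detail: `${tool} requires ${toolClass} access; session profile is ${profile}`,
  };
}

console.log(checkPolicy("read-only", "tap", "interactive"));
```

Returning the denial as data rather than throwing lets an agent branch on `reasonCode` and decide whether to request a different policy profile.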
Explorer: Automatic Page Traversal
Explorer is a DFS-based automatic page traversal engine built into the MCP server. It systematically navigates through your app's screens, builds a state graph, and produces structured coverage reports without requiring manual flow definitions.
npx -y @shenyuexin/mobile-e2e-mcp@latest explore \
  --app-id com.example.app \
  --platform android \
  --output ./explore-report
Key features:
- DFS-based traversal: systematically explores the screens reachable from a starting point, within the configured limits and rules
- State graph tracking: records visited states and detects cycles to avoid infinite loops
- Circuit breaker: automatically stops when exploration hits diminishing returns or configured limits
- Structured coverage reports: outputs machine-consumable reports showing which screens and elements were discovered
- Rule-based gating: respects skip-page, skip-element, sampling, and risk-gating rules for safe exploration
- Interruption-aware evidence: records blocked, interrupted, skipped, and failed traversal decisions with reasons so a run can be reviewed instead of treated as a raw pass/fail crawl
- Experimental horizontal fallback: after vertical segments are exhausted, Explorer can probe horizontally scrollable content with bounded page-identity checks
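As a rough illustration of how DFS traversal, visited-state tracking, and a circuit breaker fit together, here is a minimal sketch. It is not Explorer's implementation: a plain in-memory graph stands in for a live app, and `NavGraph`, `explore`, `maxPages`, and `maxDepth` are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch: an in-memory navigation graph stands in for a real app.
type PageId = string;
type NavGraph = Record<PageId, PageId[]>;

interface ExploreResult {
  visited: PageId[];        // pages in discovery order
  stoppedByLimit: boolean;  // true if the page budget cut the run short
}

function explore(graph: NavGraph, start: PageId, maxPages: number, maxDepth: number): ExploreResult {
  const visited = new Set<PageId>();
  let stoppedByLimit = false;

  function visit(page: PageId, depth: number): void {
    if (visited.has(page)) return;       // already seen: avoids cycles
    if (visited.size >= maxPages) {      // simple circuit breaker on page count
      stoppedByLimit = true;
      return;
    }
    visited.add(page);
    if (depth >= maxDepth) return;       // record the page but do not descend
    for (const next of graph[page] ?? []) {
      visit(next, depth + 1);
    }
  }

  visit(start, 0);
  return { visited: Array.from(visited), stoppedByLimit };
}

const demoGraph: NavGraph = {
  home: ["settings", "profile"],
  settings: ["wifi", "home"],   // "home" is a back-edge (cycle)
  wifi: ["settings"],
  profile: [],
};

// Visits home, settings, wifi, profile in that order.
console.log(explore(demoGraph, "home", 10, 5).visited);
```

The visited check runs before the budget check so that revisiting a known page never trips the breaker. Rule-based gating and risk checks would sit inside `visit` as additional early returns.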
Output
Explorer produces a directory of structured artifacts:
| File | Description |
|---|---|
| tree.txt | ASCII tree of all discovered pages and navigation paths |
| report.md | Human-readable coverage report with module breakdown |
| failure-review.md | Human-readable failure triage with grouped patterns and suggested next actions |
| failure-review.json | Machine-consumable failure triage summary |
| summary.json | Machine-consumable metrics and page metadata |
| config.json | Runtime configuration and rule settings used for the run |
Example output from a real run against iOS Settings (181 pages, max depth 5):
- tree.txt: full page hierarchy
- report.md: module breakdown and paths
- summary.json: metrics and metadata
Local real-device runs are written under output/evidence/explorer/. These outputs are intentionally structured so they can be curated into public showcase evidence or consumed by follow-on tooling such as coverage diffing, PR summaries, and replay path extraction.
For architecture details and rule configuration:
What This Repository Actually Is
This repo contains both:
- Executable implementation (MCP server, adapters, contracts, core orchestration), and
- Architecture and delivery knowledge base (design principles, capability model, phased rollout docs).
If you only remember one thing: this project is designed as an Explorer-led, governed mobile control layer for AI agents, not a single-framework test runner.
Mobile E2E Harness Positioning
This project is an AI mobile E2E harness: a policy-aware, session-oriented, deterministic-first execution harness for mobile automation where an AI agent needs controlled action, evidence, and support-boundary clarity.
If you're searching for terms like mobile test harness, real-device Android test harness, AI automation harness, or mobile CI harness, this repository is built for that exact workflow.
Why teams use this harness
- Deterministic-first harness: stable selectors and structured retries before OCR/CV fallback
- Failure-intelligence harness: reason codes, evidence artifacts, and remediation suggestions
- Governance-aware harness: policy profiles, auditable sessions, and controlled tool surfaces
- Explorer harness: the primary product surface, combining traversal, tool orchestration, risk gating, interruption handling, recovery, and structured coverage/failure evidence (available via CLI)
- Real-device evidence: Explorer/probe artifacts plus historical videos for happy path and interruption recovery
Capability Showcase
If you want a quick hands-on tour before diving into architecture details, start here:
- Happy path video (login -> scroll -> add to cart -> orders -> cart):
docs/showcase/videos/m2e-happy-path-scroll-pause-40s.mp4
- Visible interruption + recovery video (HOME interruption -> recover_to_known_state -> continue action):
docs/showcase/videos/m2e-interruption-home-recovery-35s.mp4
- Current real-device verification:
  - Android Explorer evidence: docs/showcase/evidence/android-explorer-full-2026-04-28T03-38-20/
  - Android probe entrypoint: pnpm run validate:android-tool-probe (latest Vivo V2405A run: 20/23 success, 0 partial, 3 expected diagnostic failures; core UI and interruption-resume paths passed)
  - iOS probe entrypoint: pnpm run validate:ios-tool-probe
- Governed-control proof:
  - pnpm run quickstart:governed-control (first-run readiness and next-command guide)
  - pnpm run proof:governed-agent-mobile-control
  - pnpm run proof:governed-agent-mobile-control:preflight (checks Android live-proof readiness)
  - pnpm run proof:governed-agent-mobile-control:live (requires an Android device/emulator)
  - pnpm run proof:governed-business-app-workflow (installs/launches the demo app, then switches to read-only governed agent observation)
  - pnpm run proof:governed-policy-escalation (denies under read-only, then retries under interactive policy)
  - pnpm run validate:governed-control-evidence
  - pnpm run validate:governed-business-app-evidence
  - pnpm run validate:governed-business-app-comparison
  - pnpm run validate:governed-policy-escalation-evidence
  - pnpm run validate:governed-evidence-brief
  - pnpm run validate:governed-pr-evidence-summary
  - pnpm run verify:mobile-change (one-command mobile change verification UX; use -- --live --contract=configs/readiness/mobile-change.android.json for contract-backed live mode)
  - pnpm run generate:mobile-change-readiness-contract / pnpm run validate:mobile-change-readiness-contract
  - pnpm run generate:mobile-change-repo-app-success-candidate / pnpm run validate:mobile-change-repo-app-success-candidate (repo-owned demo app success candidate; blocked output is not success evidence until a device/emulator run passes intake)
  - pnpm run generate:mobile-change-ci-pr-evidence / pnpm run validate:mobile-change-ci-pr-evidence (compact PR/CI artifact with proof-level-safe blocked/failed/success labels)
  - pnpm run generate:mobile-change-failure-memory / pnpm run validate:mobile-change-failure-memory (deterministic failure-pattern grouping and bounded next-action routing)
  - pnpm run verify:react-native-change (experimental RN lane that runs readiness plus evidence-pack orchestration; live success still requires device, Metro, debug target, stable selectors, and intake-backed proof)
  - pnpm run generate:react-native-readiness / pnpm run validate:react-native-readiness (RN preflight for device, Metro, JS debug target, readiness contract, and stable selectors)
  - pnpm run generate:react-native-evidence-pack / pnpm run validate:react-native-evidence-pack (RN review artifact that keeps Metro signals supplemental)
  - pnpm run proof:mobile-change-verification (fixture-backed mobile change verification bundle, failure packet, and scenario index)
  - pnpm run generate:mobile-change-device-readiness / pnpm run validate:mobile-change-device-readiness (structured device/app/readiness preflight before attempting live mobile change proof)
  - pnpm run proof:mobile-change-verification:live (optional live device/emulator proof; use M2E_LIVE_MOBILE_CHANGE_ALLOW_NO_DEVICE=1 for structured no-device output)
  - pnpm run proof:mobile-change-verification:live-settings (runnable no-APK Android Settings success lane; requires 10AEA40Z3Y000R5 or editing the device id)
  - pnpm run generate:mobile-change-live-settings-lane / pnpm run validate:mobile-change-live-settings-lane
  - pnpm run proof:mobile-change-verification:readiness-failure (controlled live-runner-derived app readiness failure packet)
  - pnpm run validate:mobile-change-verification
  - pnpm run validate:mobile-change-live-android-evidence (tracked Android device 10AEA40Z3Y000R5 live app-readiness failure evidence)
  - pnpm run validate:mobile-change-readiness-failure
  - pnpm run generate:mobile-change-handoff / pnpm run validate:mobile-change-handoff
  - pnpm run intake:mobile-change-live-proof / pnpm run validate:mobile-change-live-proof-intake (review live runner output before promoting it as tracked evidence)
  - docs/showcase/governed-agent-mobile-control.md
  - docs/showcase/governed-agent-mobile-control-live.md
  - docs/showcase/governed-business-app-workflow.md
  - docs/showcase/governed-policy-escalation.md
  - docs/showcase/governed-quickstart.md
  - docs/showcase/evidence/governed-control-brief/brief.md
  - docs/showcase/evidence/governed-control-brief/pr-comment.md
  - docs/showcase/evidence/governed-control-vivo-2026-05-23/report.md
  - docs/showcase/evidence/governed-business-app-vivo-2026-05-24/report.md
  - docs/showcase/evidence/governed-business-app-vivo-2026-05-24/comparison.md
  - docs/showcase/evidence/governed-policy-escalation-dry-run-2026-05-25/report.md
  - docs/showcase/evidence/mobile-change-verification-fixture/report.md
  - docs/showcase/evidence/mobile-change-verification-fixture/failure-packet.md
  - docs/showcase/evidence/mobile-change-verification-fixture/scenario-index.md
  - docs/showcase/evidence/mobile-change-device-readiness/report.md
  - docs/showcase/evidence/mobile-change-live-android-10AEA40Z3Y000R5/report.md
  - docs/showcase/evidence/mobile-change-live-settings-lane/lane.md
  - docs/showcase/evidence/mobile-change-live-proof-intake/intake.md
  - docs/showcase/evidence/mobile-change-repo-app-success-candidate/candidate.md
  - docs/showcase/evidence/mobile-change-ci-pr-evidence/pr-summary.md
  - docs/showcase/evidence/mobile-change-failure-memory/remediation.md
  - docs/showcase/evidence/mobile-change-readiness-failure/failure-packet.md
  - docs/showcase/evidence/mobile-change-readiness-failure/handoff.md
- Historical demo scripts:
  - bash scripts/legacy/dev/record-demo-happy-path-android.sh
  - bash scripts/legacy/dev/record-demo-interruption-home-recovery-android.sh
  - bash scripts/legacy/dev/publish-showcase-assets-android.sh (record + curate videos + refresh snapshots/GIFs)
- Demo playbook and evidence index:
- AI invocation and task guides:
- CI evidence and boundary notes:
FAQ
What is a mobile E2E harness for AI agents?
It is an execution layer that lets AI agents run mobile test actions safely and reproducibly. This harness adds session control, policy boundaries, deterministic action routing, and structured evidence beyond basic command execution.
Can this harness run on real Android devices?
Yes. Current real-device evidence is centered on Explorer/probe artifacts, with historical showcase scripts under scripts/legacy/dev/* and recordings under docs/showcase/*. The latest Android Vivo probe verified the core Settings UI action path and resume_interrupted_action with native_android; a few diagnostic/negative-path checks remain intentionally non-green unless their prerequisites, such as Metro, are present.
How does interruption recovery work in this harness?
It detects interruption signals, classifies likely interruption type, and applies bounded recovery actions (for example recover_to_known_state) before continuing the flow.
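The detect, classify, and bounded-recover sequence can be sketched roughly as follows. This is an illustration under stated assumptions: the interruption categories, the string-matching classifier, and the `recoverBounded` function are hypothetical and do not reflect how the harness's interruption tools are implemented.

```typescript
// Hypothetical sketch: categories and matching rules are invented for illustration.
type InterruptionType = "system_dialog" | "app_backgrounded" | "unknown";

interface RecoveryOutcome {
  recovered: boolean;
  attempts: number;
  classifiedAs: InterruptionType;
}

// Toy classifier: maps a raw signal string to a likely interruption type.
function classify(signal: string): InterruptionType {
  if (signal.includes("dialog")) return "system_dialog";
  if (signal.includes("home")) return "app_backgrounded";
  return "unknown";
}

// Try recovery at most maxAttempts times, then give up with a structured outcome.
function recoverBounded(
  signal: string,
  tryRecover: (kind: InterruptionType, attempt: number) => boolean,
  maxAttempts: number,
): RecoveryOutcome {
  const classifiedAs = classify(signal);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (tryRecover(classifiedAs, attempt)) {
      return { recovered: true, attempts: attempt, classifiedAs };
    }
  }
  return { recovered: false, attempts: maxAttempts, classifiedAs };
}

// Example: a recovery callback that succeeds on its second attempt.
console.log(recoverBounded("home_pressed", (_kind, attempt) => attempt === 2, 3));
```

The explicit attempt cap is the point of the sketch: recovery either succeeds within the budget or returns a reviewable failure instead of looping.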
Is this a replacement for Appium or Maestro?
Not necessarily. It is better understood as an orchestration harness that can coexist with existing execution ecosystems while adding AI-oriented governance and diagnostics.
Which scenarios are the best fit?
AI agents that need safe mobile device access, release-gate mobile regression, flaky-flow triage, AI-driven exploratory checks, and real-device CI workflows that require auditable, evidence-rich outcomes.
What is the Explorer and when should I use it?
Explorer automatically traverses your app's screens without predefined flows. Use it when you need broad coverage discovery, want to map an unfamiliar app's navigation structure, or need to identify all reachable screens before writing targeted test flows. It is available via the explore CLI command.
Android physical-device Explorer evidence is tracked under docs/showcase/evidence/android-explorer-full-2026-04-28T03-38-20/: a full Settings traversal completed in 33m 50s with 45 pages, max depth 4, and 0 failures.
Validate that evidence offline with pnpm run validate:explorer-android-evidence -- --min-pages 45 --min-depth 4.
Appium / Maestro vs This Harness
| Dimension | Appium / Maestro | Mobile E2E MCP Harness |
|---|---|---|
| Core role | Automation framework / flow runner | AI-facing orchestration harness |
| Execution strategy | Action execution centric | Deterministic-first + policy/session governance |
| Failure handling | Assertion/command failure outputs | Structured diagnostics + ranked causes + remediation hints |
| AI integration | Possible but not primary abstraction | Primary design target (tools for AI agents) |
| Evidence model | Varies by setup | Built-in evidence-first action outcomes |
| Helper app dependency | Required for iOS/Android replay | Android: owned-adb primary (no helper app needed for common commands); iOS simulator: axe CLI; iOS physical: WDA (one-time setup, see External Tools Guide) |
Official AI Mobile Tools vs This Harness
Android CLI/Journeys, Android Studio Journeys, and the Dart/Flutter MCP server are complementary upstream tools, not replacements for this harness. Treat them as source-native journey execution, authoring, or framework-context providers; feed their outputs into mobile-e2e evidence intake only when proof boundaries are explicit.
The machine-readable bridge contract is generated by:
pnpm run validate:official-tool-bridge
See docs/showcase/evidence/official-tool-bridge/bridge.md for the current relationship matrix.
How It Works (End-to-End)
Typical runtime path:
1. Agent/client invokes an MCP tool via stdio or dev CLI.
2. MCP server validates input and applies policy checks.
3. Session context is resolved (or created), with lease/scheduling guardrails.
4. Adapter router selects deterministic execution path first.
5. Action executes and returns a structured result envelope.
6. Artifacts/evidence (screens, logs, summaries) are attached for audit/debug.
7. If deterministic resolution fails and policy allows it, bounded OCR/CV fallback is attempted.
This is why the project emphasizes session + policy + evidence, not only UI actions.
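One way to picture the runtime path is as an ordered list of stages that stops at the first failure and records which stages ran. The sketch below is hypothetical: the `Stage` shape, the stage names, and the `INVALID_INPUT` reason code are assumptions for illustration, and real stages would call adapters and attach artifacts.

```typescript
// Hypothetical sketch: stage names loosely mirror the runtime path; types are invented.
interface ToolCall {
  tool: string;
  profile: "read-only" | "interactive";
}

interface Envelope {
  status: "success" | "failed";
  reasonCode?: string;
  trace: string[]; // stages that ran, kept as a minimal audit trail
}

interface Stage {
  name: string;
  run: (call: ToolCall) => string | null; // returns a reason code on failure, null on success
}

// Run stages in order and short-circuit on the first failure.
function runPipeline(call: ToolCall, stages: Stage[]): Envelope {
  const trace: string[] = [];
  for (const stage of stages) {
    trace.push(stage.name);
    const reason = stage.run(call);
    if (reason !== null) {
      return { status: "failed", reasonCode: reason, trace };
    }
  }
  return { status: "success", trace };
}

const demoStages: Stage[] = [
  { name: "validate_input", run: (c) => (c.tool.length > 0 ? null : "INVALID_INPUT") },
  { name: "policy_check", run: (c) => (c.profile === "read-only" && c.tool === "tap" ? "POLICY_DENIED" : null) },
  { name: "resolve_session", run: () => null },
  { name: "execute", run: () => null },
];

console.log(runPipeline({ tool: "tap", profile: "read-only" }, demoStages));
```

Because the policy stage precedes execution, a denied call never reaches the device, and the trace shows exactly where it stopped.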
High-Level Architecture
Reference split:
- Control plane: tool contracts, policy checks, session orchestration, audit/evidence indexing
- Execution plane: platform actions, UI resolution, retries, interruption handling, visual fallback
Architecture reference:
- System architecture overview (Mermaid, in-repo)
- Reference architecture details
- Architecture navigation index (zh-CN)
Source-of-truth note:
- Architecture docs describe both current baseline and target-state design.
- If a doc statement conflicts with strict validation behavior, prefer packages/contracts/*.schema.json and configs/policies/*.yaml for current enforced behavior.
Capability Map (Current Scope)
- Environment & device control — discovery, lease/isolation, environment shaping
- App lifecycle — install/launch/terminate/reset/deep-link entry
- Perception & interaction — inspect/query UI, tap/type/wait, flow execution
- Diagnostics & evidence — logs, crash signals, performance, screenshot/timeline artifacts
- Reliability & remediation — reason-coded failures, bounded retries, remediation helpers
The tool registry and signature dispatch live in packages/mcp-server/src/server.ts, while descriptor metadata and wrapper composition live in packages/mcp-server/src/index.ts.
Complete MCP Tool Catalog (Current)
The server currently exposes 66 tools. For AI agents, this is the current tool surface.
1) Session & lifecycle
start_session, request_manual_handoff, end_session, run_flow, reset_app_state
2) Task orchestration & flow capture
execute_intent, complete_task, start_record_session, get_record_session_status, end_record_session, cancel_record_session, export_session_flow, record_task_flow, validate_flow
3) Device & app control
list_devices, install_app, launch_app, terminate_app, describe_capabilities, doctor
4) UI perception, targeting, and interaction
inspect_ui, query_ui, resolve_ui_target, scroll_only, scroll_and_resolve_ui_target, wait_for_ui, wait_for_ui_stable, tap, tap_element, scroll_and_tap_element, type_text, type_into_element, navigate_back
5) Evidence, observability, and diagnostics
take_screenshot, record_screen, get_logs, get_crash_signals, collect_diagnostics, collect_debug_evidence, get_screen_summary, get_session_state, get_page_context, capture_js_console_logs, capture_js_network_events, list_js_debug_targets, capture_element_screenshot, compare_visual_baseline
6) Interruption handling
detect_interruption, classify_interruption, resolve_interruption, resume_interrupted_action
7) Failure analysis, recovery, and remediation
perform_action_with_evidence, get_action_outcome, explain_last_failure, rank_failure_candidates, find_similar_failures, compare_against_baseline, recover_to_known_state, replay_last_stable_path, suggest_known_remediation, replay_checkpoint_chain
8) Performance profiling
measure_android_performance, measure_ios_performance
9) Network diagnostics
probe_network_readiness, diagnose_network_failure, inspect_network_policy
probe_network_readiness checks runtime connectivity, DNS, latency, and optional backend reachability. diagnose_network_failure starts from an observed failed request and attributes likely Android cleartext or iOS ATS release-policy blockers. inspect_network_policy remains the lower-level static checker for plain HTTP endpoints using decoded manifest, network-security-config, Info.plist, or readable APK/IPA ZIP artifact evidence. These tools do not proxy traffic or mutate app configuration.
For exact signatures and supported inputs/outputs, use packages/mcp-server/src/server.ts (the tool registry source of truth).
Deterministic Ladder and Fallback Policy
Action resolution order is intentional and strict:
1. Stable ID/resource-id/testID/accessibility identifier
2. Semantic tree match (text/label/role)
3. OCR text-region fallback (bounded)
4. CV/template fallback (bounded)
5. Fail with reason code + artifacts
Prohibited behavior:
- OCR/CV as the default first path
- Unbounded retries without state-change evidence
- Silent downgrade from deterministic to probabilistic execution
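The ladder and its prohibitions can be sketched as a resolver that walks the tiers in a fixed order, refuses the OCR/CV tiers unless policy allows them, and reports every tier it tried. The tier names, the `resolveTarget` function, and the `TARGET_NOT_RESOLVED` reason code are hypothetical and chosen only for this example.

```typescript
// Hypothetical sketch: tier names and reason code are illustrative.
type Tier = "stable_id" | "semantic_tree" | "ocr" | "cv";

interface Resolution {
  status: "resolved" | "failed";
  tier?: Tier;
  reasonCode?: string;
  attempted: Tier[]; // explicit record, so a downgrade is never silent
}

const ladder: Tier[] = ["stable_id", "semantic_tree", "ocr", "cv"];
const probabilisticTiers: Tier[] = ["ocr", "cv"];

function resolveTarget(
  resolvers: Partial<Record<Tier, () => boolean>>,
  allowVisualFallback: boolean,
): Resolution {
  const attempted: Tier[] = [];
  for (const tier of ladder) {
    // Deterministic tiers always run first; OCR/CV only when policy allows.
    if (probabilisticTiers.includes(tier) && !allowVisualFallback) break;
    attempted.push(tier);
    const resolver = resolvers[tier];
    if (resolver !== undefined && resolver()) {
      return { status: "resolved", tier, attempted };
    }
  }
  return { status: "failed", reasonCode: "TARGET_NOT_RESOLVED", attempted };
}

// Example: stable ID misses, the semantic tree matches, OCR/CV are never touched.
console.log(resolveTarget({ stable_id: () => false, semantic_tree: () => true }, true));
```

Each tier is tried once here; bounding retries inside a tier, and requiring evidence of a state change between attempts, would be layered on top of this ordering.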
Repository-Wide Principles
- Deterministic-first: use stable IDs/tree/native capabilities first; OCR/CV is bounded fallback.
- Structured tool contracts: return machine-consumable result envelopes (status, reasonCode, artifacts).
- Session-oriented execution: actions run in auditable sessions with explicit policy profiles.
- Evidence-rich failures: failures should carry enough context for explain/replay/remediation.
Session, Policy, and Governance Model
- Sessions are auditable execution units with timeline and artifact references.
- Policy profiles can restrict tool classes (for example read-only vs interactive/full-control).
- Lease/scheduler constraints prevent unsafe concurrent execution on the same target.
- Redaction/governance paths exist to keep evidence useful while respecting data boundaries.
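The lease constraint can be illustrated with a small in-memory table that allows one session per device at a time. This is a sketch under assumptions: `LeaseTable` and its methods are hypothetical, and a real scheduler would also need expiry, persistence, and audit records.

```typescript
// Hypothetical sketch: an in-memory lease table keyed by device id.
interface LeaseResult {
  granted: boolean;
  holder: string; // session that holds the lease after this call
}

class LeaseTable {
  private holders = new Map<string, string>(); // deviceId -> sessionId

  acquire(deviceId: string, sessionId: string): LeaseResult {
    const current = this.holders.get(deviceId);
    if (current !== undefined && current !== sessionId) {
      return { granted: false, holder: current }; // another session owns the device
    }
    this.holders.set(deviceId, sessionId);
    return { granted: true, holder: sessionId };
  }

  release(deviceId: string, sessionId: string): boolean {
    if (this.holders.get(deviceId) !== sessionId) return false; // only the holder may release
    this.holders.delete(deviceId);
    return true;
  }
}

const demoLeases = new LeaseTable();
console.log(demoLeases.acquire("device-1", "session-a")); // granted
console.log(demoLeases.acquire("device-1", "session-b")); // refused while session-a holds it
```

Refusing the second `acquire` with the current holder's id gives the caller enough information to wait, hand off, or report the conflict.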
Key policy/config locations:
- configs/policies/*.yaml
- configs/profiles/*.yaml
- Policy profile guide
Non-Goals (Important for Correct Expectations)
- This is not a replacement for the internals of every mobile framework.
- This is not OCR-first automation.
- This does not imply separate full RN or Flutter backends, or immediate parity across all native/RN/Flutter edge cases.
- This is not a single abstraction that erases all platform differences.
Selected Docs
- README.zh-CN.md — Chinese overview
- docs/README.md — public documentation index and publication policy
- docs/guides/ai-agent-invocation.zh-CN.md — canonical AI-agent invocation guide
- docs/guides/policy-profiles.md — policy profile usage and escalation boundaries
- docs/engineering/ai-first-capability-expansion-guideline.md — feature expansion rules for AI-first harness capabilities
- docs/architecture/overview.md — goals/scope/principles
- docs/architecture/architecture.md — reference architecture
- docs/architecture/capability-map.md — capability taxonomy/maturity
- docs/architecture/governance-security.md — governance/security model
- docs/architecture/README.zh-CN.md — architecture navigation index (zh-CN)
- docs/architecture/session-orchestration-architecture.zh-CN.md — session lease/scheduler/runtime orchestration
- docs/architecture/policy-engine-runtime-architecture.zh-CN.md — policy runtime/guard/scope mapping
- docs/architecture/platform-implementation-matrix.zh-CN.md — cross-platform support matrix
- docs/delivery/roadmap.md — delivery phases
- docs/delivery/npm-release-and-git-tagging.zh-CN.md — npm release and Git tagging integration guide (includes layered doc-sync rules for PR/pre-tag/tag stages)
- docs/showcase/README.md — real-device demo evidence and repro scripts
- tests/README.md — test layers and CI scope
Roadmap Snapshot (Short)
- Near term: harden deterministic session/action reliability and evidence model.
- Mid term: broaden framework/profile maturity and real-run coverage.
- Long term: stronger agentic remediation/governance and enterprise controls.
Detailed public planning references are maintained in docs/delivery/roadmap.md and docs/architecture/*.
Open Source Collaboration
- License: MIT
- Contributing guide: CONTRIBUTING.md
- Changelog: CHANGELOG.md
Positioning
This project is not another isolated test framework. It is an AI-facing orchestration layer that routes mobile E2E actions through shared platform adapters and framework profiles, with deterministic-first behavior and strict governance boundaries.
Support This Project
If this project helps your team, you can support it by:
- Starring and sharing the repository
- Opening issues/PRs with reproducible evidence
- Sponsoring the project

