Mobile E2E MCP

AI-safe mobile device control via MCP: a policy-guarded, session-oriented mobile automation harness for AI agents with 66 MCP tools and an Explorer for automatic page traversal.

Mobile E2E MCP (2026)

AI-safe mobile device control via MCP: a policy-guarded, session-oriented mobile automation harness for AI agents, with deterministic-first Android/iOS execution, bounded visual fallback, and evidence-rich outcomes.

This repository is a pnpm monorepo that combines MCP tooling, adapter execution, and architecture docs for AI agents that need to inspect, act on, and debug mobile apps without turning raw device commands into ungoverned side effects.

Quick Start

{
  "mcpServers": {
    "mobile-e2e-mcp": {
      "command": "npx",
      "args": ["-y", "@shenyuexin/mobile-e2e-mcp@latest"]
    }
  }
}

Once installed, you get 66 MCP tools for governed mobile automation, plus a built-in Explorer for automatic page traversal.

Primary Product Surface: Explorer

Explorer is the main outward-facing product capability of this repository. It is not a standalone crawler bolted onto the side; it is the clearest product surface for the harness because broad app exploration requires the same core MCP capabilities a mobile AI agent needs in practice: device discovery, auditable sessions, UI inspection, bounded UI actions, rule-based risk gates, interruption/recovery handling, and structured evidence.

Use Explorer when you want to answer product questions such as:

Which screens are reachable from this app entry point?
Which flows are blocked by policy, external-app boundaries, risk gates, or repeated failures?
What changed between two exploration runs?
Which discovered paths should be promoted into deterministic replay or PR review evidence?

Explorer writes fixed, reviewable artifacts such as tree.txt, report.md, summary.json, config.json, and failure-review JSON/Markdown. Current local and tracked evidence covers large Settings explorations, some exceeding 100 pages, and records rule decisions, interruption/failure context, and machine-consumable page metadata.

Core Wedge: Governed Agent Control

The strongest use case for this project is not "replace every mobile E2E framework." It is: give an AI agent a safer control plane for mobile devices.

Compared with a thin adb or platform-command wrapper, this harness adds:

Policy boundaries: actions are checked against policy profiles before execution.
Auditable sessions: actions run inside session, lease, audit, and evidence context.
Capability disclosure: agents can query supported platforms and boundaries before acting.
Structured outcomes: failures, denials, and evidence are machine-consumable instead of log-only.

Reproduce the dry-run proof:

pnpm run proof:governed-agent-mobile-control

The proof writes a timestamped bundle under output/showcase/governed-agent-mobile-control/<run-id>/ and verifies that a read-only session blocks an interactive action with structured POLICY_DENIED. See Governed Agent Mobile Control Proof.
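
The read-only denial that the proof verifies can be pictured with a small sketch. Only the POLICY_DENIED reason code and the tool names come from this README; the profile names, the `INTERACTIVE_TOOLS` classification, the `PolicyDecision` shape, and the `OK` code are illustrative assumptions, not the repository's actual contracts (those are defined in packages/contracts/*.schema.json and configs/policies/*.yaml).

```typescript
// Hypothetical policy profiles; the real ones are defined in configs/policies/*.yaml.
type PolicyProfile = "read-only" | "interactive";

// Assumed decision shape for illustration only.
interface PolicyDecision {
  allowed: boolean;
  reasonCode: "OK" | "POLICY_DENIED";
  tool: string;
  profile: PolicyProfile;
}

// Illustrative classification of tools that change device state.
const INTERACTIVE_TOOLS = new Set<string>(["tap", "type_text", "launch_app"]);

function checkPolicy(profile: PolicyProfile, tool: string): PolicyDecision {
  const mutatesDevice = INTERACTIVE_TOOLS.has(tool);
  if (mutatesDevice && profile === "read-only") {
    // The denial is returned as data rather than thrown, so an agent can branch on it.
    return { allowed: false, reasonCode: "POLICY_DENIED", tool, profile };
  }
  return { allowed: true, reasonCode: "OK", tool, profile };
}
```

In this sketch, `checkPolicy("read-only", "tap")` yields a denial with `reasonCode: "POLICY_DENIED"`, while `checkPolicy("read-only", "inspect_ui")` is allowed because inspection does not change device state.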

Explorer: Automatic Page Traversal

Explorer is a DFS-based automatic page traversal engine built into the MCP server. It systematically navigates through your app's screens, builds a state graph, and produces structured coverage reports without requiring manual flow definitions.

npx -y @shenyuexin/mobile-e2e-mcp@latest explore \
  --app-id com.example.app \
  --platform android \
  --output ./explore-report

Key features:

DFS-based traversal: explores reachable screens depth-first from a starting point, within configured limits
State graph tracking: records visited states and detects cycles to avoid infinite loops
Circuit breaker: automatically stops when exploration hits diminishing returns or configured limits
Structured coverage reports: outputs machine-consumable reports showing which screens and elements were discovered
Rule-based gating: respects skip-page, skip-element, sampling, and risk-gating rules for safe exploration
Interruption-aware evidence: records blocked, interrupted, skipped, and failed traversal decisions with reasons so a run can be reviewed instead of treated as a raw pass/fail crawl
Experimental horizontal fallback: after vertical segments are exhausted, Explorer can probe horizontally scrollable content with bounded page-identity checks
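
The first three features can be sketched together as a depth-first walk with a visited set and a page budget. This is a minimal illustration under stated assumptions, not Explorer's implementation: the in-memory `AppGraph`, the `explore` function, and the `maxPages`/`maxDepth` parameters are hypothetical, and a real run discovers pages by driving a device rather than reading a map.

```typescript
// Hypothetical in-memory app model: page id -> pages reachable from it.
type AppGraph = Map<string, string[]>;

interface ExploreResult {
  visited: string[];       // pages in discovery order
  stoppedByLimit: boolean; // true when the page budget cut the walk short
}

function explore(
  graph: AppGraph,
  start: string,
  maxPages: number,
  maxDepth: number,
): ExploreResult {
  const visited = new Set<string>();
  let stoppedByLimit = false;

  const visit = (page: string, depth: number): void => {
    if (visited.has(page)) return;  // state tracking: skip cycles and revisits
    if (visited.size >= maxPages) { // circuit breaker: stop at the page budget
      stoppedByLimit = true;
      return;
    }
    visited.add(page);
    if (depth >= maxDepth) return;  // record the page but do not expand it
    for (const next of graph.get(page) ?? []) {
      visit(next, depth + 1);       // depth-first: finish one branch before the next
    }
  };

  visit(start, 0);
  return { visited: Array.from(visited), stoppedByLimit };
}
```

Because the visited set is checked before the budget, revisiting a known page never trips the breaker; only a genuinely new page beyond the budget does.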

Output

Explorer produces a directory of structured artifacts:

File	Description
`tree.txt`	ASCII tree of all discovered pages and navigation paths
`report.md`	Human-readable coverage report with module breakdown
`failure-review.md`	Human-readable failure triage with grouped patterns and suggested next actions
`failure-review.json`	Machine-consumable failure triage summary
`summary.json`	Machine-consumable metrics and page metadata
`config.json`	Runtime configuration and rule settings used for the run

Example output from a real run against iOS Settings (181 pages, max depth 5):

tree.txt — full page hierarchy
report.md — module breakdown and paths
summary.json — metrics and metadata

Local real-device runs are written under output/evidence/explorer/. These outputs are intentionally structured so they can be curated into public showcase evidence or consumed by follow-on tooling such as coverage diffing, PR summaries, and replay path extraction.

What This Repository Actually Is

This repo contains both:

Executable implementation (MCP server, adapters, contracts, core orchestration), and
Architecture and delivery knowledge base (design principles, capability model, phased rollout docs).

If you only remember one thing: this project is designed as an Explorer-led, governed mobile control layer for AI agents, not a single-framework test runner.

Mobile E2E Harness Positioning

This project is an AI mobile E2E harness: a policy-aware, session-oriented, deterministic-first execution harness for mobile automation where an AI agent needs controlled action, evidence, and support-boundary clarity.

If you're searching for terms like mobile test harness, real-device Android test harness, AI automation harness, or mobile CI harness, this repository is built for that exact workflow.

Why teams use this harness

Deterministic-first harness: stable selectors and structured retries before OCR/CV fallback
Failure-intelligence harness: reason codes, evidence artifacts, and remediation suggestions
Governance-aware harness: policy profiles, auditable sessions, and controlled tool surfaces
Explorer harness: the primary product surface, combining traversal, tool orchestration, risk gating, interruption handling, recovery, and structured coverage/failure evidence (available via CLI)
Real-device evidence: Explorer/probe artifacts plus historical videos for happy path and interruption recovery

Capability Showcase

If you want a quick hands-on tour before diving into architecture details, start here:

Happy path video (login -> scroll -> add to cart -> orders -> cart):
- docs/showcase/videos/m2e-happy-path-scroll-pause-40s.mp4
Visible interruption + recovery video (HOME interruption -> recover_to_known_state -> continue action):
- docs/showcase/videos/m2e-interruption-home-recovery-35s.mp4
Current real-device verification:
- Android Explorer evidence: docs/showcase/evidence/android-explorer-full-2026-04-28T03-38-20/
- Android probe entrypoint: pnpm run validate:android-tool-probe (latest Vivo V2405A run: 20/23 success, 0 partial, 3 expected diagnostic failures; core UI and interruption-resume paths passed)
- iOS probe entrypoint: pnpm run validate:ios-tool-probe
Governed-control proof:
- pnpm run quickstart:governed-control (first-run readiness and next-command guide)
- pnpm run proof:governed-agent-mobile-control
- pnpm run proof:governed-agent-mobile-control:preflight (checks Android live-proof readiness)
- pnpm run proof:governed-agent-mobile-control:live (requires an Android device/emulator)
- pnpm run proof:governed-business-app-workflow (installs/launches the demo app, then switches to read-only governed agent observation)
- pnpm run proof:governed-policy-escalation (denies under read-only, then retries under interactive policy)
- pnpm run validate:governed-control-evidence
- pnpm run validate:governed-business-app-evidence
- pnpm run validate:governed-business-app-comparison
- pnpm run validate:governed-policy-escalation-evidence
- pnpm run validate:governed-evidence-brief
- pnpm run validate:governed-pr-evidence-summary
- pnpm run verify:mobile-change (one-command mobile change verification UX; use -- --live --contract=configs/readiness/mobile-change.android.json for contract-backed live mode)
- pnpm run generate:mobile-change-readiness-contract / pnpm run validate:mobile-change-readiness-contract
- pnpm run generate:mobile-change-repo-app-success-candidate / pnpm run validate:mobile-change-repo-app-success-candidate (repo-owned demo app success candidate; blocked output is not success evidence until a device/emulator run passes intake)
- pnpm run generate:mobile-change-ci-pr-evidence / pnpm run validate:mobile-change-ci-pr-evidence (compact PR/CI artifact with proof-level-safe blocked/failed/success labels)
- pnpm run generate:mobile-change-failure-memory / pnpm run validate:mobile-change-failure-memory (deterministic failure-pattern grouping and bounded next-action routing)
- pnpm run verify:react-native-change (experimental RN lane that runs readiness plus evidence-pack orchestration; live success still requires device, Metro, debug target, stable selectors, and intake-backed proof)
- pnpm run generate:react-native-readiness / pnpm run validate:react-native-readiness (RN preflight for device, Metro, JS debug target, readiness contract, and stable selectors)
- pnpm run generate:react-native-evidence-pack / pnpm run validate:react-native-evidence-pack (RN review artifact that keeps Metro signals supplemental)
- pnpm run proof:mobile-change-verification (fixture-backed mobile change verification bundle, failure packet, and scenario index)
- pnpm run generate:mobile-change-device-readiness / pnpm run validate:mobile-change-device-readiness (structured device/app/readiness preflight before attempting live mobile change proof)
- pnpm run proof:mobile-change-verification:live (optional live device/emulator proof; use M2E_LIVE_MOBILE_CHANGE_ALLOW_NO_DEVICE=1 for structured no-device output)
- pnpm run proof:mobile-change-verification:live-settings (runnable no-APK Android Settings success lane; requires device 10AEA40Z3Y000R5, or edit the device id to match yours)
- pnpm run generate:mobile-change-live-settings-lane / pnpm run validate:mobile-change-live-settings-lane
- pnpm run proof:mobile-change-verification:readiness-failure (controlled live-runner-derived app readiness failure packet)
- pnpm run validate:mobile-change-verification
- pnpm run validate:mobile-change-live-android-evidence (tracked Android device 10AEA40Z3Y000R5 live app-readiness failure evidence)
- pnpm run validate:mobile-change-readiness-failure
- pnpm run generate:mobile-change-handoff / pnpm run validate:mobile-change-handoff
- pnpm run intake:mobile-change-live-proof / pnpm run validate:mobile-change-live-proof-intake (review live runner output before promoting it as tracked evidence)
- docs/showcase/governed-agent-mobile-control.md
- docs/showcase/governed-agent-mobile-control-live.md
- docs/showcase/governed-business-app-workflow.md
- docs/showcase/governed-policy-escalation.md
- docs/showcase/governed-quickstart.md
- docs/showcase/evidence/governed-control-brief/brief.md
- docs/showcase/evidence/governed-control-brief/pr-comment.md
- docs/showcase/evidence/governed-control-vivo-2026-05-23/report.md
- docs/showcase/evidence/governed-business-app-vivo-2026-05-24/report.md
- docs/showcase/evidence/governed-business-app-vivo-2026-05-24/comparison.md
- docs/showcase/evidence/governed-policy-escalation-dry-run-2026-05-25/report.md
- docs/showcase/evidence/mobile-change-verification-fixture/report.md
- docs/showcase/evidence/mobile-change-verification-fixture/failure-packet.md
- docs/showcase/evidence/mobile-change-verification-fixture/scenario-index.md
- docs/showcase/evidence/mobile-change-device-readiness/report.md
- docs/showcase/evidence/mobile-change-live-android-10AEA40Z3Y000R5/report.md
- docs/showcase/evidence/mobile-change-live-settings-lane/lane.md
- docs/showcase/evidence/mobile-change-live-proof-intake/intake.md
- docs/showcase/evidence/mobile-change-repo-app-success-candidate/candidate.md
- docs/showcase/evidence/mobile-change-ci-pr-evidence/pr-summary.md
- docs/showcase/evidence/mobile-change-failure-memory/remediation.md
- docs/showcase/evidence/mobile-change-readiness-failure/failure-packet.md
- docs/showcase/evidence/mobile-change-readiness-failure/handoff.md
Historical demo scripts:
- bash scripts/legacy/dev/record-demo-happy-path-android.sh
- bash scripts/legacy/dev/record-demo-interruption-home-recovery-android.sh
- bash scripts/legacy/dev/publish-showcase-assets-android.sh (record + curate videos + refresh snapshots/GIFs)

FAQ

What is a mobile E2E harness for AI agents?

It is an execution layer that lets AI agents run mobile test actions safely and reproducibly. This harness adds session control, policy boundaries, deterministic action routing, and structured evidence beyond basic command execution.

Can this harness run on real Android devices?

Yes. Current real-device evidence is centered on Explorer/probe artifacts, with historical showcase scripts under scripts/legacy/dev/* and recordings under docs/showcase/*. The latest Android Vivo probe verified the core Settings UI action path and resume_interrupted_action with native_android; a few diagnostic/negative-path checks remain intentionally non-green unless their prerequisites, such as Metro, are present.

How does interruption recovery work in this harness?

It detects interruption signals, classifies likely interruption type, and applies bounded recovery actions (for example recover_to_known_state) before continuing the flow.
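
The "bounded" part of that answer can be illustrated with a short loop. This is a sketch under assumptions: `recoverBounded`, its callback parameters, and the `RECOVERY_EXHAUSTED` code are hypothetical names, and the classification step is omitted for brevity.

```typescript
interface RecoveryOutcome {
  recovered: boolean;
  attempts: number;
  reasonCode?: string; // set only when recovery gives up (hypothetical code)
}

function recoverBounded(
  isInterrupted: () => boolean, // stand-in for an interruption check
  recoverStep: () => void,      // stand-in for one recovery action
  maxAttempts: number,
): RecoveryOutcome {
  let attempts = 0;
  while (isInterrupted()) {
    if (attempts >= maxAttempts) {
      // Give up with a reason code instead of retrying forever.
      return { recovered: false, attempts, reasonCode: "RECOVERY_EXHAUSTED" };
    }
    recoverStep();
    attempts += 1;
  }
  return { recovered: true, attempts };
}
```

The loop re-checks the interruption state after every step, so the flow continues as soon as the screen is clear and stops with a structured outcome once the attempt budget is spent.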

Is this a replacement for Appium or Maestro?

Not necessarily. It is better understood as an orchestration harness that can coexist with existing execution ecosystems while adding AI-oriented governance and diagnostics.

Which scenarios are the best fit?

AI agents that need safe mobile device access, release-gate mobile regression, flaky-flow triage, AI-driven exploratory checks, and real-device CI workflows that require auditable, evidence-rich outcomes.

What is the Explorer and when should I use it?

Explorer automatically traverses your app's screens without predefined flows. Use it when you need broad coverage discovery, want to map an unfamiliar app's navigation structure, or need to identify all reachable screens before writing targeted test flows. It is available via the explore CLI command.

Android physical-device Explorer evidence is tracked under docs/showcase/evidence/android-explorer-full-2026-04-28T03-38-20/: a full Settings traversal completed in 33m 50s with 45 pages, max depth 4, and 0 failures.

Validate that evidence offline with pnpm run validate:explorer-android-evidence -- --min-pages 45 --min-depth 4.

Appium / Maestro vs This Harness

Dimension	Appium / Maestro	Mobile E2E MCP Harness
Core role	Automation framework / flow runner	AI-facing orchestration harness
Execution strategy	Action execution centric	Deterministic-first + policy/session governance
Failure handling	Assertion/command failure outputs	Structured diagnostics + ranked causes + remediation hints
AI integration	Possible but not primary abstraction	Primary design target (tools for AI agents)
Evidence model	Varies by setup	Built-in evidence-first action outcomes
Helper app dependency	Required for iOS/Android replay	Android: owned-adb primary (no helper app needed for common commands); iOS simulator: axe CLI; iOS physical: WDA (one-time setup, see External Tools Guide)

Official AI Mobile Tools vs This Harness

Android CLI/Journeys, Android Studio Journeys, and the Dart/Flutter MCP server are complementary upstream tools, not replacements for this harness. Treat them as source-native journey execution, authoring, or framework-context providers; feed their outputs into mobile-e2e evidence intake only when proof boundaries are explicit.

The machine-readable bridge contract is generated by:

pnpm run validate:official-tool-bridge

See docs/showcase/evidence/official-tool-bridge/bridge.md for the current relationship matrix.

How It Works (End-to-End)

Typical runtime path:

1. Agent/client invokes an MCP tool via stdio or dev CLI.
2. MCP server validates input and applies policy checks.
3. Session context is resolved (or created), with lease/scheduling guardrails.
4. Adapter router selects deterministic execution path first.
5. Action executes and returns a structured result envelope.
6. Artifacts/evidence (screens, logs, summaries) are attached for audit/debug.
7. If deterministic resolution fails and policy allows it, bounded OCR/CV fallback is attempted.

This is why the project emphasizes session + policy + evidence, not only UI actions.

High-Level Architecture

Reference split:

Control plane: tool contracts, policy checks, session orchestration, audit/evidence indexing
Execution plane: platform actions, UI resolution, retries, interruption handling, visual fallback

Source-of-truth note:

Architecture docs describe both current baseline and target-state design.
If a doc statement conflicts with strict validation behavior, prefer packages/contracts/*.schema.json and configs/policies/*.yaml for current enforced behavior.

Capability Map (Current Scope)

Environment & device control — discovery, lease/isolation, environment shaping
App lifecycle — install/launch/terminate/reset/deep-link entry
Perception & interaction — inspect/query UI, tap/type/wait, flow execution
Diagnostics & evidence — logs, crash signals, performance, screenshot/timeline artifacts
Reliability & remediation — reason-coded failures, bounded retries, remediation helpers

The tool registry and signature dispatch live in packages/mcp-server/src/server.ts; descriptor metadata and wrapper composition live in packages/mcp-server/src/index.ts.

Complete MCP Tool Catalog (Current)

The server currently exposes 66 tools. For AI agents, this is the current tool surface.

1) Session & lifecycle

start_session, request_manual_handoff, end_session, run_flow, reset_app_state

2) Task orchestration & flow capture

execute_intent, complete_task, start_record_session, get_record_session_status, end_record_session, cancel_record_session, export_session_flow, record_task_flow, validate_flow

3) Device & app control

list_devices, install_app, launch_app, terminate_app, describe_capabilities, doctor

4) UI perception, targeting, and interaction

inspect_ui, query_ui, resolve_ui_target, scroll_only, scroll_and_resolve_ui_target, wait_for_ui, wait_for_ui_stable, tap, tap_element, scroll_and_tap_element, type_text, type_into_element, navigate_back

5) Evidence, observability, and diagnostics

take_screenshot, record_screen, get_logs, get_crash_signals, collect_diagnostics, collect_debug_evidence, get_screen_summary, get_session_state, get_page_context, capture_js_console_logs, capture_js_network_events, list_js_debug_targets, capture_element_screenshot, compare_visual_baseline

6) Interruption handling

detect_interruption, classify_interruption, resolve_interruption, resume_interrupted_action

7) Failure analysis, recovery, and remediation

perform_action_with_evidence, get_action_outcome, explain_last_failure, rank_failure_candidates, find_similar_failures, compare_against_baseline, recover_to_known_state, replay_last_stable_path, suggest_known_remediation, replay_checkpoint_chain

8) Performance profiling

measure_android_performance, measure_ios_performance

9) Network diagnostics

probe_network_readiness, diagnose_network_failure, inspect_network_policy

probe_network_readiness checks runtime connectivity, DNS, latency, and optional backend reachability. diagnose_network_failure starts from an observed failed request and attributes likely Android cleartext or iOS ATS release-policy blockers. inspect_network_policy remains the lower-level static checker for plain HTTP endpoints using decoded manifest, network-security-config, Info.plist, or readable APK/IPA ZIP artifact evidence. These tools do not proxy traffic or mutate app configuration.

For exact signatures and supported inputs/outputs, use packages/mcp-server/src/server.ts (the tool registry source of truth).

Deterministic Ladder and Fallback Policy

Action resolution order is intentional and strict:

1. Stable ID/resource-id/testID/accessibility identifier
2. Semantic tree match (text/label/role)
3. OCR text-region fallback (bounded)
4. CV/template fallback (bounded)
5. Fail with reason code + artifacts

Prohibited behavior:

OCR/CV as the default first path
Unbounded retries without state-change evidence
Silent downgrade from deterministic to probabilistic execution
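
The ladder and its prohibitions can be sketched as an ordered walk over resolver rungs. The rung names, the `Resolver` signature, the `allowVisualFallback` flag, and the `TARGET_NOT_FOUND` code are assumptions made for illustration; the actual adapter router may be organized differently.

```typescript
// Rungs in the order listed above; names are illustrative.
type Rung = "stable-id" | "semantic-tree" | "ocr" | "cv";

interface Resolution {
  status: "resolved" | "failed";
  rung?: Rung;         // which rung matched, so a downgrade is never silent
  reasonCode?: string; // set on failure (hypothetical code)
  attempted: Rung[];   // audit trail of rungs tried, in order
}

// A resolver returns an element handle (a string here) or null.
type Resolver = (target: string) => string | null;

const LADDER: Rung[] = ["stable-id", "semantic-tree", "ocr", "cv"];
const PROBABILISTIC = new Set<Rung>(["ocr", "cv"]);

function resolveTarget(
  target: string,
  resolvers: Partial<Record<Rung, Resolver>>,
  allowVisualFallback: boolean,
): Resolution {
  const attempted: Rung[] = [];
  for (const rung of LADDER) {
    // OCR/CV rungs run only when policy allows, and never before deterministic rungs.
    if (PROBABILISTIC.has(rung) && !allowVisualFallback) continue;
    const resolver = resolvers[rung];
    if (!resolver) continue;
    attempted.push(rung);
    if (resolver(target) !== null) {
      return { status: "resolved", rung, attempted };
    }
  }
  return { status: "failed", reasonCode: "TARGET_NOT_FOUND", attempted };
}
```

Each rung is tried at most once per call, and the returned `rung` and `attempted` fields make any move from deterministic to probabilistic resolution visible to the caller.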

Repository-Wide Principles

Deterministic-first: use stable IDs/tree/native capabilities first; OCR/CV is bounded fallback.
Structured tool contracts: return machine-consumable result envelopes (status, reasonCode, artifacts).
Session-oriented execution: actions run in auditable sessions with explicit policy profiles.
Evidence-rich failures: failures should carry enough context for explain/replay/remediation.

Session, Policy, and Governance Model

Sessions are auditable execution units with timeline and artifact references.
Policy profiles can restrict tool classes (for example read-only vs interactive/full-control).
Lease/scheduler constraints prevent unsafe concurrent execution on the same target.
Redaction/governance paths exist to keep evidence useful while respecting data boundaries.
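
The lease constraint can be illustrated with a minimal in-memory table that grants one session per device at a time. `LeaseTable`, its method names, and the time-to-live semantics are hypothetical; the repository's scheduler is documented in docs/architecture/session-orchestration-architecture.zh-CN.md and may behave differently.

```typescript
interface Lease {
  deviceId: string;
  sessionId: string;
  expiresAt: number; // epoch milliseconds
}

class LeaseTable {
  private leases = new Map<string, Lease>();

  // Returns the lease if granted, or null while another session holds the device.
  // `now` is passed in explicitly so the behavior is deterministic and testable.
  acquire(deviceId: string, sessionId: string, ttlMs: number, now: number): Lease | null {
    const current = this.leases.get(deviceId);
    if (current && current.expiresAt > now && current.sessionId !== sessionId) {
      return null;
    }
    const lease: Lease = { deviceId, sessionId, expiresAt: now + ttlMs };
    this.leases.set(deviceId, lease);
    return lease;
  }

  // Only the holding session may release its lease.
  release(deviceId: string, sessionId: string): boolean {
    const current = this.leases.get(deviceId);
    if (!current || current.sessionId !== sessionId) return false;
    this.leases.delete(deviceId);
    return true;
  }
}
```

An expired lease can be taken over by a new session, which keeps a crashed agent from blocking a device indefinitely in this sketch.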

Key policy/config locations:

configs/policies/*.yaml
configs/profiles/*.yaml
Policy profile guide: docs/guides/policy-profiles.md

Non-Goals (Important for Correct Expectations)

This is not a replacement for the internals of every mobile framework.
This is not OCR-first automation.
This does not imply separate full RN or Flutter backends, or immediate parity across all native/RN/Flutter edge cases.
This is not a single abstraction that erases all platform differences.

Selected Docs

README.zh-CN.md — Chinese overview
docs/README.md — public documentation index and publication policy
docs/guides/ai-agent-invocation.zh-CN.md — canonical AI-agent invocation guide
docs/guides/policy-profiles.md — policy profile usage and escalation boundaries
docs/engineering/ai-first-capability-expansion-guideline.md — feature expansion rules for AI-first harness capabilities
docs/architecture/overview.md — goals/scope/principles
docs/architecture/architecture.md — reference architecture
docs/architecture/capability-map.md — capability taxonomy/maturity
docs/architecture/governance-security.md — governance/security model
docs/architecture/README.zh-CN.md — architecture navigation index (zh-CN)
docs/architecture/session-orchestration-architecture.zh-CN.md — session lease/scheduler/runtime orchestration
docs/architecture/policy-engine-runtime-architecture.zh-CN.md — policy runtime/guard/scope mapping
docs/architecture/platform-implementation-matrix.zh-CN.md — cross-platform support matrix
docs/delivery/roadmap.md — delivery phases
docs/delivery/npm-release-and-git-tagging.zh-CN.md — npm release and Git tagging integration guide (includes layered doc-sync rules for PR/pre-tag/tag stages)
docs/showcase/README.md — real-device demo evidence and repro scripts
tests/README.md — test layers and CI scope

Roadmap Snapshot (Short)

Near term: harden deterministic session/action reliability and evidence model.
Mid term: broaden framework/profile maturity and real-run coverage.
Long term: stronger agentic remediation/governance and enterprise controls.

Detailed public planning references are maintained in docs/delivery/roadmap.md and docs/architecture/*.

Open Source Collaboration

License: MIT
Contributing guide: CONTRIBUTING.md
Changelog: CHANGELOG.md

Positioning

This project is not another isolated test framework. It is an AI-facing orchestration layer that routes mobile E2E actions through shared platform adapters and framework profiles, with deterministic-first behavior and strict governance boundaries.

Support This Project

If this project helps your team, you can support it by:

Starring and sharing the repository
Opening issues/PRs with reproducible evidence
Sponsoring the project
