podium-mcp
One MCP endpoint, 43 tools for iOS-simulator E2E — native UI automation, Maestro flows, trustworthy oracle-ladder assertions, WebView DOM + network (JSON/HAR), and React Native debugging (Metro logs/network/state). macOS + Xcode.
<div align="center">
podium-mcp
One baton. Every instrument.
A single MCP stdio endpoint with 43 tools for iOS-simulator control, native UI automation, end-to-end flows, trustworthy assertions, React Native debugging, and WebView DOM + network inspection — one connection instead of half a dozen servers.
<br/>
<img src="assets/demo.gif" alt="podium-mcp agent session — one prompt opens Safari on a live iOS simulator, types github.com/hoainho, explores the profile and opens a repository" width="300" />
<sub><i>One prompt → podium drives Safari live → types the URL → explores the profile → opens a repo. Footage captured on a live iPhone 16 Pro simulator.</i></sub>
</div>
A podium is where a maestro stands — one place to conduct the whole orchestra. This MCP server unifies six capability sets behind a single stdio endpoint:
- Device & app management: `simctl`, with graceful `adb` detection.
- Native UI inspection & gestures: route through `idb`/`mobilecli` with a Maestro fallback (no per-gesture JVM spin-up).
- End-to-end flows & batch automation: declarative Maestro flows, ordered action batches, and an engineer→QA flow exporter.
- Trustworthy assertions: an oracle ladder (WebView-DOM › native a11y › Maestro) that returns falsifiable, evidenced verdicts and fails closed.
- WebView DOM + network: resolve `WKWebView` DOM to tap coordinates, evaluate JS, drive navigation, and capture in-page HTTP traffic as JSON/HAR.
- React Native debugging: Metro console logs, network requests, and in-app state over CDP, plus host/simulator crash reports.
Rather than wiring several MCP servers into every client config, podium-mcp exposes everything behind one connection, with a shared execFile layer (no shell), consistent structured errors, automatic retry around Maestro's iOS-driver flakiness, and a single health-check tool to confirm what's available on the host.
What's new in v0.2.0
- Oracle ladder + trustworthy verdicts: `assert_visible` / `assert_text` / `assert_not_visible` / `wait_for_element` and `validate_flow` verify state through WebView-DOM › native a11y › Maestro, returning evidenced results that fail closed instead of guessing "looks ok".
- Batch & export: `run_steps` runs an ordered action batch in one call via the native backend; `export_flow` turns that batch into a reusable Maestro flow (the engineer→QA bridge).
- WebView network capture: `webview_network` records in-WebView `fetch`/`XHR` traffic and exports redacted JSON or HAR 1.2 (the network path `metro_network` can't see for WebView-hosted apps).
- Deeper RN introspection: `metro_network` (CDP Network domain) and `metro_state` (read the in-app Redux store) join `metro_logs`.
- Native-first gesture backend: `idb`/`mobilecli` cut `tap_on` ~14.7 s → ~0.6 s and `inspect_screen` ~8.9 s → ~0.9 s, with a Maestro fallback that preserves app state.
- Reliability hardening: explicit per-command timeouts with a `timedOut` flag, timestamped recordings + a duration watchdog, native-backend re-probe TTL, exact bundle-id matching, and transparent iOS-simulator scope.
- Registry-ready: `server.json` manifest + OIDC publish to the official MCP Registry, test-gated before every publish.
Table of contents
- Why
- Requirements
- Install
- Usage
- Quick start
- The 43 tools
- The oracle ladder — trustworthy assertions
- Native-first gesture backend
- WebView & RN network introspection
- Documented limits
- Architecture
- Development & testing
- Releasing
- Prompt playbook & references
- Design ideas
- Contributing · Security · License
Why
Driving a React Native app end-to-end usually means juggling several MCP servers — one for device/app control, one for UI flows, one for Metro/debugger logs, another for WebView inspection — each with its own config entry, quirks, and failure modes. podium-mcp collapses that into one server with:
- a single `execFile`-based command runner (no shell; arguments are passed verbatim),
- consistent structured errors (a tool never crashes the server),
- automatic retry around Maestro's known iOS-driver flakiness,
- graceful degradation when a toolchain (e.g. `adb`) is absent,
- evidenced verdicts so an agent knows when a flow actually worked.
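The first two points can be illustrated with a minimal sketch of a shell-free runner. This is not podium's actual `lib/exec.ts`: the `RunResult` shape and the `run` helper are hypothetical names chosen for illustration. Only Node's standard `execFile` call and its `timeout` option are real API.

```typescript
import { execFile } from "node:child_process";

// Hypothetical result shape for illustration; the real runner may differ.
interface RunResult {
  ok: boolean;
  stdout: string;
  stderr: string;
  timedOut: boolean;
}

// Run a binary with an explicit argument array. No shell is involved, so
// characters such as `;` or `$(...)` inside an argument are passed verbatim.
// The promise always resolves: failures become data instead of exceptions.
function run(file: string, args: string[], timeoutMs = 30_000): Promise<RunResult> {
  return new Promise<RunResult>((resolve) => {
    execFile(file, args, { timeout: timeoutMs }, (error, stdout, stderr) => {
      // When the timeout elapses, Node kills the child and sets error.killed.
      const timedOut = error !== null && error.killed === true;
      resolve({
        ok: error === null,
        stdout: String(stdout),
        stderr: String(stderr),
        timedOut,
      });
    });
  });
}
```

A call such as `run("xcrun", ["simctl", "list", "-j"])` would then yield either the JSON inventory in `stdout` or a structured failure, including when `xcrun` is not installed.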
Requirements
- macOS with Xcode command-line tools (`xcrun`, `simctl`)
- Node.js ≥ 22 (uses native `fetch` and `WebSocket`; `.npmrc` sets `engine-strict=true`)
- `mobilecli`: bundled automatically as an npm dependency; the default native gesture + WebView backend (no separate install)
- (optional) `idb` (`idb` + `idb_companion`): preferred native gesture backend when both are present; auto-detected
- (optional) Maestro on `PATH` (or at `~/.maestro/bin`): the `run_flow` engine and the gesture fallback path
- (optional) a running Metro bundler for the `metro_*` debugging tools
- (optional) Android SDK + `adb`: adb paths are detection-only and degrade gracefully when absent
Platform scope: podium's automation targets the iOS Simulator. Android devices are detected (`device_list`, `podium_health`) but not yet automatable; every adb-backed path returns an informative result instead of failing.
Install
Claude Code plugin (recommended)
No manual config — one-time marketplace setup, then install:
```
/plugin marketplace add github:hoainho/podium-mcp
/plugin install podium-mcp@podium
```
The plugin auto-starts the MCP server (all 43 tools) and ships four skills:
| Skill | Invoke | What it does |
|---|---|---|
| Device info | `/podium-mcp:device-info <UDID> [<BUNDLE_ID>]` | Health check, screen size, orientation, app list |
| E2E flow | `/podium-mcp:e2e <UDID> <BUNDLE_ID> [path or description]` | Run or author a Maestro flow |
| Bug repro | `/podium-mcp:bug-repro <UDID> <BUNDLE_ID> <description>` | Video + logs + crash evidence capture |
| RN debug | `/podium-mcp:rn-debug [UDID] [logs\|apps\|crash\|all]` | Metro logs, connected apps, crash reports |
npx (zero install)
```json
{
  "mcpServers": {
    "podium": { "command": "npx", "args": ["-y", "podium-mcp"] }
  }
}
```
Manual (from source)
```bash
git clone git@github.com:hoainho/podium-mcp.git
cd podium-mcp
npm install
npm run build
```
Usage
Register the built server with any MCP client. Claude Code (.mcp.json):
```json
{
  "mcpServers": {
    "podium": {
      "type": "stdio",
      "command": "node",
      "args": ["/absolute/path/to/podium-mcp/dist/index.js"]
    }
  }
}
```
Quick manual smoke test over raw stdio (lists the 43 registered tools):
```bash
printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}' \
  '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/list"}' | node dist/index.js
```
Always call podium_health first to confirm which toolchain is available on the host.
Quick start (order of use)
1. `podium_health`: confirm `xcrun` / `maestro` / native backend availability.
2. `device_list`: pick a booted simulator `udid`.
3. Read state: `app_list`, `app_state`, `screen_size`, `orientation_get`.
4. Drive the device: `app_launch`, then `tap_on` / `input_text` / `swipe` / `press_key`, plus `set_location` and `orientation_set`. Batch several with `run_steps`.
5. Author & verify: `inspect_screen` to discover elements, `run_flow` for declarative checks, then `assert_visible` / `validate_flow` for an evidenced verdict.
6. Inspect WebViews: `webview_inspect` → tap coordinates, `webview_eval`, `webview_navigate`, `webview_network`.
7. Capture & debug: `screenshot` / `record_start` → `record_stop`; `metro_logs` / `metro_network` / `metro_state`; `crash_list` / `crash_get`.
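For clients that speak raw JSON-RPC, the order above might translate into a sequence of MCP `tools/call` requests like the sketch below. The `toolCall` helper, the placeholder UDID and bundle id, and the on-screen strings are illustrative assumptions; the tool and parameter names are taken from the tool tables in this README. A real session still has to send `initialize` and `notifications/initialized` first, as in the smoke test.

```typescript
// Minimal JSON-RPC 2.0 request shape for an MCP `tools/call`.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: wraps a tool name and its arguments in a request envelope.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Placeholder values: substitute a real simulator UDID and bundle id.
const udid = "<UDID>";
const bundleId = "com.example.app";

const quickStart: ToolCallRequest[] = [
  toolCall("podium_health"),
  toolCall("device_list"),
  toolCall("app_state", { udid, bundleId }),
  toolCall("app_launch", { udid, bundleId }),
  toolCall("tap_on", { udid, bundleId, text: "Sign in" }),
  toolCall("assert_visible", { udid, text: "Welcome" }),
];

// One JSON object per line, matching the framing used by the stdio smoke test.
const wire = quickStart.map((message) => JSON.stringify(message)).join("\n");
```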
The 43 tools
Every tool returns structured JSON and never throws; failures come back as MCP tool errors. See `docs/tool-catalog.md` for the authoritative per-parameter reference.
Health & toolchain (1)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `podium_health` | (none) | `which` probes | Never fails; reports toolchain `{ xcrun, maestro, adb }`, native backend, and `platform: ios-simulator` |
Device & simulator (6)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `device_list` | (none) | `simctl list -j` + `adb devices` | Merged iOS inventory; adb absent → `android: { available: false }` (detection-only) |
| `device_boot` | udid | `simctl boot` | Idempotent: already-booted → `alreadyBooted: true`; waits up to 30 s |
| `screen_size` | udid | `simctl io screenshot` + `sips` | `{ widthPx, heightPx }` (real pixels) |
| `orientation_get` | udid | native query → screenshot heuristic | `{ orientation, basis }` (exact when native) |
| `set_location` | udid, latitude, longitude | `simctl location set` | Codifies the QA geo-spinner fix |
| `open_url` | udid, url | `simctl openurl` | Deep links + `https://` |
Apps (6)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `app_install` | udid, path (`.app`/`.zip`) | `simctl install` | Structured tool error |
| `app_launch` | udid, bundleId | `simctl launch` | Explicit 30 s timeout (cold RN launches no longer mis-report failure) |
| `app_terminate` | udid, bundleId | `simctl terminate` | Structured tool error |
| `app_uninstall` | udid, bundleId | `simctl uninstall` | Structured tool error |
| `app_list` | udid | `simctl listapps` + `plutil` | `{ count, apps: [{ bundleId, name, type }] }` |
| `app_state` | udid, bundleId | `simctl listapps` + `launchctl` | `{ installed, running }`, exact bundle-id match |
Capture (3)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `screenshot` | udid, saveTo? | `simctl io screenshot` | Returns path + byteSize (no base64 bloat) |
| `record_start` | udid, saveTo? (`.mp4`) | detached `simctl io recordVideo` | `{ ok, path, pid }`; timestamped path + duration watchdog (`PODIUM_MAX_RECORDING_MS`); one per udid |
| `record_stop` | udid | SIGINT recorder + flush | `{ ok, path, sizeBytes }` |
UI inspection & gestures (8)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `inspect_screen` | udid, compact? | native flat AX list → maestro hierarchy | `compact:true` (default) returns only meaningful nodes |
| `tap_on` | udid, bundleId, text\|id\|x+y, double?, long? | native tap → Maestro fallback | text/id resolved via the element list; reports backend |
| `input_text` | udid, bundleId, text, submit? | native → Maestro fallback | reports backend |
| `swipe` | udid, bundleId, direction, start/end? | native → Maestro fallback | %/pixel overrides resolved vs logical screen size |
| `press_key` | udid, bundleId, key | native → Maestro fallback | back/power/tab are Android-only |
| `orientation_set` | udid, bundleId, value | native → Maestro fallback | PORTRAIT / LANDSCAPE_LEFT / LANDSCAPE_RIGHT / UPSIDE_DOWN |
| `tap_with_fallback` | udid, x, y, maxRetries?, offsetStep? | native tap + before/after oracle | For WebGL/Canvas overlays; no blind walk (offsetStep opt-in) |
| `notification_bar_clear` | udid, bundleId? | native tap + oracle | Dismisses the RN debug notification bar |
Flows & batch automation (4)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `run_steps` | udid, bundleId, steps[] | native backend (idb/mobilecli) | Ordered action batch in one call; per-step results |
| `run_flow` | udid + exactly one of yaml/files/dir(+tags), env? | `maestro test` | Exactly-one-of validated before exec; per-step pass/fail |
| `export_flow` | steps[], output path | flow generator | Exports a `run_steps` batch to a reusable Maestro flow (engineer→QA bridge) |
| `cheat_sheet` | (none) | bundled `assets/maestro-cheat-sheet.yaml` | Fully offline Maestro syntax reference |
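The "exactly one of" rule for `run_flow` could be checked before any process is spawned, roughly as in this sketch. The `RunFlowInput` shape and the error wording are hypothetical, and rejecting `tags` without `dir` is an assumption read from the `dir(+tags)` notation rather than confirmed behavior.

```typescript
// Hypothetical input shape based on the run_flow parameters listed in the table.
interface RunFlowInput {
  udid: string;
  yaml?: string;
  files?: string[];
  dir?: string;
  tags?: string[];
}

type FlowSource = "yaml" | "files" | "dir";
type Validation = { ok: true; source: FlowSource } | { ok: false; error: string };

function validateRunFlowInput(input: RunFlowInput): Validation {
  // Collect every flow source the caller actually supplied.
  const provided: FlowSource[] = [];
  if (input.yaml !== undefined) provided.push("yaml");
  if (input.files !== undefined) provided.push("files");
  if (input.dir !== undefined) provided.push("dir");

  const source = provided.length === 1 ? provided[0] : undefined;
  if (source === undefined) {
    return { ok: false, error: `expected exactly one of yaml/files/dir, got ${provided.length}` };
  }
  // Assumption: tags only filter flows discovered in a directory.
  if (input.tags !== undefined && source !== "dir") {
    return { ok: false, error: "tags requires dir" };
  }
  return { ok: true, source };
}
```

Validating first means a malformed call returns a structured error immediately instead of surfacing as a confusing `maestro test` failure.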
Assertions & verdicts — the oracle ladder (5)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `assert_visible` | udid, text\|id, … | oracle ladder (WebView-DOM › a11y › Maestro) | Evidenced pass/fail; reports which oracle proved it |
| `assert_text` | udid, text | oracle ladder | by-text shorthand for `assert_visible` |
| `assert_not_visible` | udid, text\|id | oracle ladder | Fails closed: if absence can't be verified, it fails |
| `wait_for_element` | udid, text\|id, timeoutMs? | oracle ladder (polling) | Polls until visible or times out |
| `validate_flow` | udid, flow + assertions | oracle ladder + flow run | Trustworthy, falsifiable verdict on whether a just-built flow works |
WebView DOM & network (4)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `webview_inspect` | udid, selector?, webviewId?, max? | mobilecli (CDP) | Resolves a CSS selector to DOM elements with absolute tapX/tapY |
| `webview_eval` | udid, expression, webviewId? | mobilecli (CDP) | Runs JS in the page context; gated by `PODIUM_DISABLE_WEBVIEW_EVAL=1` |
| `webview_navigate` | udid, action (goto/back/forward/reload), url? | mobilecli (CDP) | Drives WebView navigation |
| `webview_network` | udid, durationMs?, format (json/har)?, saveTo?, redact?, includeResources? | CDP + in-page fetch/XHR shim + Resource Timing | Captures in-WebView HTTP traffic; exports redacted JSON or HAR 1.2 |
React Native debugging — Metro CDP (4)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `metro_apps` | port? (8081) | `GET http://localhost:<port>/json` | Differentiated errors (timeout vs not-running vs other) |
| `metro_logs` | wsUrl?/port?, durationMs?, maxLogs? | WebSocket + CDP `Runtime.enable` | Auto-discovers first app when URL omitted |
| `metro_network` | wsUrl?/port?, durationMs?, maxEntries? | CDP `Network.enable` | Requests (url/method/status/mimeType/ts) |
| `metro_state` | expression?/wsUrl?/port?, timeoutMs? | CDP `Runtime.evaluate` | Reads in-app state (default: globally-exposed Redux store) |
Crash diagnostics (2)
| Tool | Key params | Backing engine | Behavior |
|---|---|---|---|
| `crash_list` | processName?, sinceHours?, udid? | host + sim DiagnosticReports | Newest-first; tagged `source: host \| simulator` |
| `crash_get` | id, udid? | same | Path-traversal-safe (basename only); truncates honestly |
The oracle ladder — trustworthy assertions
"It works" is operationalized as a falsifiable, evidenced verdict — never "looks ok". Assertions and validate_flow resolve visibility through a three-rung ladder, using the strongest available signal:
- WebView DOM: when an inspectable `WKWebView` is present, query the real DOM.
- Native accessibility: the native AX element set (via `idb`/`mobilecli`).
- Maestro: `assertVisible` / `assertNotVisible` as the fallback.
assert_not_visible fails closed: if absence can't be positively verified (e.g. a WebView is unreadable), it reports failure rather than a false pass. Every verdict names the oracle that produced it, so an agent can weight its confidence.
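One way to picture the ladder is the sketch below. It is a simplified model, not podium's `oracle.ts`: the `Oracle`, `Observation`, and `Verdict` types are hypothetical, the oracles are synchronous stubs, and "first rung that can decide wins" is an assumption about how "strongest available signal" is resolved.

```typescript
type OracleName = "webview-dom" | "native-a11y" | "maestro";
// "unknown" means the oracle could not read the screen at all.
type Observation = "present" | "absent" | "unknown";

interface Oracle {
  name: OracleName;
  observe(query: string): Observation;
}

interface Verdict {
  pass: boolean;
  oracle: OracleName | null; // which rung produced the verdict, if any
  evidence: string;
}

// Walk the ladder strongest-first and stop at the first rung that can decide.
function decide(ladder: Oracle[], query: string, expected: "present" | "absent"): Verdict {
  for (const oracle of ladder) {
    const seen = oracle.observe(query);
    if (seen === "unknown") continue; // this rung is blind, try the next one
    return {
      pass: seen === expected,
      oracle: oracle.name,
      evidence: `${oracle.name} reported "${query}" as ${seen}`,
    };
  }
  // Fail closed: with no readable oracle, neither presence nor absence is proven.
  return { pass: false, oracle: null, evidence: `no oracle could observe "${query}"` };
}

const assertVisible = (ladder: Oracle[], query: string): Verdict => decide(ladder, query, "present");
const assertNotVisible = (ladder: Oracle[], query: string): Verdict => decide(ladder, query, "absent");
```

The important property is the final `return`: an unreadable screen produces `pass: false` for both assertion directions, so an agent never receives a pass that nothing backed.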
Native-first gesture backend
Imperative gestures (tap_on, input_text, swipe, press_key, orientation_set, run_steps) and inspect_screen route through the fastest available backend, probed once and cached (with a short negative-cache TTL so a backend that starts after launch is picked up):
- `idb`: when both `idb` and `idb_companion` are installed (native, fastest).
- `mobilecli`: the bundled npm dependency (prebuilt Go binary). Default; no install.
- Maestro fallback: when no native backend resolves, or for actions it can't express (double/long-press, `UPSIDE_DOWN`). The gesture generates a minimal flow with `launchApp: { stopApp: false }`, foregrounding the app without restarting so state is preserved.
Each result reports the backend it used. Set PODIUM_DISABLE_NATIVE=1 to force Maestro. Eliminating the per-gesture JVM spin-up cut tap_on ~14.7 s → ~0.6 s and inspect_screen ~8.9 s → ~0.9 s on an iPhone 16 Pro simulator. Run npm run benchmark for a full pass/fail sweep.
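The probe-once behavior with a negative-cache TTL might look like the following sketch. `probeBackend` and `createBackendResolver` are hypothetical names, the binary check is injected rather than real, and the actual TTL value is not stated in this README, so it is left as a parameter.

```typescript
type Backend = "idb" | "mobilecli";

// Probe order from the list above: idb needs both binaries, then mobilecli.
function probeBackend(has: (binary: string) => boolean): Backend | null {
  if (has("idb") && has("idb_companion")) return "idb";
  if (has("mobilecli")) return "mobilecli";
  return null; // the caller falls back to Maestro
}

// A found backend is cached for the life of the resolver. A miss (null) is
// cached only for negativeTtlMs, so a backend that appears later is picked up.
function createBackendResolver(
  probe: () => Backend | null,
  negativeTtlMs: number,
  now: () => number = Date.now,
): () => Backend | null {
  let cached: Backend | null | undefined; // undefined = never probed
  let probedAt = 0;
  return () => {
    const missExpired = cached === null && now() - probedAt >= negativeTtlMs;
    if (cached === undefined || missExpired) {
      cached = probe();
      probedAt = now();
    }
    return cached;
  };
}
```

Caching only the miss for a bounded time keeps the hot path free of repeated probes while still recovering when, for example, `idb_companion` is started after the server.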
Maestro flakiness retry: when the fallback runs, its iOS driver intermittently fails with Failed to connect to 127.0.0.1:<port>. Flows retry up to 2× with 2 s / 5 s backoff and report the retries count; a persistent failure returns the raw output with remediation hints.
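The retry policy described above can be sketched as a small wrapper. The `FlowAttempt` shape, the `runWithRetry` name, and the injected `sleep` are illustrative assumptions; the two delays and the error text come from the paragraph above, and treating only that message as retryable is a guess at the real matching rule.

```typescript
interface FlowAttempt {
  ok: boolean;
  output: string;
}

interface FlowResult extends FlowAttempt {
  retries: number;
}

// Two retries at most: wait 2 s before the first, 5 s before the second.
const BACKOFF_MS = [2_000, 5_000];

// Assumption: only the driver connection failure counts as flakiness.
const isDriverFlake = (output: string): boolean =>
  /Failed to connect to 127\.0\.0\.1:\d+/.test(output);

async function runWithRetry(
  attempt: () => Promise<FlowAttempt>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<FlowResult> {
  let retries = 0;
  let last = await attempt();
  while (!last.ok && isDriverFlake(last.output) && retries < BACKOFF_MS.length) {
    await sleep(BACKOFF_MS[retries] ?? 0);
    retries += 1;
    last = await attempt();
  }
  // A persistent failure is returned as-is, with its raw output intact.
  return { ...last, retries };
}
```

Genuine test failures (an assertion that does not hold, say) skip the loop entirely, so the retry never masks a real regression.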
WebView & RN network introspection
Two distinct network layers, two tools:
- `metro_network` captures requests on the RN/Hermes target via the CDP Network domain, the right tool for a native RN app's own `fetch`.
- `webview_network` captures traffic inside a `WKWebView`: it injects a `fetch`/`XHR` recorder (rich: method/status/headers/body for calls after capture starts) and reads the browser's Performance Resource Timing buffer (`includeResources`, default on), which covers every request since navigation, including pre-capture ones (URL/timing/size). The merge yields a near-complete request list, exported as redacted JSON or HAR 1.2.
For an RN shell that hosts its UI in a WebView, the app's API calls run in the web layer — so metro_network sees nothing and webview_network is the tool to reach for. WebView tools require WKWebView.isInspectable = true (default in debug/staging builds; off in production); when none is found they return an actionable error.
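As a rough illustration of the HAR export, the sketch below maps captured requests onto the HAR 1.2 `log`/`entries` layout and redacts sensitive headers. The `CapturedRequest` shape, the list of redacted header names, the `[REDACTED]` marker, and the `HTTP/1.1` placeholder are all assumptions; podium's real capture format and redaction rules may differ.

```typescript
// Hypothetical capture record; the fields podium actually stores may differ.
interface CapturedRequest {
  url: string;
  method: string;
  status: number;
  startedAt: number; // epoch milliseconds
  durationMs: number;
  requestHeaders: Record<string, string>;
  responseHeaders: Record<string, string>;
  mimeType: string;
  bodySize: number;
}

interface NameValue {
  name: string;
  value: string;
}

// Assumed redaction list for this sketch.
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "set-cookie"]);

function toHarHeaders(headers: Record<string, string>, redact: boolean): NameValue[] {
  return Object.keys(headers).map((name) => ({
    name,
    value: redact && SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[REDACTED]" : headers[name] ?? "",
  }));
}

function toHar(requests: CapturedRequest[], redact = true) {
  const entries = requests.map((r) => {
    const queryString: NameValue[] = [];
    new URL(r.url).searchParams.forEach((value, name) => queryString.push({ name, value }));
    return {
      startedDateTime: new Date(r.startedAt).toISOString(),
      time: r.durationMs,
      request: {
        method: r.method,
        url: r.url,
        httpVersion: "HTTP/1.1", // placeholder: the protocol is not captured here
        cookies: [],
        headers: toHarHeaders(r.requestHeaders, redact),
        queryString,
        headersSize: -1, // -1 means "unknown" in HAR
        bodySize: -1,
      },
      response: {
        status: r.status,
        statusText: "",
        httpVersion: "HTTP/1.1",
        cookies: [],
        headers: toHarHeaders(r.responseHeaders, redact),
        content: { size: r.bodySize, mimeType: r.mimeType },
        redirectURL: "",
        headersSize: -1,
        bodySize: r.bodySize,
      },
      cache: {},
      timings: { send: 0, wait: r.durationMs, receive: 0 },
    };
  });
  return { log: { version: "1.2", creator: { name: "har-sketch", version: "0.0.0" }, entries } };
}
```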
Documented limits (by design, not bugs)
- WebGL/Canvas content is un-automatable by selector (no DOM/hierarchy); use `tap_with_fallback` with screenshot-derived coordinates.
- WebView tools are dev/QA only: production App Store builds typically set `isInspectable = false`; tools return an actionable error and fall back to coordinate taps.
- WebView content-process memory is unreadable from the app sandbox (platform limit); use indirect signals (memory warnings, process terminations).
- Maestro `text:` matcher is full-string regex (IGNORE_CASE): partial strings don't match; copy hierarchy `text` verbatim or anchor with `.*`.
- Android is detection-only: every adb path degrades to a structured "adb not found" result.
- `orientation_get` is a screenshot-aspect heuristic when no native backend is present, because iOS simulators expose no direct orientation query.
- `record_start` / `record_stop` keep state in-process: serialize `start` → … → `stop` on one connection; one active recording per udid (a watchdog finalizes one that's never stopped).
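The Maestro `text:` limit is easy to trip over, so here is a small approximation of full-string, case-insensitive matching using a JavaScript `RegExp` anchored at both ends. Maestro itself runs on the JVM, so edge cases of its regex dialect may differ from this sketch; `fullStringMatch` is a hypothetical helper, not part of podium or Maestro.

```typescript
// Approximates a full-string, case-insensitive regex match.
function fullStringMatch(pattern: string, text: string): boolean {
  return new RegExp(`^(?:${pattern})$`, "i").test(text);
}

// A partial string does not match the whole label.
const partial = fullStringMatch("Sign", "Sign in"); // false

// Case differences are ignored when the whole string matches.
const verbatim = fullStringMatch("sign in", "Sign In"); // true

// Anchoring with .* turns the pattern into a "contains" match.
const anchored = fullStringMatch(".*Sign.*", "Please Sign in"); // true

// Regex metacharacters in a label are interpreted, not matched literally:
// the parentheses form a group, so this pattern only matches "Total 3".
const unescaped = fullStringMatch("Total (3)", "Total (3)"); // false
```

The last case is why copying hierarchy text verbatim can still fail when the label contains characters such as `(`, `?`, or `+`; escaping them or falling back to an `id` selector avoids the surprise.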
Architecture
```
src/
  index.ts              # MCP server entry: registers every tool group, warms caches
  lib/
    exec.ts             # execFile-based runner (NO shell) + timeout/timedOut flag
    result.ts           # shared ok/error MCP content helpers
    simctl.ts           # xcrun simctl wrappers + device-list TTL cache
    native.ts           # gesture/inspect backend: idb → mobilecli → null (re-probe TTL)
    idb.ts              # idb gesture/inspect adapter
    gesture.ts          # unified native→Maestro executors (shared by screen + steps)
    oracle.ts           # the oracle ladder: WebView-DOM › a11y › Maestro
    maestro.ts          # Maestro engine: flow runner, idb retry, hierarchy
    export-maestro.ts   # run_steps → reusable Maestro flow
    har.ts              # HAR 1.2 export for webview_network
    webview.ts          # mobilecli CDP: WebView list/inspect/eval/navigate/network
    metro.ts            # Metro CDP: app discovery, logs, network, state
    crash.ts            # DiagnosticReports crash listing/reading
    recording.ts        # detached screen recording lifecycle + watchdog
  tools/                # one file per group:
                        #   health, device, screen, steps, flow,
                        #   assert, validate, webview, debug
assets/                 # bundled offline Maestro cheat sheet + demo.gif
scripts/                # benchmark.ts, compare-mcps.ts
e2e/                    # real-simulator smoke suites (smoke / full-smoke / webview-network-live)
docs/                   # tool catalog, e2e transcript, roadmap
```
Development & testing
```bash
npm run build                  # tsc
npm run typecheck              # tsc --noEmit
npm test                       # vitest run: 182 unit/integration tests (exec/network mocked, no sim needed)
npm run benchmark              # spawn a fresh server over stdio and sweep the tool suite
node e2e/smoke.e2e.mjs         # real E2E against a booted simulator (macOS + Xcode)
node e2e/full-smoke.e2e.mjs    # drives all 43 tool handlers (happy + structured-error paths)
```
182 tests across 16 files, all passing — including the oracle ladder (oracle, assert, validate), recording watchdog + timestamps, gesture-parity (screen ≡ steps), HAR export, WebView, and Metro paths.
Standards: TypeScript strict, no as any / @ts-ignore, no shell execution (all commands via lib/exec.ts), tools return structured errors instead of throwing. See CONTRIBUTING.md for the "add a new tool" checklist.
E2E on CI: the E2E (simulator) workflow boots a real iOS simulator on a macOS runner and runs the smoke suites nightly + on demand (not a PR gate — simulator runs are slow). full-smoke.e2e.mjs asserts the happy path where a target exists and the real structured-error path where a dependency is absent (a debug isInspectable app for WebView; a connected RN app for metro_*).
Releasing
`server.json` is the official MCP Registry manifest. Pushing a `v*` tag runs Publish to npm, then Publish to MCP Registry (GitHub OIDC for the `io.github.hoainho/*` namespace; no long-lived token). Both workflows run typecheck → build → test as a gate first; the registry publish only succeeds once the matching npm version is live, and versions are immutable.
Prompt playbook & references
- `prompts/`: copy-paste prompts for e2e flows, test cases, feature verification, bug fixing, and device control. Each names the podium tools it drives and was validated on a real simulator. Start with `prompts/README.md`.
- `docs/tool-catalog.md`: authoritative tool-by-tool reference.
- `docs/e2e-demo.md`: a real transcript against a booted iPhone 16 Pro simulator running a production RN app.
Design ideas
- One podium, one connection. A single server fronts every mobile capability so an agent configures one endpoint and discovers all 43 tools at once.
- Safe by construction. Every external command runs through an `execFile` layer with an explicit argument array, never a shell string.
- Never crash the conductor. Tools return structured results and errors instead of throwing; one bad call can't take the server down.
- Degrade, don't fail. A missing toolchain (e.g. Android's `adb`) yields an informative result rather than a hard error.
- Prove it, don't guess. Assertions return evidenced verdicts via the oracle ladder and fail closed when they can't verify.
Contributing
Contributions welcome — see CONTRIBUTING.md and the Code of Conduct. Use the issue templates for bugs and feature requests.
Security
Please report vulnerabilities privately per SECURITY.md — do not open a public issue.
SECURITY.md also documents the webview_eval / run_flow trust boundary and the PII-in-transcript caveat.
License
MIT © 2026 hoainho