ios-agent-driver
An MCP server that lets an AI agent drive the iOS Simulator in a loop, enabling tapping, typing, swiping, reading the screen via the accessibility tree, and verifying app state.
An MCP server that lets an AI agent drive the iOS Simulator in a loop — so an agent can actually use your app: tap, type, swipe, read the screen, and verify what happened.
It bridges the gap between iOS development and agentic testing. The primitives to control a simulator exist (`xcrun simctl`, Meta's `idb`), but nothing packages them into tools an agent can call to close the perceive → decide → act → observe loop. This does.
- Accessibility-tree-first perception. The agent reasons over labeled UI elements (`describe_ui`) and taps by label, not by guessing pixel coordinates, which is far more robust to layout changes.
- Screenshot fallback. For custom-drawn views that don't expose accessibility, `screenshot` gives a vision fallback and a way to verify state.
- Loud failures. A tap on a missing label returns the nearest labels on screen, not a silent no-op.
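Tapping by label, with a loud failure on a miss, could look roughly like the sketch below. It is not the server's code: `UiElement` is a hypothetical, simplified accessibility-tree entry (not idb's exact schema), label matching is assumed to be case-insensitive, and "nearest" is assumed to mean the smallest edit distance between label strings.

```typescript
// Hypothetical, simplified accessibility-tree entry (not idb's exact schema).
export interface UiElement {
  label: string;
  frame: { x: number; y: number; width: number; height: number };
}

export type TapResolution =
  | { ok: true; x: number; y: number }
  | { ok: false; error: string; nearest: string[] };

// Levenshtein edit distance using a single rolling row.
export function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        above + 1, // delete a[i-1]
        row[j - 1] + 1, // insert b[j-1]
        diag + (a[i - 1] === b[j - 1] ? 0 : 1), // substitute or match
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Resolve a label to the center of its frame. On a miss, return the closest
// labels on screen instead of silently doing nothing.
export function resolveTap(elements: UiElement[], label: string, maxSuggestions = 3): TapResolution {
  const wanted = label.trim().toLowerCase();
  const hit = elements.find((e) => e.label.trim().toLowerCase() === wanted);
  if (hit) {
    const { x, y, width, height } = hit.frame;
    return { ok: true, x: x + width / 2, y: y + height / 2 };
  }
  const labels = Array.from(new Set(elements.map((e) => e.label).filter((l) => l.length > 0)));
  const nearest = labels
    .map((l) => ({ l, d: editDistance(l.toLowerCase(), wanted) }))
    .sort((p, q) => p.d - q.d)
    .slice(0, maxSuggestions)
    .map((p) => p.l);
  return { ok: false, error: `No element labeled "${label}" on screen.`, nearest };
}

// Usage with a made-up Settings screen.
export const settingsScreen: UiElement[] = [
  { label: "Notifications", frame: { x: 0, y: 100, width: 390, height: 44 } },
  { label: "Sounds & Haptics", frame: { x: 0, y: 144, width: 390, height: 44 } },
  { label: "Focus", frame: { x: 0, y: 188, width: 390, height: 44 } },
];
console.log(resolveTap(settingsScreen, "Notifications")); // { ok: true, x: 195, y: 122 }
const miss = resolveTap(settingsScreen, "Notificatons");
if (!miss.ok) console.log(miss.nearest[0]); // Notifications
```

Returning suggestions rather than `null` gives the agent something to correct against on its next step, which is the point of failing loudly.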
How it works
```text
             Agent (Claude / any MCP client)
  goal: "log a leg workout, confirm it appears in History"
         observe → decide → act → observe (loop)
                          │ MCP (stdio)
                   ios-agent-driver
                   │               │
        xcrun simctl              idb (+ companion)
        lifecycle, screenshots    accessibility tree,
        deeplinks, permissions    tap / type / swipe by element
```
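The split in the diagram amounts to a routing table: each tool becomes one `xcrun simctl …` or `idb ui …` invocation. A hedged sketch of that mapping for a few tools follows. `toInvocation`, `ToolCall`, and `run` are hypothetical names; the subcommands (`simctl launch`, `terminate`, `openurl`, `io … screenshot`, and `idb ui describe-all`, `tap`, `text`, `button`) are standard ones, but the server's real argument handling, such as how it picks the target device, may differ.

```typescript
import { execFile } from "node:child_process";

// One CLI invocation: which binary and which arguments.
export interface Invocation {
  bin: "xcrun" | "idb";
  argv: string[];
}

// Hypothetical subset of tool calls. A tap by label would first be resolved
// to a point, so only coordinates appear here.
export type ToolCall =
  | { tool: "launch"; bundleId: string }
  | { tool: "terminate"; bundleId: string }
  | { tool: "deeplink"; url: string }
  | { tool: "screenshot"; outPath: string }
  | { tool: "describe_ui" }
  | { tool: "tap"; x: number; y: number }
  | { tool: "type_text"; text: string }
  | { tool: "press_button"; button: "HOME" | "LOCK" };

// simctl accepts the literal "booted" as a device; idb gets --udid only when
// a concrete UDID is known (otherwise it uses its default target).
export function toInvocation(call: ToolCall, udid?: string): Invocation {
  const sim = udid ?? "booted";
  const idbTarget = udid ? ["--udid", udid] : [];
  switch (call.tool) {
    case "launch":
      return { bin: "xcrun", argv: ["simctl", "launch", sim, call.bundleId] };
    case "terminate":
      return { bin: "xcrun", argv: ["simctl", "terminate", sim, call.bundleId] };
    case "deeplink":
      return { bin: "xcrun", argv: ["simctl", "openurl", sim, call.url] };
    case "screenshot":
      return { bin: "xcrun", argv: ["simctl", "io", sim, "screenshot", call.outPath] };
    case "describe_ui":
      return { bin: "idb", argv: ["ui", "describe-all", ...idbTarget] };
    case "tap":
      // Rounding to whole points is an assumption about what idb accepts.
      return {
        bin: "idb",
        argv: ["ui", "tap", ...idbTarget, String(Math.round(call.x)), String(Math.round(call.y))],
      };
    case "type_text":
      return { bin: "idb", argv: ["ui", "text", ...idbTarget, call.text] };
    case "press_button":
      return { bin: "idb", argv: ["ui", "button", ...idbTarget, call.button] };
    default: {
      const unreachable: never = call; // compile error if a ToolCall case is missed
      throw new Error(`unhandled tool: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Execute without a shell, so labels, URLs, and typed text need no quoting.
export function run(inv: Invocation): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(inv.bin, inv.argv, (err, stdout) => (err ? reject(err) : resolve(stdout)));
  });
}
```

Building argv arrays instead of shell strings keeps user-supplied text (for example in `type_text`) from being interpreted by a shell.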
Requirements
- macOS with Xcode (provides `xcrun simctl`)
- idb for UI perception and actions:

  ```sh
  brew tap facebook/fb && brew trust facebook/fb
  brew install facebook/fb/idb-companion   # source build; needs current Xcode Command Line Tools
  pip3 install fb-idb                      # the `idb` CLI; use pipx/venv if pip is externally-managed
  idb list-targets                         # confirm it sees your booted sim
  ```

  If the companion build errors with “Command Line Tools are too outdated”, update them (System Settings › Software Update, or `xcode-select --install`). Lifecycle tools work without idb; `describe_ui`, `tap`, `type_text`, and `swipe` require it and will tell you how to install it if it's missing.
- Node.js ≥ 18
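The "tell you how to install it" behavior can be approximated by checking `PATH` for `idb` before running an idb-backed tool. This is a sketch under assumptions: `findOnPath` and `requireBackend` are hypothetical helpers, the tool set is taken from the Backend column of the Tools table, and the server's actual message wording is unknown.

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

// Tools whose Backend column in the Tools table is idb.
const IDB_TOOLS = new Set(["describe_ui", "tap", "type_text", "swipe", "press_button"]);

// Return the first PATH entry containing an executable named `bin`, or null.
export function findOnPath(bin: string, pathEnv: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathEnv.split(delimiter)) {
    if (dir === "") continue; // skip empty PATH segments
    const candidate = join(dir, bin);
    try {
      accessSync(candidate, constants.X_OK); // throws if missing or not executable
      return candidate;
    } catch {
      // not in this directory; keep looking
    }
  }
  return null;
}

// Hypothetical guard: fail with install instructions instead of a raw spawn ENOENT.
export function requireBackend(tool: string, pathEnv?: string): void {
  if (!IDB_TOOLS.has(tool)) return; // lifecycle tools only need xcrun
  if (findOnPath("idb", pathEnv) !== null) return;
  throw new Error(
    `${tool} needs idb, but it was not found on PATH. Install it with:\n` +
      "  brew install facebook/fb/idb-companion\n" +
      "  pip3 install fb-idb\n" +
      "then check that `idb list-targets` shows your booted simulator.",
  );
}
```

Checking up front turns a cryptic `spawn idb ENOENT` into an actionable message the agent can relay to the user.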
Install
```sh
git clone https://github.com/CodeJonesW/ios-agent-driver.git
cd ios-agent-driver
npm install   # builds via the prepare script
```
Register with Claude Code
Add to your MCP config (user-level ~/.claude.json, or a project .mcp.json):
```json
{
  "mcpServers": {
    "ios-agent-driver": {
      "command": "node",
      "args": ["/absolute/path/to/ios-agent-driver/dist/server.js"]
    }
  }
}
```
Or with the Claude Code CLI:
```sh
claude mcp add ios-agent-driver -- node /absolute/path/to/ios-agent-driver/dist/server.js
```
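Both registration routes need the absolute path to `dist/server.js`. If you would rather not type it by hand, a small script run from the repository root can produce it; `mcpConfigEntry` and `claudeAddArgv` are hypothetical helpers for illustration, not something the repository ships.

```typescript
import { resolve } from "node:path";

// Build the mcpServers entry with the absolute path to dist/server.js
// filled in from the clone directory.
export function mcpConfigEntry(repoDir: string) {
  return {
    mcpServers: {
      "ios-agent-driver": {
        command: "node",
        args: [resolve(repoDir, "dist", "server.js")],
      },
    },
  };
}

// Arguments for `claude mcp add`, usable with execFile("claude", ...) (no shell quoting).
export function claudeAddArgv(repoDir: string): string[] {
  return ["mcp", "add", "ios-agent-driver", "--", "node", resolve(repoDir, "dist", "server.js")];
}

// Print a snippet to merge by hand into ~/.claude.json or a project .mcp.json;
// merge rather than overwrite, so other configured servers survive.
console.log(JSON.stringify(mcpConfigEntry(process.cwd()), null, 2));
```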
Tools
| Tool | Backend | Purpose |
|---|---|---|
| `list_sims` | simctl | List devices (udid, name, state, runtime). |
| `boot_sim` | simctl | Boot a sim (defaults to booted, else first iPhone). |
| `install_app` | simctl | Install a built `.app` bundle. |
| `launch` | simctl | Launch an app by bundle id. |
| `terminate` | simctl | Terminate a running app. |
| `reset_app` | simctl | Uninstall + reinstall for a clean state. |
| `deeplink` | simctl | Open a URL / universal link. |
| `set_permission` | simctl | Grant/revoke/reset a privacy permission. |
| `describe_ui` | idb | Primary perception: accessibility tree as JSON. |
| `screenshot` | simctl | PNG of the current screen (vision fallback). |
| `tap` | idb | Tap by accessibility label (preferred) or x,y. |
| `type_text` | idb | Type into the focused field. |
| `swipe` | idb | Swipe/scroll by direction or coordinates. |
| `press_button` | idb | Hardware buttons (HOME, LOCK, …). |
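The `boot_sim` default ("booted, else first iPhone") can be expressed over the JSON that `xcrun simctl list --json devices` prints, where devices are grouped under runtime identifiers. A sketch, assuming "first" means first in simctl's listing order and that unavailable devices are skipped; `listSims`, `pickDefaultSim`, and `bootArgv` are hypothetical names, and the UDIDs in the usage are made up.

```typescript
// One device entry as printed by `xcrun simctl list --json devices`.
export interface SimDevice {
  udid: string;
  name: string;
  state: string; // e.g. "Booted" or "Shutdown"
  isAvailable?: boolean;
}

// That command groups devices under runtime identifiers.
export interface SimctlDevices {
  devices: Record<string, SimDevice[]>;
}

export type Sim = SimDevice & { runtime: string };

// Flatten into one list, keeping simctl's order and recording each runtime.
export function listSims(json: SimctlDevices): Sim[] {
  return Object.entries(json.devices).flatMap(([runtime, devs]) =>
    devs.map((d) => ({ ...d, runtime })),
  );
}

// Prefer an already-booted device; otherwise the first available iPhone.
export function pickDefaultSim(json: SimctlDevices): Sim | undefined {
  const sims = listSims(json).filter((d) => d.isAvailable !== false);
  return sims.find((d) => d.state === "Booted") ?? sims.find((d) => d.name.startsWith("iPhone"));
}

// Arguments for `xcrun`, or null when there is nothing to boot.
export function bootArgv(sim: Sim): string[] | null {
  return sim.state === "Booted" ? null : ["simctl", "boot", sim.udid];
}

// Usage with made-up UDIDs. On a real machine the input would come from
// JSON.parse(execFileSync("xcrun", ["simctl", "list", "--json", "devices"], { encoding: "utf8" })).
export const sample: SimctlDevices = {
  devices: {
    "com.apple.CoreSimulator.SimRuntime.iOS-17-5": [
      { udid: "AAAA-1111", name: "iPad Air", state: "Shutdown", isAvailable: true },
      { udid: "BBBB-2222", name: "iPhone 15", state: "Shutdown", isAvailable: true },
    ],
  },
};
export const chosen = pickDefaultSim(sample);
console.log(chosen?.name, chosen && bootArgv(chosen)); // iPhone 15 [ 'simctl', 'boot', 'BBBB-2222' ]
```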
The loop, by example
A typical agent goal runs as a bounded loop:
```text
GOAL: "open Settings and confirm Notifications is enabled"

1. boot_sim
2. launch { bundle_id: "com.apple.Preferences" }
3. describe_ui → see "Notifications" cell
4. tap { label: "Notifications" }
5. describe_ui → assert the toggle state

(re-read after each action; stop when the goal predicate holds
 or a step budget is exhausted)
```
The agent owns the loop and the success predicate; this server provides the primitives. That keeps the tool simple and the test logic where it belongs.
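From the client side, the bounded loop in the example reduces to a few lines. This sketch is illustrative only: `Driver`, `runGoal`, and `fakeSettings` are hypothetical, a real agent would issue MCP tool calls rather than call methods, `decide` stands in for the model's choice of next action, and the fake screen labels are made up.

```typescript
// Hypothetical, minimal view of the tools the loop needs.
export interface UiNode {
  label: string;
  value?: string;
}
export interface Driver {
  describeUi(): Promise<UiNode[]>; // stands in for describe_ui
  tap(label: string): Promise<void>; // stands in for tap { label }
}

export type Decide = (ui: UiNode[]) => string | null; // next label to tap, or null if stuck
export type Done = (ui: UiNode[]) => boolean; // the goal predicate

// Observe, check the goal, decide, act; re-read the UI after every action
// until the predicate holds, the agent gives up, or the budget runs out.
export async function runGoal(
  driver: Driver,
  decide: Decide,
  done: Done,
  maxActions = 10,
): Promise<{ success: boolean; actions: number }> {
  let actions = 0;
  for (;;) {
    const ui = await driver.describeUi(); // observe
    if (done(ui)) return { success: true, actions };
    if (actions >= maxActions) return { success: false, actions }; // budget exhausted
    const label = decide(ui); // decide
    if (label === null) return { success: false, actions }; // no sensible next move
    await driver.tap(label); // act; the next iteration observes the result
    actions++;
  }
}

// A made-up two-screen Settings app for trying the loop without a simulator.
export function fakeSettings(): Driver {
  let page: "root" | "notifications" = "root";
  return {
    async describeUi() {
      return page === "root"
        ? [{ label: "General" }, { label: "Notifications" }]
        : [{ label: "Allow Notifications", value: "1" }];
    },
    async tap(label) {
      if (page === "root" && label === "Notifications") page = "notifications";
    },
  };
}

// Usage: the Settings goal from the example, against the fake driver.
runGoal(
  fakeSettings(),
  (ui) => (ui.some((n) => n.label === "Notifications") ? "Notifications" : null),
  (ui) => ui.some((n) => n.label === "Allow Notifications" && n.value === "1"),
).then((r) => console.log(r)); // { success: true, actions: 1 }
```

Checking the predicate before every action, including the first, means a goal that already holds costs zero actions, and the budget check sits before `decide` so the loop can never exceed `maxActions` taps.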
Development
```sh
npm run build   # compile TypeScript → dist/
npm start       # run the server on stdio
```
License
MIT © Will Jones (CodeJonesW)