Pilot
A high-performance browser automation MCP server that provides AI agents with a fast, persistent Chromium instance via Playwright. It features reference-based element interaction, snapshot diffing, and manual handoff capabilities to handle complex tasks like CAPTCHAs.
pilot
Browser automation for AI agents. 20x faster than the alternatives.
pilot is an MCP server that gives your AI agent a fast, persistent browser. Built on Playwright, it drives a persistent Chromium instance from inside the server process and talks to the client over stdio: no HTTP server, no per-action overhead, and no cold starts after the first launch.
LLM Client → stdio (MCP) → pilot → Playwright → Chromium
(Playwright: in-process; Chromium: persistent)
First call: ~3s (launch)
Every call after: ~5-50ms
Why pilot?
| | pilot | @playwright/mcp | BrowserMCP |
|---|---|---|---|
| Latency/action | ~5-50ms | ~100-200ms | ~150-300ms |
| Architecture | In-process stdio | Separate process | Chrome extension |
| Persistent browser | Yes | Per-session | Yes |
| Tools | 51 (configurable profiles) | 25+ | ~20 |
| Token control | max_elements, structure_only, interactive_only | No | No |
| Iframe support | Full (list, switch, snapshot inside) | NOT_PLANNED | No |
| Cookie import | Chrome, Arc, Brave, Edge, Comet | No | No |
| Snapshot diffing | Track page changes between actions | No | No |
| Handoff/Resume | Open headed Chrome, interact manually, resume | No | No |
Speed matters when your agent makes hundreds of browser calls in a session. At 100 actions, that's at most about 5 seconds with pilot (100 × ~50ms) versus about 20 seconds with alternatives (100 × ~200ms).
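To make that arithmetic explicit, here is a small sketch that multiplies the per-action latencies from the comparison table by the number of actions. The helper name `totalSeconds` is illustrative, and the inputs are the quoted range endpoints, not new measurements.

```javascript
// Illustrative only: totals derived from the per-action latencies quoted above.
function totalSeconds(actions, perActionMs) {
  return (actions * perActionMs) / 1000;
}

const actions = 100;
const pilotBest = totalSeconds(actions, 5);      // lower end of ~5-50ms
const pilotWorst = totalSeconds(actions, 50);    // upper end of ~5-50ms
const alternative = totalSeconds(actions, 200);  // upper end of ~100-200ms

console.log(`pilot: ${pilotBest}-${pilotWorst}s, alternative: ${alternative}s`);
// prints "pilot: 0.5-5s, alternative: 20s"
```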
Quick Start
npx pilot-mcp
npx playwright install chromium
Add to your Claude Code config (.mcp.json):
{
  "mcpServers": {
    "pilot": {
      "command": "npx",
      "args": ["-y", "pilot-mcp"]
    }
  }
}
For Cursor, add the same config to your Cursor MCP settings.
That's it. Your AI agent now has a browser.
How It Works
Snapshot once, interact by ref. No CSS selectors needed.
pilot_snapshot → @e1 [button] "Submit", @e2 [textbox] "Email", ...
pilot_fill → { ref: "@e2", value: "user@example.com" }
pilot_click → { ref: "@e1" }
The ref system gives LLMs a simple, reliable way to interact with pages. Stale refs are auto-detected with clear error messages.
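One way to picture the ref system is a small registry that hands out @eN refs at snapshot time and refuses to resolve refs whose element has gone away. The sketch below is a hypothetical model, not pilot's internals: `RefRegistry`, the `{ role, name, attached }` element shape, and the `attached` flag (standing in for a real DOM attachment check) are all assumptions made for illustration.

```javascript
// Hypothetical sketch: RefRegistry and the element shape are illustrative,
// not pilot's actual implementation.
class RefRegistry {
  constructor() {
    this.refs = new Map(); // "@e1" -> element
  }

  // Assign fresh @eN refs to the elements found by a snapshot.
  snapshot(elements) {
    this.refs.clear();
    return elements.map((element, i) => {
      const ref = `@e${i + 1}`;
      this.refs.set(ref, element);
      return `${ref} [${element.role}] "${element.name}"`;
    });
  }

  // Resolve a ref, failing with guidance when it no longer points at a live element.
  resolve(ref) {
    const element = this.refs.get(ref);
    if (!element || !element.attached) {
      throw new Error(`Ref ${ref} is stale. Run pilot_snapshot for fresh refs.`);
    }
    return element;
  }
}

const registry = new RefRegistry();
const submit = { role: "button", name: "Submit", attached: true };
const email = { role: "textbox", name: "Email", attached: true };
console.log(registry.snapshot([submit, email]).join(", "));
// prints: @e1 [button] "Submit", @e2 [textbox] "Email"

submit.attached = false; // simulate the page re-rendering the button
try {
  registry.resolve("@e1");
} catch (err) {
  console.log(err.message);
}
```

The point of the design is that the agent only ever handles short opaque tokens, and a failed lookup tells it exactly what to do next.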
Token Control
Large pages can blow up your context window. Pilot gives you fine-grained control:
pilot_snapshot({ max_elements: 20 })
→ Returns 20 elements + "614 more elements not shown"
pilot_snapshot({ structure_only: true })
→ Pure tree structure, no text content
pilot_snapshot({ interactive_only: true, max_elements: 15 })
→ Only buttons/links/inputs, capped at 15
Combine max_elements, structure_only, interactive_only, compact, and depth to get exactly the level of detail you need. Start small, expand as needed.
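To illustrate how these options interact, the following sketch applies interactive_only, structure_only, and max_elements to a flat list of nodes. Only the option names come from this README; the `renderSnapshot` helper, the node shape, and the set of roles treated as interactive are assumptions.

```javascript
// Hypothetical sketch: renderSnapshot, the node shape, and INTERACTIVE_ROLES
// are assumptions; only the option names come from the text above.
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

function renderSnapshot(
  nodes,
  { max_elements = Infinity, structure_only = false, interactive_only = false } = {}
) {
  // 1. Optionally keep only elements an agent can act on.
  const candidates = interactive_only
    ? nodes.filter((n) => INTERACTIVE_ROLES.has(n.role))
    : nodes;
  // 2. Cap the number of elements that reach the context window.
  const shown = candidates.slice(0, max_elements);
  // 3. Optionally drop text content and keep only the structure.
  const lines = shown.map((n, i) =>
    structure_only ? `@e${i + 1} [${n.role}]` : `@e${i + 1} [${n.role}] "${n.name}"`
  );
  // 4. Tell the caller how much was left out.
  const hidden = candidates.length - shown.length;
  if (hidden > 0) lines.push(`${hidden} more elements not shown`);
  return lines;
}

const page = [
  { role: "heading", name: "Sign in" },
  { role: "textbox", name: "Email" },
  { role: "textbox", name: "Password" },
  { role: "button", name: "Submit" },
];
console.log(renderSnapshot(page, { interactive_only: true, max_elements: 2 }));
```

Reporting the hidden count is what makes "start small, expand as needed" workable: the agent can see that more exists before deciding to pay for it.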
Tool Profiles
Exposing all 51 tools can overwhelm LLMs (research shows degradation at 30+ tools). Use PILOT_PROFILE to load only what you need:
| Profile | Tools | Use case |
|---|---|---|
| core | 9 | Simple automation: navigate, snapshot, click, fill, type, press_key, wait, screenshot |
| standard | 25 | Common workflows: core + tabs, scroll, hover, drag, iframe, page reading |
| full | 51 | Everything |
{
  "mcpServers": {
    "pilot": {
      "command": "npx",
      "args": ["-y", "pilot-mcp"],
      "env": { "PILOT_PROFILE": "full" }
    }
  }
}
The default profile is standard (25 tools). Set PILOT_PROFILE=full for all 51 tools.
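Profile selection of this kind usually comes down to filtering the tool list before it is registered with the MCP client. The sketch below shows one plausible shape under stated assumptions: the `selectTools` helper is hypothetical, the tool lists are abbreviated, and throwing on an unknown profile name is a guess at reasonable behavior, not documented behavior.

```javascript
// Hypothetical sketch: selectTools is an illustrative name and the tool lists
// are abbreviated; only PILOT_PROFILE and the profile names come from the text.
const CORE = [
  "pilot_navigate", "pilot_snapshot", "pilot_click", "pilot_fill",
  "pilot_type", "pilot_press_key", "pilot_wait", "pilot_screenshot",
];
const PROFILES = {
  core: new Set(CORE),
  standard: new Set([...CORE, "pilot_tabs", "pilot_scroll", "pilot_hover", "pilot_drag"]),
  full: null, // null means "no filter": expose every tool
};

function selectTools(allTools, env = process.env) {
  const profile = env.PILOT_PROFILE || "standard"; // documented default
  if (!Object.hasOwn(PROFILES, profile)) {
    // Assumption: reject unknown names rather than silently falling back.
    throw new Error(`Unknown PILOT_PROFILE: ${profile}`);
  }
  const allowed = PROFILES[profile];
  return allowed === null ? [...allTools] : allTools.filter((name) => allowed.has(name));
}

const everything = [...PROFILES.standard, "pilot_evaluate", "pilot_pdf"];
console.log(selectTools(everything, { PILOT_PROFILE: "core" }).length); // prints 8
```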
Security & Configuration
| Variable | Default | Description |
|---|---|---|
| PILOT_PROFILE | standard | Tool set: core (9), standard (25), or full (51) |
| PILOT_OUTPUT_DIR | System temp | Restricts where screenshots/PDFs can be written |
Security hardening:
- Output path validation prevents writing outside PILOT_OUTPUT_DIR
- Path traversal protection on all file-write operations
- Expression size limit (50KB) on pilot_evaluate input
- File upload resolves symlinks to prevent directory escape
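The first three checks in this list can be sketched with Node's built-in path module and Buffer. This is a minimal illustration, not pilot's code: `resolveOutputPath` and `assertExpressionSize` are hypothetical names, and reading "50KB" as 50 × 1024 bytes is an assumption. The sketch does not resolve symlinks; that would additionally need a real-path lookup on the filesystem.

```javascript
const path = require("node:path");

// Hypothetical sketch: both helper names are illustrative, not pilot's API.
function resolveOutputPath(outputDir, requested) {
  const base = path.resolve(outputDir);
  const target = path.resolve(base, requested);
  const relative = path.relative(base, target);
  const escapes =
    relative === "" ||                        // the directory itself, not a file in it
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||   // climbs out of the base directory
    path.isAbsolute(relative);                // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Refusing to write outside ${base}: ${requested}`);
  }
  return target;
}

// Assumption: "50KB" is interpreted as 50 * 1024 bytes of UTF-8.
const MAX_EXPRESSION_BYTES = 50 * 1024;

function assertExpressionSize(expression) {
  const bytes = Buffer.byteLength(expression, "utf8");
  if (bytes > MAX_EXPRESSION_BYTES) {
    throw new Error(`Expression is ${bytes} bytes; the limit is ${MAX_EXPRESSION_BYTES}.`);
  }
}

console.log(resolveOutputPath("/tmp/pilot-out", "shots/home.png"));
try {
  resolveOutputPath("/tmp/pilot-out", "../../etc/passwd");
} catch (err) {
  console.log(err.message);
}
```

Comparing the relative path rather than doing a string-prefix check avoids the classic mistake where `/tmp/pilot-out-evil` passes a `startsWith("/tmp/pilot-out")` test.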
Tools (51)
Navigation
| Tool | Description |
|---|---|
| pilot_navigate | Navigate to a URL |
| pilot_back | Go back in browser history |
| pilot_forward | Go forward in browser history |
| pilot_reload | Reload the current page |
Snapshots
| Tool | Description |
|---|---|
| pilot_snapshot | Accessibility tree with @eN refs. Supports max_elements, structure_only, interactive_only, compact, depth. |
| pilot_snapshot_diff | Unified diff showing what changed since last snapshot |
| pilot_annotated_screenshot | Screenshot with red overlay boxes at each @ref position |
Interaction
| Tool | Description |
|---|---|
| pilot_click | Click by @ref or CSS selector (auto-routes <option> to selectOption) |
| pilot_hover | Hover over an element |
| pilot_fill | Clear and fill an input/textarea |
| pilot_select_option | Select a dropdown option by value, label, or text |
| pilot_type | Type text character by character |
| pilot_press_key | Press keyboard keys (Enter, Tab, Escape, etc.) |
| pilot_drag | Drag from one element to another |
| pilot_scroll | Scroll element into view or scroll page |
| pilot_wait | Wait for element visibility, network idle, or page load |
| pilot_file_upload | Upload files to a file input |
Iframes
| Tool | Description |
|---|---|
| pilot_frames | List all frames (iframes) on the page |
| pilot_frame_select | Switch context into an iframe by index or name |
| pilot_frame_reset | Switch back to the main frame |
After switching frames, pilot_snapshot, pilot_click, pilot_fill, and all interaction tools operate inside that iframe. Use pilot_frames to discover available iframes, then pilot_frame_select to enter one.
Page Inspection
| Tool | Description |
|---|---|
| pilot_page_text | Clean text extraction (strips script/style/svg) |
| pilot_page_html | Get innerHTML of element or full page |
| pilot_page_links | All links as text + href pairs |
| pilot_page_forms | All form fields as structured JSON |
| pilot_page_attrs | All attributes of an element |
| pilot_page_css | Computed CSS property value |
| pilot_element_state | Check visible/hidden/enabled/disabled/checked/focused |
| pilot_page_diff | Text diff between two URLs (staging vs production, etc.) |
Debugging
| Tool | Description |
|---|---|
| pilot_console | Console messages from circular buffer |
| pilot_network | Network requests from circular buffer |
| pilot_dialog | Captured alert/confirm/prompt messages |
| pilot_evaluate | Run JavaScript on the page (supports await) |
| pilot_cookies | Get all cookies as JSON |
| pilot_storage | Get localStorage/sessionStorage (sensitive values auto-redacted) |
| pilot_perf | Page load performance timings (DNS, TTFB, DOM parse, load) |
Visual
| Tool | Description |
|---|---|
| pilot_screenshot | Screenshot of page or specific element |
| pilot_pdf | Save page as PDF |
| pilot_responsive | Screenshots at mobile (375), tablet (768), and desktop (1280) |
Tabs
| Tool | Description |
|---|---|
| pilot_tabs | List open tabs |
| pilot_tab_new | Open a new tab |
| pilot_tab_close | Close a tab |
| pilot_tab_select | Switch to a tab |
Settings & Session
| Tool | Description |
|---|---|
| pilot_resize | Set viewport size |
| pilot_set_cookie | Set a cookie |
| pilot_import_cookies | Import cookies from Chrome, Arc, Brave, Edge, Comet |
| pilot_set_header | Set custom request headers (sensitive values auto-redacted) |
| pilot_set_useragent | Set user agent string |
| pilot_handle_dialog | Configure dialog auto-accept/dismiss |
| pilot_handoff | Open headed Chrome with full state for manual interaction |
| pilot_resume | Resume automation after manual handoff |
| pilot_close | Close browser and clean up |
Key Features
Cookie Import
Import cookies from your real browser into the headless session. pilot reads the browser's SQLite cookie database and decrypts the values with the platform-specific safe storage key (macOS Keychain).
pilot_import_cookies({ browser: "chrome", domains: [".github.com"] })
Supports Chrome, Arc, Brave, Edge, and Comet. Use list_browsers, list_profiles, and list_domains to discover what's available.
Handoff / Resume
When headless mode hits a CAPTCHA, bot detection, or complex auth flow:
- Call pilot_handoff: opens a visible Chrome window with all your cookies, tabs, and localStorage
- Solve the challenge manually
- Call pilot_resume: automation continues with the updated state
Snapshot Diffing
Call pilot_snapshot_diff after an action to see exactly what changed on the page. Returns a unified diff. Useful for verifying actions worked, monitoring dynamic content, or debugging.
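For intuition about what such a diff contains, here is a minimal line diff built on a longest-common-subsequence table. It is a simplified stand-in and not pilot's implementation: `diffLines` is a hypothetical helper, and it prefixes every line with " ", "-", or "+" without the hunk headers a full unified diff would include.

```javascript
// Simplified stand-in for a snapshot diff: line-based, LCS-driven, no hunk headers.
function diffLines(beforeText, afterText) {
  const a = beforeText.split("\n");
  const b = afterText.split("\n");

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both inputs, emitting unchanged, removed, and added lines in order.
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(` ${a[i]}`);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(`-${a[i++]}`);
    } else {
      out.push(`+${b[j++]}`);
    }
  }
  while (i < a.length) out.push(`-${a[i++]}`);
  while (j < b.length) out.push(`+${b[j++]}`);
  return out;
}

const before = '@e1 [textbox] "Email"\n@e2 [button] "Submit"';
const after = '@e1 [textbox] "Email"\n@e2 [button] "Submit"\n@e3 [alert] "Invalid email"';
console.log(diffLines(before, after).join("\n"));
// The only changed line in the output is: +@e3 [alert] "Invalid email"
```

For an agent, the "+" and "-" lines are a cheap way to confirm that a click or fill had the intended effect without re-reading the whole page.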
AI-Friendly Errors
Playwright errors are translated into actionable guidance:
- Timeout → "Element not found. Run pilot_snapshot for fresh refs."
- Multiple matches → "Selector matched multiple elements. Use @refs from pilot_snapshot."
- Stale ref → "Ref is stale. Run pilot_snapshot for fresh refs."
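A translation layer like this can be a small function that inspects the thrown error and returns the matching guidance. The sketch below is hypothetical: `translateError` and the `StaleRefError` name are assumptions, while the two Playwright signals it keys on (errors named "TimeoutError", and "strict mode violation" in the message when a locator matches several elements) are standard Playwright behavior.

```javascript
// Hypothetical sketch: translateError and StaleRefError are illustrative names.
function translateError(err) {
  const name = err && err.name ? err.name : "";
  const message = err && err.message ? err.message : String(err);

  if (name === "StaleRefError") {
    return "Ref is stale. Run pilot_snapshot for fresh refs.";
  }
  if (name === "TimeoutError") {
    return "Element not found. Run pilot_snapshot for fresh refs.";
  }
  if (message.includes("strict mode violation")) {
    return "Selector matched multiple elements. Use @refs from pilot_snapshot.";
  }
  return message; // unknown errors pass through unchanged
}

const timeout = new Error("Timeout 30000ms exceeded.");
timeout.name = "TimeoutError";
console.log(translateError(timeout));
// prints "Element not found. Run pilot_snapshot for fresh refs."
```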
Circular Buffers
Console, network, and dialog events are captured in fixed-size ring buffers (50K capacity) with O(1) writes. Query them with pilot_console, pilot_network, and pilot_dialog. Memory use never grows unbounded.
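The data structure behind this is small enough to show in full. The following is a generic ring buffer sketch, not pilot's source; the `RingBuffer` class and its method names are illustrative. Each push is constant time, and once the buffer is full the oldest entry is overwritten.

```javascript
// Generic fixed-capacity ring buffer; class and method names are illustrative.
class RingBuffer {
  constructor(capacity) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array(capacity);
    this.start = 0; // index of the oldest entry
    this.size = 0;  // number of stored entries
  }

  // O(1): write at the logical end, overwriting the oldest entry when full.
  push(item) {
    const end = (this.start + this.size) % this.capacity;
    this.items[end] = item;
    if (this.size < this.capacity) {
      this.size += 1;
    } else {
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Return entries from oldest to newest.
  toArray() {
    return Array.from(
      { length: this.size },
      (_, i) => this.items[(this.start + i) % this.capacity]
    );
  }
}

const consoleLog = new RingBuffer(3);
for (const line of ["one", "two", "three", "four", "five"]) consoleLog.push(line);
console.log(consoleLog.toArray()); // prints [ 'three', 'four', 'five' ]
```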
Architecture
pilot runs Playwright in the same process as the MCP server. No HTTP layer, no subprocess — direct function calls to the Playwright API over a persistent Chromium instance.
┌─────────────────────────────────────────────────┐
│  Your AI Agent (Claude Code, Cursor, etc.)      │
│                                                 │
│  ┌──────────────┐    stdio    ┌─────────────┐   │
│  │  MCP Client  │◄───────────►│    pilot    │   │
│  └──────────────┘             │             │   │
│                               │  Playwright │   │
│                               │  (in-proc)  │   │
│                               │      │      │   │
│                               │      ▼      │   │
│                               │  Chromium   │   │
│                               │ (persistent)│   │
│                               └─────────────┘   │
└─────────────────────────────────────────────────┘
This is why it's fast. No network hops, no serialization overhead, no process spawning per action.
Requirements
- Node.js >= 18
- Chromium (installed via npx playwright install chromium)
Development
21 unit tests via vitest:
npm test
Credits
The core browser automation architecture — ref-based element selection, snapshot diffing, cursor-interactive scanning, annotated screenshots, circular buffers, and AI-friendly error translation — is ported from gstack by Garry Tan.
Built on Playwright by Microsoft and the Model Context Protocol SDK by Anthropic.
License
MIT
If pilot is useful to you, star the repo — it helps others find it.