Webpilot

The web, through the eyes of a machine.

A semantic terminal browser that renders web pages as structured, numbered, interactive text. Built for LLM agents (via MCP), CLI-native developers, and automation pipelines.

Key insight: LLMs don't need to see a website — they need to understand it. Webpilot uses the accessibility tree (the same structure screen readers use) to represent any website as numbered elements that both humans and machines can interact with.

$ wpilot https://github.com

┌──────────────────────────────────────────────────────────────┐
  GitHub: Let's build from here
  https://github.com
└──────────────────────────────────────────────────────────────┘

  [1] › link         Sign in
  [2] › link         Sign up
  [3] ⌕ searchbox    Search GitHub
  [4] # h1           Build and ship software on a single platform
  [5] _ textbox      Enter your email address
  [6] ◆ button       ⟦ Sign up for GitHub ⟧

~ › click 1

  Navigated: github.com -> github.com/login

  [1] # h1           Sign in to GitHub
  [2] _ textbox      Username or email address
  [3] _ textbox      Password
  [4] ◆ button       ⟦ Sign in ⟧
  [5] › link         Forgot password?
  [6] › link         Create an account

~ › type 2 octocat@github.com
~ › type 3 ••••••••

~ › ss github-login.png

  Screenshot saved to github-login.png

The screenshot Webpilot saves:

<img src="assets/github-example.png" alt="Webpilot screenshot of GitHub login with filled credentials" width="700" />
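
The numbered listing in the session above follows a regular shape: `[n]`, a glyph, a role, then the label. For scripts that post-process plain-text output, a minimal TypeScript sketch of a line parser could look like the following. It assumes the output keeps the layout shown in the demo, which is inferred from the example rather than a documented contract. `ListedElement`, `parseElementLine`, and `parseListing` are hypothetical names, not part of Webpilot.

```typescript
// Hypothetical shape for one parsed line of the listing.
interface ListedElement {
  id: number;    // the [n] used by click/type
  role: string;  // e.g. "link", "textbox", "button"
  label: string; // text shown after the role
}

// Assumed layout, inferred from the demo: "[n] <glyph> <role> <label>".
const ELEMENT_LINE = /^\s*\[(\d+)\]\s+(\S+)\s+(\S+)\s+(.*)$/;

function parseElementLine(line: string): ListedElement | null {
  const match = ELEMENT_LINE.exec(line);
  if (match === null) return null; // prompts, page header, status messages
  const role = match[3] ?? "";
  const rawLabel = match[4] ?? "";
  // Buttons are drawn as "⟦ Sign in ⟧" in the demo; strip that decoration.
  const label = rawLabel.replace(/^⟦\s*/, "").replace(/\s*⟧$/, "").trim();
  return { id: Number(match[1]), role, label };
}

function parseListing(output: string): ListedElement[] {
  return output
    .split("\n")
    .map((line) => parseElementLine(line))
    .filter((el): el is ListedElement => el !== null);
}
```

Lines that do not start with a bracketed number, such as the `~ ›` prompt or the `Navigated:` message, are skipped rather than treated as errors.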

Why Webpilot?

| Feature | Browsh | Lynx | Carbonyl | Webpilot |
|---|---|---|---|---|
| JavaScript support | Yes | No | Yes | Yes |
| SPAs (React, Next.js, Vue) | Yes | No | Yes | Yes |
| LLM-parseable output | No | No | No | Yes |
| MCP server for AI agents | No | No | No | Yes |
| Element interaction by ID | No | No | No | Yes |
| State diffs | No | No | No | Yes |
| Semantic (a11y tree) | No | Partial | No | Yes |
| No pixels, no rendering | No | Yes | No | Yes |
| Zero cost (runs locally) | Yes | Yes | Yes | Yes |

Install

npm install -g webpilot-cli
npx playwright install chromium   # one-time browser setup

Quick Start

# Interactive REPL
wpilot https://google.com

# JSON output for LLM agents
wpilot --agent https://google.com

# Pipe mode for scripting
echo 'goto https://example.com
extract --links' | wpilot --pipe

# Shorthand URLs
wpilot :3000              # → http://localhost:3000
wpilot google.com         # → https://google.com
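
The two shorthand mappings above can be written as a small function. This is a sketch of the rule as shown, not Webpilot's actual implementation: only `:3000` and `google.com` are documented, so the handling of bare `localhost` and of inputs that already carry a scheme is an assumption. `expandShorthand` is a hypothetical name.

```typescript
function expandShorthand(input: string): string {
  const trimmed = input.trim();
  // Assumption: anything that already has a scheme is passed through unchanged.
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed)) return trimmed;
  // Documented: ":3000" -> "http://localhost:3000"
  if (/^:\d+/.test(trimmed)) return "http://localhost" + trimmed;
  // Assumption: a bare localhost address is a dev server over plain HTTP.
  if (/^(localhost|127\.0\.0\.1)(:|\/|$)/.test(trimmed)) return "http://" + trimmed;
  // Documented: "google.com" -> "https://google.com"
  return "https://" + trimmed;
}

// expandShorthand(":3000")      returns "http://localhost:3000"
// expandShorthand("google.com") returns "https://google.com"
```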

MCP Server (for Claude, ChatGPT, etc.)

Webpilot ships as an MCP server — any LLM agent that supports the Model Context Protocol can browse the web through it.

Setup with Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "webpilot": {
      "command": "npx",
      "args": ["-y", "webpilot-cli", "--mcp"]
    }
  }
}

Setup with VS Code / Copilot

Add to your .vscode/mcp.json:

{
  "servers": {
    "webpilot": {
      "command": "npx",
      "args": ["-y", "webpilot-cli", "--mcp"]
    }
  }
}

Available MCP Tools

| Tool | Description |
|---|---|
| `web_navigate` | Open a URL and get page state |
| `web_snapshot` | Get current page as numbered elements |
| `web_click` | Click element by [n] ID |
| `web_type` | Type into input/textarea by [n] ID |
| `web_select` | Select dropdown option |
| `web_scroll` | Scroll up/down/top/bottom |
| `web_back` | Go back in history |
| `web_extract` | Extract text, links, tables, forms, or metadata |
| `web_eval` | Execute JavaScript in page context |
| `web_screenshot` | Capture page as PNG image |
| `web_tabs` | List open tabs |
| `web_newtab` | Open new tab |
| `web_close` | Close browser session |

Example Agent Interaction

Agent: web_navigate("https://news.ycombinator.com")
→ 280 elements: [1] link "Hacker News", [2] link "new", [3] link "past" ...

Agent: web_click(5)
→ Navigated to article page, 42 elements

Agent: web_extract({ type: "text" })
→ Full article text extracted

Agent: web_back()
→ Back to Hacker News front page
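
The same exchange could be driven from code through any MCP client. The sketch below assumes the client exposes a generic `callTool(name, args)` function that resolves to the tool's text result; that signature is hypothetical and will differ between client libraries. The argument names `url` and `id` are guesses, since the transcript above passes them positionally; only `type: "text"` appears in the example itself.

```typescript
// Hypothetical: whatever your MCP client provides for invoking a tool.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<string>;

// Open the front page, follow one numbered link, read it, and go back.
async function readStory(callTool: CallTool, linkId: number): Promise<string> {
  await callTool("web_navigate", { url: "https://news.ycombinator.com" }); // "url" is assumed
  await callTool("web_click", { id: linkId });                             // "id" is assumed
  const article = await callTool("web_extract", { type: "text" });
  await callTool("web_back", {});
  return article;
}
```

Because element numbers are assigned per snapshot, an agent would normally choose `linkId` from the listing returned by `web_navigate` rather than hard-coding it.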

Commands Reference

| Command | Description |
|---|---|
| `goto <url>` | Navigate to URL |
| `click [n]` | Click element by ID |
| `type [n] "text"` | Type into form field |
| `select [n] "option"` | Select dropdown option |
| `check [n]` / `hover [n]` | Toggle checkbox / hover |
| `press <key>` | Press keyboard key (Enter, Tab, etc.) |
| `back` / `forward` | Browser history |
| `refresh` | Reload current page |
| `scroll down\|up\|top\|bottom` | Scroll the page |
| `find "text"` | Search for text in elements |
| `show` | Re-display current page state |
| `extract --text\|--links\|--tables\|--forms\|--meta` | Extract structured content |
| `eval "js"` | Execute JavaScript |
| `screenshot [path]` | Save screenshot |
| `source` | View page HTML |
| `tabs` / `tab [n]` / `newtab` / `closetab` | Tab management |
| `help` | Show all commands |
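
Commands in the reference above share one shape: a verb, then optional arguments, some of them double-quoted. When generating or validating pipe-mode scripts, a line can be split along those lines as sketched below. This is an illustration, not Webpilot's parser. The reference does not say whether the brackets around element IDs are literal, so the sketch accepts both `click 3` and `click [3]`; `ParsedCommand`, `tokenize`, and `parseCommand` are hypothetical names.

```typescript
interface ParsedCommand {
  verb: string;
  args: string[];
}

// Split on whitespace, keeping "double quoted" segments together.
function tokenize(line: string): string[] {
  const tokens: string[] = [];
  const pattern = /"([^"]*)"|(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    // Group 1 is a quoted string (quotes removed), group 2 a bare word.
    tokens.push(match[1] ?? match[2] ?? "");
  }
  return tokens;
}

function parseCommand(line: string): ParsedCommand | null {
  const tokens = tokenize(line.trim());
  if (tokens.length === 0) return null; // blank line
  const verb = (tokens[0] ?? "").toLowerCase();
  // Assumption: "[3]" and "3" both refer to element 3.
  const args = tokens.slice(1).map((t) => t.replace(/^\[(\d+)\]$/, "$1"));
  return { verb, args };
}

// parseCommand('type [5] "hello world"')
//   returns { verb: "type", args: ["5", "hello world"] }
```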

Three Output Modes

  • Human (default) — Colored, formatted for terminal reading
  • Agent (--agent) — JSON structured output for LLM consumption
  • Pipe (auto-detected) — Plain text for grep, awk, scripting

How It Works

Website → Playwright (headless Chromium) → CDP Accessibility Tree → Numbered Elements → You
  1. Playwright launches headless Chromium — full JS, cookies, SPAs, everything works
  2. The accessibility tree is extracted via Chrome DevTools Protocol — semantic structure, not pixels
  3. Elements get numbered IDs ([1], [2], [3], ...) for easy targeting
  4. After each action, a state diff shows what changed, not the entire page
  5. You interact with simple commands: click [3], type [5] "hello"
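
Steps 3 and 4 can be illustrated with a toy model: walk a tree of accessibility nodes depth-first, give sequential IDs to the nodes worth listing, then compare two snapshots. The node shape, the set of listed roles, and the diff key (role plus name) are all assumptions made for this sketch; Webpilot's real selection and diff rules are not described here.

```typescript
// Hypothetical, simplified accessibility node.
interface A11yNode {
  role: string;
  name: string;
  children?: A11yNode[];
}

interface NumberedElement {
  id: number;
  role: string;
  name: string;
}

// Assumption: only these roles are listed.
const LISTED_ROLES = ["link", "button", "textbox", "searchbox", "heading"];

// Depth-first walk; IDs follow document order, starting at 1.
function numberElements(root: A11yNode): NumberedElement[] {
  const out: NumberedElement[] = [];
  function visit(node: A11yNode): void {
    if (LISTED_ROLES.indexOf(node.role) !== -1 && node.name !== "") {
      out.push({ id: out.length + 1, role: node.role, name: node.name });
    }
    (node.children || []).forEach((child) => visit(child));
  }
  visit(root);
  return out;
}

interface SnapshotDiff {
  added: NumberedElement[];
  removed: NumberedElement[];
}

// Elements are matched by role and name, not by ID, because IDs shift
// whenever something is inserted earlier in the page.
function diffSnapshots(before: NumberedElement[], after: NumberedElement[]): SnapshotDiff {
  const key = (el: NumberedElement): string => el.role + "\u0000" + el.name;
  const beforeKeys = before.map(key);
  const afterKeys = after.map(key);
  return {
    added: after.filter((el) => beforeKeys.indexOf(key(el)) === -1),
    removed: before.filter((el) => afterKeys.indexOf(key(el)) === -1),
  };
}
```

Reporting only `added` and `removed` keeps the output after each action short, which is the point of a state diff: an agent reads what changed instead of re-reading the whole page.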

Works With Everything

  • localhost:3000 — your dev server
  • Public websites — Google, GitHub, HN, anything
  • React / Next.js / Vue / Angular / Svelte — full JS execution
  • SPAs with client-side routing
  • Sites behind login — cookies persist in session
  • Dynamic content — JS runs before each snapshot
  • Forms, dropdowns, checkboxes — full interaction
  • Multi-tab browsing

Use Cases

  • LLM agents browsing the web — Claude/ChatGPT navigate, fill forms, extract data via MCP
  • E2E testing in CI — pipe commands, assert output, no flaky selectors
  • Web scraping — extract links, tables, text from any JS-rendered page
  • Accessibility auditing — see exactly what the a11y tree exposes
  • SSH/headless environments — browse from any terminal, no GUI needed

Development

git clone https://github.com/luckysolanki902/webpilot.git
cd webpilot
npm install
npx playwright install chromium
npm run build    # → dist/index.js
npm run dev      # Watch mode

License

MIT
