Rod MCP Server

Browser automation for AI agents via the Model Context Protocol, enabling web navigation, form filling, screenshots, and more using Chromium.

<div align="center">

<img src="assets/logo2.png" alt="Rod MCP" width="300">

Browser automation for AI agents via the Model Context Protocol.

Built on Rod — a fast, reliable Go library for controlling Chromium browsers.

</div>


Rod MCP gives AI agents (Claude, Cursor, etc.) full browser control — navigate pages, fill forms, click buttons, take screenshots, generate PDFs, and more. It works in two modes:

  • Text mode (default): Uses accessibility snapshots for structured, token-efficient interaction
  • Vision mode: Uses screenshots with coordinate-based clicking for visual AI models

Quick Start

Install

One-command build from source (no dotfiles required):

git clone https://github.com/aliwatters/rod-mcp.git && cd rod-mcp && ./install.sh

This builds the binary and installs it to ~/.local/bin/rod-mcp. Re-running is a no-op unless HEAD changed; use --force or FORCE_REBUILD=1 to rebuild. Set INSTALL_PREFIX to change the install root.

Or via go install:

go install github.com/aliwatters/rod-mcp@latest

Or download a pre-built binary for your platform.

Configure your MCP client

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "rod-mcp": {
      "command": "rod-mcp",
      "args": ["--headless", "--no-banner", "--compact-snapshot"]
    }
  }
}

Claude Code (.mcp.json in the project root, or ~/.claude.json for all projects):

{
  "mcpServers": {
    "rod-mcp": {
      "command": "rod-mcp",
      "args": ["--headless", "--no-banner", "--compact-snapshot"]
    }
  }
}

Cursor (.cursor/mcp.json):

{
  "mcpServers": {
    "rod-mcp": {
      "command": "rod-mcp",
      "args": ["--headless", "--no-banner", "--compact-snapshot"]
    }
  }
}

That's it. Your AI agent can now browse the web.

Tools

Navigation

| Tool | Description |
| --- | --- |
| rod_navigate | Navigate to a URL |
| rod_go_back | Go back in browser history |
| rod_go_forward | Go forward in browser history |
| rod_reload | Reload the current page |

Page Interaction (Text Mode)

| Tool | Description |
| --- | --- |
| rod_snapshot | Capture accessibility snapshot of the page |
| rod_click | Click an element by ref or accessible name/role |
| rod_hover | Hover over an element by ref or accessible name/role |
| rod_fill | Type text into an input field by ref or accessible name/role |
| rod_selector | Select an option in a dropdown by ref or accessible name/role |

Semantic targeting: rod_click, rod_hover, rod_fill, and rod_selector accept name (accessible name, substring match) and role (ARIA role filter) as alternatives to ref. This enables one-step interactions without calling rod_snapshot first — e.g., rod_click(element="Login button", name="Login", role="button").

Page Interaction (Vision Mode)

| Tool | Description |
| --- | --- |
| rod_vision_click | Click at x,y coordinates |
| rod_vision_fill | Click at coordinates and type text |

Media

| Tool | Description |
| --- | --- |
| rod_screenshot | Take a PNG screenshot |
| rod_pdf | Generate a PDF of the page |

Browser Control

| Tool | Description |
| --- | --- |
| rod_evaluate | Execute JavaScript in the browser |
| rod_close_browser | Close the browser |
| rod_set_headers | Set HTTP headers for requests |
| rod_resize | Set viewport size and device emulation |
| rod_handle_dialog | Handle JavaScript dialogs (alert, confirm, prompt) |
| rod_configure | Change headless mode or CDP endpoint at runtime |

Tabs

| Tool | Description |
| --- | --- |
| rod_tab_new | Open a new tab |
| rod_tab_list | List all open tabs |
| rod_tab_select | Switch to a tab |
| rod_tab_close | Close a tab |

Accessibility

| Tool | Description |
| --- | --- |
| rod_a11y_audit | Audit page accessibility: find missing labels, heading order issues, WCAG violations |

Debugging

| Tool | Description |
| --- | --- |
| rod_wait_for | Wait for a selector or text to appear |
| rod_console_messages | Capture browser console output |
| rod_network_requests | Capture network requests |
| rod_response_body | Get the response body of a captured network request |

Input

| Tool | Description |
| --- | --- |
| rod_press | Press a keyboard key |
| rod_scroll | Scroll the page or an element |
| rod_drag | Drag and drop elements |
| rod_file_upload | Upload files to a file input |

State & Storage

| Tool | Description |
| --- | --- |
| rod_cookies | Get, set, or delete cookies |
| rod_storage | Inspect localStorage and sessionStorage |
| rod_permissions | Grant or reset browser permissions |

Network

| Tool | Description |
| --- | --- |
| rod_intercept | Intercept, mock, block, or fail network requests |
| rod_websocket | List WebSocket connections and inspect frames |

Performance

| Tool | Description |
| --- | --- |
| rod_performance | Get page performance metrics and Core Web Vitals |
| rod_coverage | Start/stop CSS and JS code coverage collection |

Configuration

CLI Flags

--config, -c       Path to config file (default: $XDG_CONFIG_HOME/rod-mcp/rod-mcp.yaml, or ~/.config/rod-mcp/rod-mcp.yaml)
--headless, -hl    Run browser without GUI
--vision, -vs      Enable vision mode (coordinate-based tools)
--compact-snapshot  Reduce snapshot size for fewer tokens
--output-dir       Directory for screenshots and PDFs
--omit-images      Don't include base64 images in responses
--cdp-endpoint     Connect to an existing browser via CDP
--chrome-debug-port  Launch Chrome with remote debugging on this port
--user-data-dir    Clone a Chrome profile directory (inherits cookies/sessions)
--clone-domains    Comma-separated domains to clone cookies for (e.g. "localhost,*.clerk.dev")
--no-clone         Use profile directly instead of cloning (locks your main Chrome)
--clone-all        Clone ENTIRE profile including passwords, history, extensions (slow!)
--no-banner        Suppress the startup banner

Config File

Create a config at $XDG_CONFIG_HOME/rod-mcp/rod-mcp.yaml (or ~/.config/rod-mcp/rod-mcp.yaml; one is generated automatically there on first run):

mode: text                    # text or vision
headless: false               # run without GUI
browserBinPath: ""            # path to Chrome/Chromium (auto-detected)
browserTempDir: ./rod/browser # browser profile directory
noSandbox: false              # disable Chrome sandbox
proxy: ""                     # proxy URL (e.g. socks5://localhost:1080)
compactSnapshot: false        # reduce tokens in snapshots
outputDir: ""                 # screenshot/PDF output (default: OS temp)
imageResponses: allow         # allow or omit inline base64 images
userDataDir: ""               # Chrome profile to clone (e.g. ~/Library/Application Support/Google/Chrome)
cloneDomains:                 # domains to clone cookies for (empty = all cookies)
  - "localhost"
  - "*.clerk.dev"

# Inject HTTP headers globally
extraHTTPHeaders:
  Authorization: "Bearer my-token"

# Inject headers for specific domains (supports wildcards)
domainHeaders:
  "*.example.com":
    X-Custom-Header: "value"
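To illustrate how global and per-domain headers could combine for a given request host, here is a minimal sketch. It assumes wildcard patterns behave like shell-style globs as implemented by Go's `path.Match`, and that domain headers override global ones; rod-mcp's actual matching and precedence rules may differ, and `headersFor` is a hypothetical name.

```go
package main

import (
	"fmt"
	"path"
)

// headersFor merges global headers with the headers of every domain
// pattern that matches host. Domain-specific values override global ones.
// Assumption: patterns are shell-style globs, as path.Match implements.
// If several matching patterns set the same header, the winner depends on
// map iteration order, so keep patterns non-overlapping.
func headersFor(host string, global map[string]string, byDomain map[string]map[string]string) map[string]string {
	out := make(map[string]string, len(global))
	for k, v := range global {
		out[k] = v
	}
	for pattern, hdrs := range byDomain {
		ok, err := path.Match(pattern, host)
		if err != nil || !ok {
			continue // skip malformed or non-matching patterns
		}
		for k, v := range hdrs {
			out[k] = v
		}
	}
	return out
}

func main() {
	global := map[string]string{"Authorization": "Bearer my-token"}
	byDomain := map[string]map[string]string{
		"*.example.com": {"X-Custom-Header": "value"},
	}
	fmt.Println(headersFor("api.example.com", global, byDomain))
	fmt.Println(headersFor("other.org", global, byDomain))
}
```

Under this glob assumption, `*.example.com` matches `api.example.com` but not the bare `example.com`, which would need its own entry.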

Connecting to an Existing Browser

To reuse authenticated sessions, either clone an existing Chrome profile or attach to a Chrome instance that has remote debugging enabled:

Option A — Clone your Chrome profile (cookies, sessions, auth) for specific domains:

# macOS — clone only cookies for your app's domains
rod-mcp --user-data-dir "$HOME/Library/Application Support/Google/Chrome" \
        --clone-domains "localhost,*.clerk.dev,*.stripe.com"

# Linux
rod-mcp --user-data-dir "$HOME/.config/google-chrome" \
        --clone-domains "localhost,*.myapp.com"

# Clone all cookies (no domain filter)
rod-mcp --user-data-dir "$HOME/Library/Application Support/Google/Chrome"

By default, --user-data-dir clones the profile to a temp directory (cleaned up on exit) so your main Chrome stays usable. Cookies are decrypted from Chrome's encrypted database using your macOS Keychain and injected via CDP — no need to quit your main browser.

Note: Cookie decryption currently requires macOS (reads "Chrome Safe Storage" from Keychain) and sqlite3 in PATH. On other platforms, use --no-clone as a workaround.

# Use profile directly without cloning (Chrome must not be running)
rod-mcp --user-data-dir "..." --no-clone

# ⚠️  Clone EVERYTHING — passwords, history, extensions, all browser data
# This is slow for large profiles and copies sensitive data to a temp directory
rod-mcp --user-data-dir "..." --clone-all

Option B — Let rod-mcp launch Chrome with debugging enabled:

rod-mcp --chrome-debug-port 9222

Option C — Launch Chrome yourself, then connect:

  1. Launch Chrome with remote debugging:

    google-chrome --remote-debugging-port=9222
    
  2. Connect rod-mcp:

    rod-mcp --cdp-endpoint http://127.0.0.1:9222
    

    Or use rod_configure at runtime to switch to a CDP endpoint.
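Before pointing rod-mcp at a CDP endpoint, it can help to confirm that Chrome is actually listening. Chrome's remote debugging port serves a `/json/version` document that includes a `webSocketDebuggerUrl` field. The sketch below queries it; the helper names (`parseVersion`, `probe`) are hypothetical, and the endpoint is the one used in the steps above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// versionInfo holds the fields of Chrome's /json/version response that
// this sketch reads.
type versionInfo struct {
	Browser              string `json:"Browser"`
	WebSocketDebuggerURL string `json:"webSocketDebuggerUrl"`
}

// parseVersion decodes a /json/version body and checks that it carries
// a WebSocket debugger URL.
func parseVersion(body []byte) (versionInfo, error) {
	var v versionInfo
	if err := json.Unmarshal(body, &v); err != nil {
		return v, err
	}
	if v.WebSocketDebuggerURL == "" {
		return v, errors.New("response has no webSocketDebuggerUrl")
	}
	return v, nil
}

// probe fetches /json/version from a CDP HTTP endpoint.
func probe(endpoint string) (versionInfo, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(strings.TrimRight(endpoint, "/") + "/json/version")
	if err != nil {
		return versionInfo{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return versionInfo{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return versionInfo{}, err
	}
	return parseVersion(body)
}

func main() {
	v, err := probe("http://127.0.0.1:9222")
	if err != nil {
		fmt.Fprintln(os.Stderr, "Chrome is not reachable:", err)
		return
	}
	fmt.Println(v.Browser, v.WebSocketDebuggerURL)
}
```

If the probe fails, check that Chrome was started with `--remote-debugging-port=9222` and that no other process holds the port.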

Docker

docker build -t rod-mcp .
docker run -i --rm rod-mcp

The container runs headless with Chromium. Mount a custom config:

docker run -i --rm -v ./rod-mcp.yaml:/app/rod-mcp.yaml:ro rod-mcp

Or use Docker Compose:

docker compose up --build

Building from Source

git clone https://github.com/aliwatters/rod-mcp.git
cd rod-mcp
go build -o rod-mcp .

Prerequisites

  • Go 1.23+
  • Chrome or Chromium

Project Structure

rod-mcp/
├── main.go          # Entry point
├── cmd.go           # CLI flags and commands
├── server.go        # MCP server setup and tool registration
├── runner.go        # Server lifecycle
├── tools/           # All MCP tool implementations
│   ├── browser.go   #   evaluate, close, headers, resize, dialog
│   ├── configure.go #   runtime reconfiguration
│   ├── debug.go     #   wait_for, console, network
│   ├── input.go     #   keyboard, file upload
│   ├── media.go     #   screenshot, PDF
│   ├── navigation.go#   navigate, back, forward, reload
│   ├── snapshot.go  #   text mode: snapshot, click, hover, fill, selector
│   ├── tabs.go      #   tab management
│   └── vision.go    #   vision mode: coordinate click/fill
├── types/           # Config, context, snapshot, logging
├── utils/           # Shared utilities
├── banner/          # Startup banner
└── assets/          # Logo images

License

MIT - see LICENSE
