Stealth Browser MCP Server
Provides stealth web browsing using dual browser engines (Chromium and Firefox) with automatic bot-detection bypass, enabling AI agents to browse, interact, and extract content from websites without being blocked.
README
Stealth Browser MCP Server
A Model Context Protocol (MCP) server that provides stealth web browsing capabilities using dual browser engines — Patchright (Chromium) and Camoufox (Firefox) — with automatic bot-detection bypass.
Built for use with Claude Code and other MCP-compatible AI agents.
Features
- Dual Engine Architecture — Patchright (Chromium) as primary engine, Camoufox (Firefox) as fallback with stronger anti-fingerprinting
- Auto Bot-Block Detection — Detects Cloudflare, CAPTCHAs, and other bot protection; automatically retries with Firefox when
engine: auto - Headed Mode via Xvfb — Runs real browser windows (not headless) to beat fingerprint detection
- 18 MCP Tools — Browse, interact, extract, scrape, crawl, structured data extraction, session management, persistent profile state save/load/list/delete, X/Twitter search extraction helpers, heuristic topic research summaries, thread readers, deep topic research, and saved report bundles
- 3-Tier Content Extraction — trafilatura → readability → innertext fallback chain
- SSRF-Hardened — DNS resolution validation blocks localhost, private IPs, cloud metadata,
file:// - Session Pooling — Up to 5 isolated BrowserContext sessions per engine, with 10-minute idle eviction
- Smart Truncation — Large pages truncated at 50K chars on paragraph boundaries
- CAPTCHA Detection — Detects Cloudflare Turnstile, reCAPTCHA, hCaptcha; reports structured
captcha_detectedflag - Auto-Cleanup — Idle sessions evicted after 10 minutes, crashed browser auto-restarts
Tools
browse
Navigate to a URL and return page content as clean markdown.
| Parameter | Type | Required | Description |
|---|---|---|---|
url |
string | yes | URL to navigate to (http/https only) |
session_id |
string | no | Reuse an existing session. If omitted, creates a new one |
wait_for |
string | no | CSS selector to wait for before extracting |
engine |
string | no | auto (default), chromium, or firefox |
Returns: url, title, content, session_id, truncated, captcha_detected, extraction_method, timing_ms, status_code, engine
interact
Interact with the current page in a session.
| Parameter | Type | Required | Description |
|---|---|---|---|
session_id |
string | yes | Session from a previous browse call |
action |
string | yes | One of: click, type, select, hover, scroll |
selector |
string | yes | CSS selector for the target element |
value |
string | no | Required for type and select. For scroll, pixel amount |
Returns: success, session_id, action_performed, page_url, timing_ms
extract
Re-extract content from the current page without re-navigating. Use this instead of browse when you're already on the page.
| Parameter | Type | Required | Description |
|---|---|---|---|
session_id |
string | yes | Session to extract from |
mode |
string | no | auto (default), article, or text (raw innertext) |
Returns: content, session_id, url, extraction_method, truncated
close_session
Close a browser session and free its resources.
| Parameter | Type | Required | Description |
|---|---|---|---|
session_id |
string | yes | Session to close |
Returns: status, session_id
save_session_state
Persist an active session's cookies and local storage to a named profile.
| Parameter | Type | Required | Description |
|---|---|---|---|
session_id |
string | yes | Session to persist |
profile_name |
string | yes | Safe profile name to save under |
Returns: status, session_id, profile_name, storage_state_path, meta
load_session_state
Create a new session from a previously saved profile.
| Parameter | Type | Required | Description |
|---|---|---|---|
profile_name |
string | yes | Saved profile name |
session_id |
string | no | Optional custom session ID |
engine |
string | no | chromium (default) or firefox |
Returns: status, session_id, profile_name, engine, meta
list_saved_profiles
List saved persistent profiles on disk.
Returns: profiles, count
delete_saved_profile
Delete a saved profile from disk.
| Parameter | Type | Required | Description |
|---|---|---|---|
profile_name |
string | yes | Saved profile name |
Returns: status, profile_name
search_x
Open an X search results page for a query and return structured tweet cards.
| Parameter | Type | Required | Description |
|---|---|---|---|
query |
string | yes | Search query |
mode |
string | no | latest (default) or top |
max_items |
int | no | Max tweets to extract (1-50, default 20) |
scroll_rounds |
int | no | Additional scroll/collect rounds (0-10, default 0) |
session_id |
string | no | Reuse an existing session |
profile_name |
string | no | Load a persisted login profile into a fresh session |
engine |
string | no | auto (default), chromium, or firefox |
Returns: query, mode, search_url, session_id, tweets, extracted_count, scroll_rounds_completed, captcha_detected, engine
extract_x_search_results
Extract structured tweet cards from the current page of an existing X search session.
| Parameter | Type | Required | Description |
|---|---|---|---|
session_id |
string | yes | Active session already on an X search page |
max_items |
int | no | Max tweets to extract (1-50, default 20) |
Returns: session_id, tweets, extracted_count, page_url, page_title
research_x_topic
Run X search and produce a lightweight heuristic topic summary from the extracted tweets.
| Parameter | Type | Required | Description |
|---|---|---|---|
query |
string | yes | Search query |
mode |
string | no | latest (default) or top |
max_items |
int | no | Max tweets to extract (1-50, default 20) |
scroll_rounds |
int | no | Additional scroll/collect rounds (0-10, default 0) |
session_id |
string | no | Reuse an existing session |
profile_name |
string | no | Load a persisted login profile into a fresh session |
engine |
string | no | auto (default), chromium, or firefox |
Returns: everything from search_x plus research, normalized, and report_markdown
read_x_thread
Open a tweet/thread URL and extract the visible main tweet plus replies from the detail page.
| Parameter | Type | Required | Description |
|---|---|---|---|
url |
string | yes | X tweet/thread URL |
max_items |
int | no | Max visible tweets to extract (1-50, default 20) |
session_id |
string | no | Reuse an existing session |
profile_name |
string | no | Load a persisted login profile into a fresh session |
engine |
string | no | auto (default), chromium, or firefox |
Returns: main_tweet, replies, reply_count_extracted, and page metadata
research_x_topic_deep
Run X search, pick a few high-signal tweets, load their thread pages, and produce a richer deep-research summary.
| Parameter | Type | Required | Description |
|---|---|---|---|
query |
string | yes | Search query |
mode |
string | no | latest (default) or top |
max_items |
int | no | Max search tweets to collect |
scroll_rounds |
int | no | Additional search scroll rounds |
deep_dive_count |
int | no | Number of thread URLs to inspect (default 3) |
thread_items |
int | no | Max tweets to extract per thread |
session_id |
string | no | Reuse an existing session |
profile_name |
string | no | Load a persisted login profile into a fresh session |
engine |
string | no | auto (default), chromium, or firefox |
Returns: deep_dive_candidates, threads, deep_research, normalized, and report_markdown in addition to the base search output
save_x_research_report
Run topic research (normal or deep) and save JSON + markdown report bundle to disk.
| Parameter | Type | Required | Description |
|---|---|---|---|
query |
string | yes | Search query |
deep |
bool | no | If true, use deep research workflow |
mode |
string | no | latest (default) or top |
max_items |
int | no | Max tweets to collect |
scroll_rounds |
int | no | Additional search scroll rounds |
deep_dive_count |
int | no | Thread deep-dive count |
thread_items |
int | no | Max tweets per thread |
session_id |
string | no | Reuse an existing session |
profile_name |
string | no | Load a persisted login profile into a fresh session |
engine |
string | no | auto (default), chromium, or firefox |
report_name |
string | no | Optional custom output name |
Returns: research output plus saved_report paths
list_saved_x_reports
List saved research report bundles from disk.
Returns: reports, count
scrape_webpage
Navigate to a URL, extract content in the requested format, and auto-close the session.
| Parameter | Type | Required | Description |
|---|---|---|---|
url |
string | yes | URL to scrape (http/https only) |
output_format |
string | no | markdown (default), text, html, or links |
session_id |
string | no | Reuse session. If omitted, creates ephemeral session that auto-closes |
wait_for |
string | no | CSS selector to wait for before extracting |
engine |
string | no | auto (default), chromium, or firefox |
Returns: url, title, content, session_id, status_code, timing_ms, extraction_method, engine
extract_structured_data
Extract structured DOM data (metadata, links, tables, JSON-LD, etc.) from a webpage.
| Parameter | Type | Required | Description |
|---|---|---|---|
url |
string | yes | URL to extract from (http/https only) |
session_id |
string | no | Reuse session. If omitted, creates ephemeral session |
include |
list | no | Sections to include. Default: all. Options: metadata, og_tags, json_ld, headings, links, tables, forms |
wait_for |
string | no | CSS selector to wait for before extracting |
engine |
string | no | auto (default), chromium, or firefox |
Returns: url, title, session_id, timing_ms, engine, + requested data sections
crawl_pages
Crawl multiple pages via BFS starting from a URL.
| Parameter | Type | Required | Description |
|---|---|---|---|
url |
string | yes | Starting URL (http/https only) |
max_pages |
int | no | Maximum pages to crawl (1-20, default 5) |
link_pattern |
string | no | Regex to filter link hrefs |
output_format |
string | no | markdown (default), text, html, or links |
same_domain |
bool | no | Only follow same-domain links (default: true) |
engine |
string | no | auto (default), chromium, or firefox |
Returns: pages (list of {url, title, content, status_code}), total_pages, total_timing_ms, engine
Installation
Prerequisites
System libraries (Ubuntu/Debian/WSL2):
sudo apt-get install -y libnspr4 libnss3 libatk1.0-0 libatk-bridge2.0-0 \
libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxrandr2 libgbm1 \
libpango-1.0-0 libcairo2 libasound2t64 xvfb
Python 3.12+ and uv (recommended) or pip.
Setup
git clone https://github.com/Axe240-commits/stealth-browser-mcp.git
cd stealth-browser-mcp
chmod +x setup.sh
./setup.sh
Or manually:
uv venv
uv pip install -e ".[dev]"
.venv/bin/python -m patchright install chromium
Verify
# Run tests
.venv/bin/python -m pytest tests/ -v
# Start server (will wait for MCP stdio input)
.venv/bin/python -m stealth_browser
Register with Claude Code
Add to ~/.claude/mcp_servers.json:
{
"stealth-browser": {
"type": "stdio",
"command": "/path/to/stealth-browser-mcp/.venv/bin/python",
"args": ["-m", "stealth_browser"]
}
}
Then add permissions in ~/.claude/settings.json:
{
"permissions": {
"allow": [
"mcp__stealth-browser__browse",
"mcp__stealth-browser__interact",
"mcp__stealth-browser__extract",
"mcp__stealth-browser__close_session",
"mcp__stealth-browser__save_session_state",
"mcp__stealth-browser__load_session_state",
"mcp__stealth-browser__list_saved_profiles",
"mcp__stealth-browser__delete_saved_profile",
"mcp__stealth-browser__search_x",
"mcp__stealth-browser__extract_x_search_results",
"mcp__stealth-browser__research_x_topic",
"mcp__stealth-browser__read_x_thread",
"mcp__stealth-browser__research_x_topic_deep",
"mcp__stealth-browser__save_x_research_report",
"mcp__stealth-browser__list_saved_x_reports",
"mcp__stealth-browser__scrape_webpage",
"mcp__stealth-browser__extract_structured_data",
"mcp__stealth-browser__crawl_pages"
]
}
}
Restart Claude Code. The tools will be available immediately.
Architecture
┌─────────────────────────────────────────────────┐
│ Claude Code / MCP Client │
│ │
│ browse ─ interact ─ extract ─ close_session │
│ scrape_webpage ─ extract_structured_data │
│ crawl_pages │
└────────────────┬────────────────────────────────┘
│ stdio (JSON-RPC)
┌────────────────▼────────────────────────────────┐
│ server.py — FastMCP Server (7 tools) │
│ ├── security.py — SSRF validation (every URL) │
│ ├── session.py — per-session lock + state │
│ ├── browser_manager.py — dual engine pool │
│ ├── extractor.py — 3-tier content extraction │
│ ├── dom_extractor.py — structured DOM data │
│ └── config.py — configuration │
└───────┬─────────────────┬───────────────────────┘
│ │
┌───────▼──────┐ ┌───────▼──────┐
│ Patchright │ │ Camoufox │
│ (Chromium) │ │ (Firefox) │
│ Primary │ │ Fallback │
└───────┬──────┘ └───────┬──────┘
│ │
┌───────▼─────────────────▼───────────────────────┐
│ Xvfb :99 — 1920x1080 (headed mode) │
└─────────────────────────────────────────────────┘
Dual Engine & Auto-Fallback
With engine: auto (the default), every request:
- Tries Patchright (Chromium) first — fast, low overhead
- Checks for bot-block signals: HTTP 403, title keywords ("Just a moment", "Attention Required"), empty content
- If blocked, automatically retries with Camoufox (Firefox) which has stronger anti-fingerprinting
For crawl_pages, the engine switch happens on the first page and sticks for the rest of the crawl.
Content Extraction Pipeline
trafilatura (best for articles, tables, links)
↓ fallback if < 200 chars
readability-lxml + html2text (complex HTML)
↓ fallback if < 200 chars
page.inner_text('body') (SPAs, JS-rendered content)
Session Management
- Two persistent browsers launched at MCP server start (Chromium + Firefox)
- Each
browse()call with nosession_idcreates a newBrowserContext(~100ms) - Sessions are isolated (separate cookies, storage, state)
- Max 5 concurrent sessions, oldest evicted if at capacity
- Idle sessions evicted after 10 minutes
- All operations per session are serialized via
asyncio.Lock - Each session tracks its engine type (
chromiumorfirefox)
Security (SSRF Protection)
Every URL is validated before navigation:
- Scheme check — only
httpandhttpsallowed - DNS resolution — hostname resolved to actual IPs
- IP validation — all resolved IPs checked against private/reserved ranges
- Redirect validation — redirects re-validated at each hop
Blocked:
localhost,127.0.0.1,::1- Private ranges (
10.x,172.16.x,192.168.x) - Cloud metadata (
169.254.169.254) - Link-local, multicast, reserved IPs
file://,data://,javascript://,ftp://
Usage Tips for AI Agents
- Use
extractto re-read the same page — don't callbrowseagain - Use
browseonly for actual navigation (new URL or page change) - Reuse
session_idacross related operations - Always call
close_sessionwhen done to free resources - Use
scrape_webpagefor one-shot scraping (auto-closes session) - Use
crawl_pagesto spider multiple pages from a starting URL - Default navigation uses
domcontentloaded(fast, reliable) — usewait_forif you need a specific element
Project Structure
stealth-browser-mcp/
├── pyproject.toml # Dependencies, build config
├── setup.sh # One-command setup
├── src/stealth_browser/
│ ├── __init__.py
│ ├── __main__.py # Entry: python -m stealth_browser
│ ├── server.py # MCP server, 7 tools, lifespan
│ ├── browser_manager.py # Dual engine lifecycle, context pool
│ ├── session.py # Session state, locking, actions
│ ├── extractor.py # 3-tier content extraction
│ ├── dom_extractor.py # Structured DOM data extraction
│ ├── security.py # SSRF-hardened URL validation
│ ├── config.py # Configuration dataclass
│ └── proxy.py # Stub (Phase 2: Tor)
└── tests/
├── test_security.py # URL/IP validation tests
├── test_extractor.py # Extraction mode/fallback tests
├── test_dom_extractor.py # DOM structured data tests
└── test_server_helpers.py # Server helper function tests
Configuration
Defaults in config.py — no config file needed:
| Setting | Default | Description |
|---|---|---|
headless |
False |
Headed mode (Xvfb) for better stealth |
use_xvfb |
True |
Auto-start Xvfb for headed mode |
max_sessions |
5 |
Max concurrent browser sessions |
session_timeout_minutes |
10 |
Idle session eviction timeout |
navigation_timeout_ms |
30000 |
Page load timeout |
wait_until |
domcontentloaded |
Navigation wait strategy |
max_content_length |
50000 |
Content truncation limit (chars) |
block_media |
True |
Block images/fonts/media for speed |
camoufox_enabled |
True |
Enable Firefox fallback engine |
crawl_max_pages_limit |
20 |
Hard cap for crawl_pages |
crawl_per_page_max |
10000 |
Content limit per crawled page |
Dependencies
| Package | Purpose |
|---|---|
| mcp | MCP server framework (Anthropic) |
| patchright | Stealth Playwright fork (Chromium) |
| camoufox | Anti-fingerprint Firefox (fallback engine) |
| trafilatura | Article/content extraction |
| readability-lxml | Fallback HTML extraction |
| html2text | HTML to markdown conversion |
Troubleshooting
Browser fails to launch: error while loading shared libraries
Chromium needs system libraries that aren't installed by default on minimal Linux/WSL2:
error while loading shared libraries: libnspr4.so: cannot open shared object file
Solution:
sudo apt-get install -y libnspr4 libnss3 libatk1.0-0 libatk-bridge2.0-0 \
libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxrandr2 libgbm1 \
libpango-1.0-0 libcairo2 libasound2t64 xvfb
Camoufox won't start
Camoufox requires xvfb for headed mode:
sudo apt-get install -y xvfb
If Camoufox still fails, it falls back gracefully — Chromium-only mode still works.
MCP server not showing in Claude Code
The server must be registered in ~/.claude/mcp_servers.json:
{
"stealth-browser": {
"type": "stdio",
"command": "/absolute/path/to/.venv/bin/python",
"args": ["-m", "stealth_browser"]
}
}
After adding, restart Claude Code — MCP servers are loaded at startup only.
Tools show "Permission denied"
Add all 7 tools to ~/.claude/settings.json permissions (see Register section above).
Page content is empty or too short
- Try
extractwithmode="text"for SPAs/JS-heavy pages - Add
wait_forparameter with a CSS selector to wait for dynamic content - Try
engine: firefox— some sites respond better to Camoufox - The default
domcontentloadeddoesn't wait for lazy-loaded content — pass a selector that appears after the page fully renders
Bot-blocked on both engines
If engine: auto falls back to Firefox and still gets blocked, the site may require:
- A different IP/proxy (Phase 2)
- Manual CAPTCHA solving
- Specific cookies/authentication
Session not found
Sessions are evicted after 10 minutes of inactivity or when the 5-session limit is reached. If you get "Session 'xyz' not found", create a new one with browse.
Phase 2 (Planned)
screenshottool — for CAPTCHA/consent debuggingevaluate_jstool — targeted DOM queriessession_infotool — list active sessions and state- Per-toolcall hard timeout guard
- Proxy/Tor opt-in support
License
MIT
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.