<div align="center">
MCP-Crawl4AI
<img src=".github/assets/img/logo.png" alt="MCP-Crawl4AI Logo" width="200"/>
A Model Context Protocol server for web crawling powered by Crawl4AI
</div>
Overview
MCP-Crawl4AI is a Model Context Protocol server that gives AI systems access to the live web. Built on FastMCP v3 and Crawl4AI, it exposes 4 tools, 2 resources, and 3 prompts through the standardized MCP interface, backed by a lifespan-managed headless Chromium browser.
Only 2 runtime dependencies — fastmcp and crawl4ai.
> [!TIP]
> Full documentation site →
Key Features
- Full MCP compliance via FastMCP v3 with tool annotations (`readOnlyHint`, `destructiveHint`, etc.)
- 4 focused tools centered on canonical `scrape`/`crawl` plus session lifecycle/artifacts
- 3 prompts for common LLM workflows (summarize, extract schema, compare pages)
- 2 resources exposing server configuration and version info
- Headless Chromium managed as a lifespan singleton (start once, reuse everywhere)
- Multiple transports — stdio (default) and Streamable HTTP
- LLM-optimized output — markdown, cleaned HTML, raw HTML, or plain text
- Canonical option groups for extraction, runtime, diagnostics, sessions, rendering, and traversal
- List and deep traversal in one `crawl` contract
- Session-aware workflows with explicit session close and artifact retrieval tools
- Auto browser setup — detects missing Playwright browsers and installs automatically
Installation
<details> <summary><strong>pip</strong></summary>

```bash
pip install mcp-crawl4ai
mcp-crawl4ai --setup  # one-time: installs Playwright browsers
```

</details>
<details> <summary><strong>uv (recommended)</strong></summary>

```bash
uv add mcp-crawl4ai
mcp-crawl4ai --setup  # one-time: installs Playwright browsers
```

</details>
<details> <summary><strong>Docker</strong></summary>

```bash
docker build -t mcp-crawl4ai .
docker run -p 8000:8000 mcp-crawl4ai
```

The Docker image includes Playwright browsers — no separate setup needed.

</details>
<details> <summary><strong>Development</strong></summary>

```bash
git clone https://github.com/wyattowalsh/mcp-crawl4ai.git
cd mcp-crawl4ai
uv sync --group dev
mcp-crawl4ai --setup
```

</details>
> [!NOTE]
> The server auto-detects missing Playwright browsers on first startup and attempts to install them automatically. You can also run `mcp-crawl4ai --setup` or `crawl4ai-setup` manually at any time.
Quick Start
<details open> <summary><strong>stdio (default — for Claude Desktop, Cursor, etc.)</strong></summary>

```bash
mcp-crawl4ai
```

</details>
<details> <summary><strong>HTTP transport</strong></summary>

```bash
mcp-crawl4ai --transport http --port 8000
```

> [!NOTE]
> HTTP binds to `127.0.0.1` by default (private/local only); for external exposure, set `--host` explicitly and use a reverse proxy for TLS/auth.

</details>
<details> <summary><strong>Claude Desktop configuration</strong></summary>

Add to your Claude Desktop MCP settings (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "mcp-crawl4ai",
      "args": ["--transport", "stdio"]
    }
  }
}
```

</details>
<details> <summary><strong>Claude Code configuration</strong></summary>

```bash
claude mcp add crawl4ai -- mcp-crawl4ai --transport stdio
```

</details>
<details> <summary><strong>MCP Inspector</strong></summary>

```bash
npx @modelcontextprotocol/inspector uv --directory . run mcp-crawl4ai
```

</details>
Tools
The canonical surface exposes 4 tools:
scrape
Scrape one URL or a bounded list of URLs (up to 20) with a single canonical envelope response.
- Input: `targets` (str | list[str]) and optional grouped `options`
- Supports extraction (`schema`, `extraction_mode`), runtime controls, diagnostics, session settings, render settings, and artifact capture
- Returns a canonical JSON envelope with `schema_version`, `tool`, `ok`, `data`/`items`, `meta`, `warnings`, `error`
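On the client side, the envelope fields listed above can be handled with a small helper. The sketch below is illustrative only: the field names come from this README, but `unpack_envelope` and the sample payload are hypothetical, not part of the package.

```python
def unpack_envelope(envelope: dict):
    """Illustrative helper: return the payload of a canonical envelope,
    or raise if the call failed. Not part of mcp-crawl4ai itself."""
    if not envelope.get("ok"):
        err = envelope.get("error") or {}
        raise RuntimeError(f"{envelope.get('tool')} failed: {err}")
    for warning in envelope.get("warnings", []):
        print(f"warning: {warning}")
    # Single-target calls carry `data`; multi-target calls carry `items`.
    return envelope.get("data") if "data" in envelope else envelope.get("items")

# Hypothetical successful single-URL scrape result
result = unpack_envelope({
    "schema_version": 1,
    "tool": "scrape",
    "ok": True,
    "data": {"url": "https://example.com", "markdown": "# Example"},
    "meta": {},
    "warnings": [],
})
```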
crawl
Crawl with canonical traversal controls.
- `options.traversal.mode="list"` for bounded list traversal
- `options.traversal.mode="deep"` for recursive BFS/DFS traversal from a single seed
- Shares scrape option groups plus traversal options in the same envelope shape
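For intuition, deep traversal with depth and page budgets can be sketched as a plain BFS. This is a conceptual illustration only, not the server's actual implementation; the function name and the toy link graph are invented.

```python
from collections import deque

def deep_crawl_order(seed, get_links, max_depth=2, max_pages=10):
    """Conceptual BFS over page links with depth and page budgets,
    mirroring what traversal.mode="deep" does at a high level."""
    seen, order = {seed}, []
    queue = deque([(seed, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth < max_depth:
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return order

# Toy link graph standing in for real pages
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(deep_crawl_order("a", lambda u: graph.get(u, []), max_depth=1))  # → ['a', 'b', 'c']
```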
close_session
Close a stateful session created via `options.session.session_id`.
get_artifact
Retrieve artifact metadata/content captured during `scrape` or `crawl` when `options.conversion.capture_artifacts` is enabled.
Choosing core vs advanced usage
- Core path (recommended): use `scrape`/`crawl` with minimal options (`runtime`, `conversion.output_format`, `traversal.mode="list"`). This keeps behavior predictable and low-risk for most agent workflows.
- Advanced path (explicit opt-in): use deep traversal, custom dispatcher controls, JS transforms, extraction schemas, and artifact capture only when required by task outcomes.
- Safety budgets and gates: inspect `config://server` for `settings.defaults`, `settings.limits`, `settings.policies`, and `settings.capabilities` to understand active constraints and feature gates before enabling advanced options.
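To make the two paths concrete, here is a hedged sketch of a minimal option set versus an explicit opt-in set. The option-group names (`runtime`, `conversion`, `traversal`, `session`) appear in this README, but the exact dict shapes and values (e.g. `timeout_s`, `max_depth`, the session id) are assumptions for illustration.

```python
# Core path: minimal, predictable options (field values are hypothetical)
core_options = {
    "runtime": {"timeout_s": 30},
    "conversion": {"output_format": "markdown"},
    "traversal": {"mode": "list"},
}

# Advanced path: deep traversal, artifact capture, and a named session,
# each an explicit opt-in layered on top of the core options
advanced_options = {
    **core_options,
    "traversal": {"mode": "deep", "max_depth": 2},
    "conversion": {"output_format": "markdown", "capture_artifacts": True},
    "session": {"session_id": "research-1"},
}
```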
Resources
| URI | MIME Type | Description |
|---|---|---|
| `config://server` | `application/json` | Current server configuration: name, version, tool list, browser config |
| `crawl4ai://version` | `application/json` | Server and dependency version information (server, crawl4ai, fastmcp) |
Prompts
| Prompt | Parameters | Description |
|---|---|---|
| `summarize_page` | `url`, `focus` (default: `"key points"`) | Crawl a page and summarize its content with the specified focus |
| `build_extraction_schema` | `url`, `data_type` | Inspect a page and build a CSS extraction schema for `scrape` |
| `compare_pages` | `url1`, `url2` | Crawl two pages and produce a structured comparison |
Architecture
```mermaid
graph TD
    A[MCP Client] -->|stdio / HTTP| B[FastMCP v3]
    B --> C[Tool Router]
    C --> D[scrape]
    C --> E[crawl]
    C --> F[close_session]
    C --> G[get_artifact]
    D & E & F & G --> N[AsyncWebCrawler Singleton]
    N --> O[Headless Chromium]
    B --> P[Resources]
    B --> Q[Prompts]
    style B fill:#4B8BBE,color:#fff
    style N fill:#FF6B35,color:#fff
    style O fill:#2496ED,color:#fff
```
The server uses a single-module architecture:
- FastMCP v3 handles MCP protocol negotiation, transport, tool/resource/prompt registration, and message routing
- Lifespan-managed `AsyncWebCrawler` starts a headless Chromium browser once at server startup, shares it across all tool invocations, then shuts it down cleanly on exit
- 4 tool functions decorated with `@mcp.tool()` define the canonical surface
- 2 resource functions decorated with `@mcp.resource()` return JSON
- 3 prompt functions decorated with `@mcp.prompt` return structured `Message` lists

> [!IMPORTANT]
> There are no intermediate manager classes or custom HTTP clients. The server delegates all crawling to crawl4ai's `AsyncWebCrawler` and all protocol handling to FastMCP. Only 2 runtime dependencies.
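The lifespan-singleton pattern described above can be sketched in plain Python. `FakeBrowser` stands in for the real Chromium-backed `AsyncWebCrawler`; everything here is a minimal illustration of the start-once/reuse/clean-shutdown shape, not the server's code.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records lifecycle transitions for the demo

class FakeBrowser:
    """Stand-in for the headless Chromium crawler (illustration only)."""
    async def start(self):
        events.append("start")
    async def close(self):
        events.append("close")

@asynccontextmanager
async def lifespan():
    browser = FakeBrowser()
    await browser.start()      # start the browser once at server startup
    try:
        yield browser          # every tool invocation reuses this instance
    finally:
        await browser.close()  # clean shutdown on server exit

async def serve():
    async with lifespan() as browser:
        # Two simulated tool calls share the same browser; no restart between them
        for _ in range(2):
            assert events.count("start") == 1

asyncio.run(serve())
```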
Configuration
<details> <summary><strong>CLI flags</strong></summary>

| Flag | Default | Description |
|---|---|---|
| `--transport` | `stdio` | Transport protocol: `stdio` or `http` |
| `--host` | `127.0.0.1` | Host to bind (HTTP transport only) |
| `--port` | `8000` | Port to bind (HTTP transport only) |
| `--setup` | — | Install Playwright browsers and exit |

</details>
<details> <summary><strong>Environment variables</strong></summary>
No environment variables are required. The server uses sensible defaults for all configuration. Crawl4AI's own environment variables (e.g., `CRAWL4AI_VERBOSE`) are respected if set.
</details>
Testing
```bash
# Run all tests
uv run pytest

# Coverage gate (>=90%)
uv run pytest --cov=mcp_crawl4ai --cov-report=term-missing

# Smoke tests only
uv run pytest -m smoke

# Unit tests only
uv run pytest -m unit

# Integration workflow tests
uv run pytest -m integration

# End-to-end workflow tests
uv run pytest -m e2e

# Manual live test (requires browser)
uv run python tests/manual/test_live.py
```
> [!NOTE]
> All automated tests run in-memory using `fastmcp.Client(mcp)` — no browser or network required. The test suite mocks `AsyncWebCrawler` for fast, deterministic execution.
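The mocking approach that note describes can be illustrated with `unittest.mock.AsyncMock`: replace the real crawler with a mock so the code under test needs no browser or network. The attribute name `arun` and the return shape below are assumptions for the sketch, not the suite's actual fixtures.

```python
import asyncio
from unittest.mock import AsyncMock

# Mock standing in for AsyncWebCrawler; hypothetical return payload
crawler = AsyncMock()
crawler.arun.return_value = {"markdown": "# Mocked page", "ok": True}

async def scrape_with(crawler, url):
    """Tiny stand-in for a tool body that delegates to the crawler."""
    return await crawler.arun(url=url)

result = asyncio.run(scrape_with(crawler, "https://example.com"))
print(result["markdown"])  # → # Mocked page
```

Because the mock is awaited like the real object, the same test doubles work for fast, deterministic runs without Chromium.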
Contributing
See the Contributing Guide for details on setting up your development environment, coding standards, and the pull request process.
License
This project is licensed under the MIT License. See the LICENSE file for details.
<div align="center">
MCP-Crawl4AI — Connecting AI to the Live Web
</div>