Crawl4AI MCP

Enables AI assistants to crawl websites, extract dynamic content, navigate links, and save structured Markdown files via the Model Context Protocol (MCP), with support for anti-bot bypass, CSS selectors, and custom JavaScript execution.

Web Crawler MCP

A powerful web crawling tool that integrates with AI assistants via the MCP (Model Context Protocol). This project allows AI assistants to crawl websites, extract dynamic content, navigate through links, and save structured Markdown files directly.

📋 Features

  • Native integration with AI assistants via MCP
  • Returns scraped Markdown content directly to the AI
  • Extracts and surfaces internal/external links for AI navigation
  • Website crawling with configurable depth
  • Detailed crawl result statistics
  • Error and not-found page handling
  • Advanced Scraping Capabilities:
    • Magic Mode: Bypass anti-bot protections (such as Cloudflare) and simulate real browser behavior
    • Targeted Extraction: Fetch only what you need using CSS selectors
    • Custom JavaScript: Execute code before extraction (clicks, scrolls, form fills)
    • Persistent Sessions: Keep cookies and state across requests for authenticated sites
    • SPA Support: Wait for dynamic CSS selectors or set explicit pre-extraction delays

🚀 MCP Configuration

The simplest and recommended way to use this tool is via uvx, which automatically fetches and runs the latest version from GitHub without requiring you to clone the repository manually.

Prerequisites

  • uv installed on your system.

Setup for AI Assistants (e.g., Claude Desktop, Cline)

Add the following to your AI Assistant's MCP configuration file (e.g., cline_mcp_settings.json or claude_desktop_config.json):

Note for Windows Users: It is highly recommended to specify --python 3.12 to avoid compilation issues with certain dependencies.

{
  "mcpServers": {
    "crawl": {
      "command": "uvx",
      "args": [
        "--python",
        "3.12",
        "--from",
        "git+https://github.com/laurentvv/crawl4ai-mcp",
        "crawl4ai-mcp"
      ],
      "disabled": false,
      "autoApprove": [],
      "timeout": 600
    }
  }
}
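If your configuration file already lists other MCP servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of such a merge using only the Python standard library. The helper name `add_crawl_server` is hypothetical and not part of this project, and the path you pass in depends on where your assistant stores its configuration.

```python
import json
from pathlib import Path

# Server entry copied from the configuration shown above.
CRAWL_SERVER = {
    "command": "uvx",
    "args": [
        "--python", "3.12",
        "--from", "git+https://github.com/laurentvv/crawl4ai-mcp",
        "crawl4ai-mcp",
    ],
    "disabled": False,
    "autoApprove": [],
    "timeout": 600,
}


def add_crawl_server(config_path: Path, name: str = "crawl") -> dict:
    """Merge the crawl server entry into an MCP config file, keeping other servers.

    Hypothetical helper for illustration; it is not shipped with crawl4ai-mcp.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # Create the "mcpServers" section if the file does not have one yet.
    servers = config.setdefault("mcpServers", {})
    servers[name] = CRAWL_SERVER
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_crawl_server(Path("claude_desktop_config.json"))` would add or overwrite only the `crawl` entry and leave any other servers in the file untouched.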

Important: Browser Installation

The crawler uses Playwright to handle dynamic content. You must install the required browsers after setting up the tool:

uv run playwright install chromium

🖥️ Usage

Once configured, you can use the crawler by asking your AI assistant to perform a crawl.

Usage Examples with Claude/Cline

  • Simple Crawl: "Can you crawl the site example.com and give me a summary?"
  • Crawl with Options: "Can you crawl https://example.com with a depth of 3 and include external links?"
  • Dynamic Content: "Crawl this React app and wait for the .main-content selector to load."
  • Bypass Protections: "Crawl example.com but use 'magic mode' to bypass the anti-bot protection."
  • Targeted Extraction: "Crawl the docs site but only extract content matching the h1, p.lead CSS selector."
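Behind prompts like these, the assistant sends an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below shows what such messages might look like for some of the examples above. The tool name `crawl` and the mapping from each prompt to its arguments are assumptions made for illustration; the assistant decides the actual arguments at run time.

```python
import json
from itertools import count

_request_ids = count(1)


def tools_call_request(arguments: dict, tool_name: str = "crawl") -> str:
    """Build a JSON-RPC 2.0 `tools/call` message as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Plausible argument sets for the example prompts above (assumed mapping).
EXAMPLES = {
    "crawl with options": {
        "url": "https://example.com",
        "max_depth": 3,
        "include_external": True,
    },
    "dynamic content": {
        "url": "https://example.com",
        "wait_for_selector": ".main-content",
    },
    "bypass protections": {"url": "https://example.com", "magic": True},
    "targeted extraction": {
        "url": "https://example.com",
        "css_selector": "h1, p.lead",
    },
}

if __name__ == "__main__":
    for label, args in EXAMPLES.items():
        print(label, "->", tools_call_request(args))
```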

🛠️ Available Parameters (MCP Tool)

The crawl tool accepts the following parameters:

| Parameter | Type | Description | Default Value |
| --- | --- | --- | --- |
| url | string | URL to crawl (required) | - |
| max_depth | integer | Maximum crawling depth | 2 |
| include_external | boolean | Include external links | false |
| verbose | boolean | Enable detailed output | true |
| wait_for_selector | string | CSS selector to wait for before extracting content. Useful for single-page applications. | None |
| return_content | boolean | Whether to return the extracted content directly in the MCP response (truncated to 50k chars if necessary). | true |
| output_file | string | Output file path | automatically generated |
| magic | boolean | Enable magic mode to bypass anti-bots and simulate a real browser | false |
| css_selector | string | Specific CSS selector to extract only targeted elements from the page | None |
| js_code | string | Custom JavaScript code to execute on the page before extraction | None |
| session_id | string | Persistent session identifier to keep cookies and browser state across requests | None |
| delay_before_return_html | number | Delay in seconds to wait before extracting HTML (useful for heavy JS pages) | None |
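When calling the tool from your own scripts, it can help to check arguments against this list before sending them. The following is a minimal client-side sketch, assuming the names, types, and defaults listed above; `build_crawl_arguments` is a hypothetical helper, and the server may validate its input differently.

```python
# Parameter names, types, and defaults transcribed from the list above.
# Parameters whose default is None are left out unless the caller sets them,
# so the server applies its own behaviour (e.g. the generated output path).
PARAMS = {
    "url": (str, None),
    "max_depth": (int, 2),
    "include_external": (bool, False),
    "verbose": (bool, True),
    "wait_for_selector": (str, None),
    "return_content": (bool, True),
    "output_file": (str, None),
    "magic": (bool, False),
    "css_selector": (str, None),
    "js_code": (str, None),
    "session_id": (str, None),
    "delay_before_return_html": ((int, float), None),
}


def build_crawl_arguments(url: str, **options) -> dict:
    """Validate options against the documented parameters and fill in defaults."""
    supplied = {"url": url, **options}
    for name, value in supplied.items():
        if name not in PARAMS:
            raise ValueError(f"unknown parameter: {name}")
        expected, _ = PARAMS[name]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if isinstance(value, bool) and expected is not bool:
            raise TypeError(f"{name} must not be a boolean")
        if not isinstance(value, expected):
            raise TypeError(f"{name} has the wrong type: {type(value).__name__}")
    args = {name: default for name, (_, default) in PARAMS.items() if default is not None}
    args.update(supplied)
    return args
```

For example, `build_crawl_arguments("https://example.com", max_depth=3, magic=True)` returns a dictionary with the documented defaults plus the two overrides, while a misspelled name such as `depth=3` raises `ValueError`.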

👨‍💻 Development

If you want to modify the crawler or run it locally:

  1. Clone this repository:
     git clone https://github.com/laurentvv/crawl4ai-mcp
     cd crawl4ai-mcp
  2. Install dependencies using uv:
     uv sync
  3. Run the MCP server directly:
     uv run crawl4ai-mcp

🤝 Contribution

Contributions are welcome! Feel free to open an issue or submit a pull request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
