Crawl4AI MCP Server

High-performance server enabling AI assistants to access web scraping, crawling, and deep research capabilities through Model Context Protocol.

⚠️ NOTICE

MCP SERVER CURRENTLY UNDER DEVELOPMENT
NOT READY FOR PRODUCTION USE
WILL UPDATE WHEN OPERATIONAL

🚀 High-performance MCP Server for Crawl4AI - Enables AI assistants to access web scraping, crawling, and deep research via the Model Context Protocol. Aims to be faster and more efficient than Firecrawl.

Overview

This project implements a custom Model Context Protocol (MCP) server that integrates with Crawl4AI, an open-source web scraping and crawling library. The server is designed to run as a remote MCP server on Cloudflare Workers, giving AI assistants such as Claude access to Crawl4AI's web scraping capabilities.

Documentation

For comprehensive details about this project, please refer to the following documentation:

Migration Plan - Detailed plan for migrating from Firecrawl to Crawl4AI
Enhanced Architecture - Multi-tenant architecture with cloud provider flexibility
Implementation Guide - Technical implementation details and code examples
Codebase Simplification - Details on code simplification and best practices implemented

Features

Web Data Acquisition

🌐 Single Webpage Scraping: Extract content from individual webpages
🕸️ Web Crawling: Crawl websites with configurable depth and page limits
🗺️ URL Discovery: Map and discover URLs from a starting point
🕸️ Asynchronous Crawling: Crawl entire websites efficiently
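
To make "configurable depth and page limits" concrete, here is a minimal sketch of a breadth-first crawl bounded by both limits. It is an illustration only, not this project's implementation: `crawlBreadthFirst`, `LinkFetcher`, and `CrawlLimits` are hypothetical names, and link extraction is injected as a function so the sketch stays independent of any Crawl4AI API.

```typescript
// Hypothetical helper type: resolves to the links found on a page.
type LinkFetcher = (url: string) => Promise<string[]>;

interface CrawlLimits {
  maxDepth: number; // maximum number of link hops from the start URL
  maxPages: number; // hard cap on the number of pages visited
}

// Visits pages level by level and returns them in visit order.
async function crawlBreadthFirst(
  startUrl: string,
  fetchLinks: LinkFetcher,
  limits: CrawlLimits,
): Promise<string[]> {
  const seen = new Set<string>([startUrl]);
  const visited: string[] = [];
  let frontier: string[] = [startUrl];

  for (let depth = 0; depth <= limits.maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      if (visited.length >= limits.maxPages) return visited; // page cap reached
      visited.push(url);
      if (depth === limits.maxDepth) continue; // do not expand the deepest level
      for (const link of await fetchLinks(url)) {
        if (!seen.has(link)) {
          seen.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return visited;
}
```

With a depth limit of 0 only the start URL is visited; the page cap stops the crawl mid-level once it is reached.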

Content Processing

🔍 Deep Research: Conduct comprehensive research across multiple pages
📊 Structured Data Extraction: Extract specific data using CSS selectors or LLM-based extraction
🔎 Content Search: Search through previously crawled content

Integration & Security

🔄 MCP Integration: Seamless integration with MCP clients (Claude Desktop, etc.)
🔒 Authentication: Secure access via OAuth or an API key (Bearer token)
⚡ High Performance: Optimized for speed and efficiency

Project Structure

```
crawl4ai-mcp/
├── src/
│   ├── index.ts               # Main entry point with OAuth provider setup
│   ├── auth-handler.ts        # Authentication handler
│   ├── mcp-server.ts          # MCP server implementation
│   ├── crawl4ai-adapter.ts    # Adapter for Crawl4AI API
│   ├── tool-schemas/          # MCP tool schema definitions
│   │   └── [...].ts           # Tool schemas
│   ├── handlers/
│   │   ├── crawl.ts           # Web crawling implementation
│   │   ├── search.ts          # Search functionality
│   │   └── extract.ts         # Content extraction
│   └── utils/                 # Utility functions
├── tests/                     # Test cases
├── .github/                   # GitHub configuration
├── wrangler.toml              # Cloudflare Workers configuration
├── tsconfig.json              # TypeScript configuration
├── package.json               # Node.js dependencies
└── README.md                  # Project documentation
```

Getting Started

Prerequisites

Node.js (v18 or higher)
npm
Wrangler (Cloudflare Workers CLI)
A Cloudflare account

Installation

Clone the repository:
```
git clone https://github.com/BjornMelin/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
```

Install dependencies:
```
npm install
```

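Set up a Cloudflare KV namespace:
```
wrangler kv:namespace create CRAWL_DATA
```
Newer Wrangler releases spell this command with a space instead of a colon: wrangler kv namespace create CRAWL_DATA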

Update wrangler.toml with the KV namespace ID:
```
kv_namespaces = [
  { binding = "CRAWL_DATA", id = "your-namespace-id" }
]
```
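
Once the binding is configured, Worker code reaches the namespace through the binding name (here `CRAWL_DATA`). The following is a rough sketch of how crawl results could be stored and read back through such a binding. The `CrawlRecord` shape, the `crawl:` key prefix, and the helper names are assumptions made for illustration; `KVLike` declares only the `get` and `put` calls used here, and the in-memory stand-in lets the sketch run outside a Worker.

```typescript
// Minimal subset of the Workers KV binding used in this sketch (string values only).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Hypothetical record shape; the real schema is not specified in this README.
interface CrawlRecord {
  id: string;
  startUrl: string;
  pages: string[];
}

// Assumed key layout: one KV entry per crawl, keyed by its ID.
const crawlKey = (id: string): string => `crawl:${id}`;

async function saveCrawl(kv: KVLike, record: CrawlRecord): Promise<void> {
  await kv.put(crawlKey(record.id), JSON.stringify(record));
}

async function loadCrawl(kv: KVLike, id: string): Promise<CrawlRecord | null> {
  const raw = await kv.get(crawlKey(id));
  return raw === null ? null : (JSON.parse(raw) as CrawlRecord);
}

// In-memory stand-in for local experiments and tests.
function createMemoryKV(): KVLike {
  const store = new Map<string, string>();
  return {
    get: async (key) => store.get(key) ?? null,
    put: async (key, value) => {
      store.set(key, value);
    },
  };
}
```

Inside a deployed Worker, the object passed as `kv` would be the `CRAWL_DATA` binding from the Worker's environment rather than the in-memory stand-in.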

Development

Local Development

Start the development server:
```
npm run dev
```
The server will be available at http://localhost:8787

Deployment

Deploy to Cloudflare Workers:
```
npm run deploy
```
Your server will be available at the Cloudflare Workers URL assigned to your deployed worker.

Usage with MCP Clients

This server implements the Model Context Protocol, allowing AI assistants to access its tools.

Authentication

Authentication is still being implemented. The planned work is:

Implement OAuth authentication with workers-oauth-provider
Add API key authentication using Bearer tokens
Create a login page and token management
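
As an illustration of the API key path, the sketch below extracts a Bearer token from an `Authorization` header value and checks it against a set of known keys. It is a minimal sketch under stated assumptions, not the project's auth handler: `parseBearerToken`, `isAuthorized`, and the idea of holding valid keys in a `Set` are all hypothetical.

```typescript
// Returns the token from an "Authorization: Bearer <token>" header value, or null.
// The scheme name is matched case-insensitively.
function parseBearerToken(authorization: string | null | undefined): string | null {
  if (!authorization) return null;
  const match = /^Bearer\s+(\S+)\s*$/i.exec(authorization);
  return match?.[1] ?? null;
}

// Hypothetical check: accepts the request only if the token is a known API key.
function isAuthorized(
  authorization: string | null | undefined,
  validKeys: ReadonlySet<string>,
): boolean {
  const token = parseBearerToken(authorization);
  return token !== null && validKeys.has(token);
}
```

In a Worker, the header value would typically come from `request.headers.get("Authorization")`, which returns null when the header is absent.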

Connecting to an MCP Client

Copy the Cloudflare Workers URL assigned to your deployed worker
In Claude Desktop or another MCP client, add the server as a tool source using that URL

Available Tools

crawl: Crawl web pages from a starting URL
getCrawl: Retrieve crawl data by ID
listCrawls: List all crawls or filter by domain
search: Search indexed documents by query
extract: Extract structured content from a URL
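
MCP clients invoke tools like these by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for the `crawl` tool. The envelope (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) follows the MCP specification; the argument names `url` and `maxDepth` are assumptions, since this README does not document the tool schemas, and `buildToolCall` is a hypothetical helper.

```typescript
// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper that wraps a tool name and its arguments in the JSON-RPC envelope.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names below are assumed for illustration only.
const crawlRequest = buildToolCall("crawl", { url: "https://example.com", maxDepth: 2 });
console.log(JSON.stringify(crawlRequest, null, 2));
```

In practice an MCP client library builds and sends this envelope for you; the sketch only shows what travels over the wire.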

Configuration

The server can be configured by modifying environment variables in wrangler.toml:

MAX_CRAWL_DEPTH: Maximum depth for web crawling (default: 3)
MAX_CRAWL_PAGES: Maximum pages to crawl (default: 100)
API_VERSION: API version string (default: "v1")
OAUTH_CLIENT_ID: OAuth client ID for authentication
OAUTH_CLIENT_SECRET: OAuth client secret for authentication
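
One way the server could read these variables and apply the documented defaults is sketched below. This is an assumption-laden illustration rather than the project's code: `loadConfig`, `ServerConfig`, and `parsePositiveInt` are hypothetical names, the variables are assumed to arrive as strings, and treating non-positive or unparsable numbers as "use the default" is a design choice of this sketch.

```typescript
interface ServerConfig {
  maxCrawlDepth: number;
  maxCrawlPages: number;
  apiVersion: string;
}

// Environment variables as plain strings; missing variables are undefined.
type EnvVars = Record<string, string | undefined>;

// Parses a positive integer, falling back when the value is missing or invalid.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const value = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

// Applies the defaults listed above: depth 3, 100 pages, API version "v1".
function loadConfig(env: EnvVars): ServerConfig {
  return {
    maxCrawlDepth: parsePositiveInt(env["MAX_CRAWL_DEPTH"], 3),
    maxCrawlPages: parsePositiveInt(env["MAX_CRAWL_PAGES"], 100),
    apiVersion: env["API_VERSION"] ?? "v1",
  };
}
```

The OAuth client ID and secret are left out of the sketch on purpose; credentials are usually better kept out of a general-purpose config object.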

Roadmap

The project is being developed with these components in mind:

Project Setup and Configuration: Cloudflare Worker setup, TypeScript configuration
MCP Server and Tool Schemas: Implementation of MCP server with tool definitions
Crawl4AI Adapter: Integration with the Crawl4AI functionality
OAuth Authentication: Secure authentication implementation
Performance Optimizations: Enhancing speed and reliability
Advanced Extraction Features: Improving structured data extraction capabilities

Contributing

Contributions are welcome! Please check the open issues, or create a new one, before starting work on a feature or bug fix. See the Contributing Guidelines for details.

Support

If you encounter issues or have questions:

Open an issue on the GitHub repository
Check the Crawl4AI documentation
Refer to the Model Context Protocol specification

How to Cite

If you use Crawl4AI MCP Server in your research or projects, please cite it using the following BibTeX entry:

```
@software{crawl4ai_mcp_2025,
  author = {Melin, Bjorn},
  title = {Crawl4AI MCP Server: High-performance Web Crawling for AI Assistants},
  url = {https://github.com/BjornMelin/crawl4ai-mcp-server},
  version = {1.0.0},
  year = {2025},
  month = {5}
}
```

License

MIT
