Firecrawl MCP Toolkit

A high-performance, asynchronous MCP server that provides comprehensive Google search and web content scraping capabilities through the Firecrawl API (excluding some rarely used interfaces).

This project is built on httpx, utilizing asynchronous clients and connection pool management to offer LLMs a stable and efficient external information retrieval tool.

PyPI Package

firecrawl-toolkit: https://pypi.org/project/firecrawl-toolkit/

Key Features

  • Asynchronous Architecture: Fully based on asyncio and httpx, ensuring high throughput and non-blocking I/O operations.
  • HTTP Connection Pool: Manages and reuses TCP connections through a global httpx.AsyncClient instance, significantly improving performance under high concurrency.
  • Concurrency Control: Built-in global and per-API endpoint concurrency semaphores effectively manage API request rates to prevent exceeding rate limits.
  • Automatic Retry Mechanism: Integrated request retry functionality with exponential backoff strategy automatically handles temporary network fluctuations or server errors, enhancing service stability.
  • Intelligent Country Code Parsing: Includes a comprehensive country name dictionary supporting inputs in Chinese, English, ISO Alpha-2/3, and other formats, with automatic normalization.
  • Response Field Mapping: Search/Scrape responses are normalized into minimal, client-facing JSON schemas instead of upstream passthrough payloads.
  • Noise Reduction for Scrape: Built-in excludeTags selector filtering removes common non-content blocks (navigation, ads, sidebars, comments, etc.) to improve signal quality. Supports returning a specified Markdown character window with startIndex and maxCharacters.
  • Flexible Environment Variable Configuration: Supports fine-tuned service configuration via environment variables.
  • Token Savings: The search and scrape tools pre-process requests and post-process responses, which reduces the number of tokens returned to the client.
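
The concurrency control and retry features above can be illustrated with a minimal asyncio sketch. It is not the toolkit's actual implementation: the helper name call_with_retry, the use of ConnectionError as the retryable error, and the small random jitter are all assumptions made for illustration. The defaults mirror FIRECRAWL_RETRY_COUNT and FIRECRAWL_RETRY_BASE_DELAY.

```python
import asyncio
import random


async def call_with_retry(make_request, *, semaphore, retries=3, base_delay=0.5):
    """Run make_request() under a semaphore, retrying with exponential backoff.

    make_request is a hypothetical zero-argument coroutine function.
    """
    for attempt in range(retries + 1):
        try:
            # The semaphore caps how many requests are in flight at once.
            async with semaphore:
                return await make_request()
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # The delay doubles on each attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Small random jitter (an assumption, not described in this README).
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def demo():
    calls = {"count": 0}

    async def flaky():
        # Hypothetical request that fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    semaphore = asyncio.Semaphore(2)
    result = await call_with_retry(flaky, semaphore=semaphore, base_delay=0.01)
    return result, calls["count"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The semaphore is released before sleeping, so a request waiting to retry does not occupy a concurrency slot.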

Available Tools

This service provides the following tools:

| Tool Name | Description |
| --- | --- |
| firecrawl-aggregated-search | Aggregated search interface that combines web, news, and image search results. |
| firecrawl-web-search | Web search interface. |
| firecrawl-news-search | News search interface. |
| firecrawl-image-search | Image search interface. |
| firecrawl-scrape | Scrapes and returns the content of a specified URL. |

Installation Guide

Install the package with pip or uv.

# Using pip
pip install firecrawl-toolkit

# Or using uv
uv pip install firecrawl-toolkit

Quick Start

Set Environment Variables

Create a .env file in the project root directory and set your Firecrawl API key. The following environment variables are supported:

| Environment Variable | Default Value | Description |
| --- | --- | --- |
| FIRECRAWL_API_KEY | fc-xxx | Your Firecrawl API key. Multiple keys can be separated by commas; one is selected at random for each request. |
| FIRECRAWL_HTTP2 | 0 | Enable (1) or disable (0) HTTP/2. |
| FIRECRAWL_MAX_WORKERS | 10 | Number of worker processes. |
| FIRECRAWL_MAX_CONNECTIONS | 200 | Maximum number of connections. |
| FIRECRAWL_MAX_CONCURRENT_REQUESTS | 200 | Maximum number of concurrent requests. |
| FIRECRAWL_KEEPALIVE | 20 | Maximum number of keep-alive connections. |
| FIRECRAWL_RETRY_COUNT | 3 | Maximum number of retries. |
| FIRECRAWL_RETRY_BASE_DELAY | 0.5 | Base delay between retries, in seconds. |
| FIRECRAWL_ENDPOINT_CONCURRENCY | {"search":10,"scrape":2} | Concurrency limit per endpoint (JSON format). |
| FIRECRAWL_ENDPOINT_RETRYABLE | {"scrape": false} | Whether retries are allowed per endpoint (JSON format). |
| FIRECRAWL_MCP_ENABLE_STDIO | 0 | Enable (1) or disable (0) the STDIO transport. |
| FIRECRAWL_MCP_ENABLE_HTTP | 0 | Enable (1) or disable (0) the HTTP transport. |
| FIRECRAWL_MCP_ENABLE_SSE | 0 | Enable (1) or disable (0) the SSE transport. |
| FIRECRAWL_MCP_HTTP_HOST | 127.0.0.1 | HTTP host address. |
| FIRECRAWL_MCP_HTTP_PORT | 7001 | HTTP port. |
| FIRECRAWL_MCP_SSE_HOST | 127.0.0.1 | SSE host address. |
| FIRECRAWL_MCP_SSE_PORT | 7001 | SSE port. |
| FIRECRAWL_MCP_LOCK_FILE | /tmp/firecrawl_mcp.lock | Lock file path. |

  • STDIO, HTTP, and SSE can only be used one at a time. If you need to use multiple protocols, please start separate services for each.
  • When using multiple services, please specify different lock files for each.
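
As a rough illustration of how values like these might be consumed, the sketch below reads a few of the variables from a mapping such as os.environ. The function names load_settings and pick_api_key are hypothetical and the toolkit's real loader may differ. Only the variable names, their defaults, and the "pick one key at random per request" behavior come from this README.

```python
import json
import random


def load_settings(env):
    """Read a few FIRECRAWL_* values from a mapping such as os.environ."""
    raw_keys = env.get("FIRECRAWL_API_KEY", "")
    # Multiple keys are comma-separated; ignore blanks and surrounding spaces.
    api_keys = [key.strip() for key in raw_keys.split(",") if key.strip()]
    if not api_keys:
        raise ValueError("FIRECRAWL_API_KEY is not set")
    return {
        "api_keys": api_keys,
        "retry_count": int(env.get("FIRECRAWL_RETRY_COUNT", "3")),
        "retry_base_delay": float(env.get("FIRECRAWL_RETRY_BASE_DELAY", "0.5")),
        # JSON-formatted variable, e.g. {"search":10,"scrape":2}
        "endpoint_concurrency": json.loads(
            env.get("FIRECRAWL_ENDPOINT_CONCURRENCY", '{"search":10,"scrape":2}')
        ),
    }


def pick_api_key(settings):
    """Select one configured key at random, as described for each request."""
    return random.choice(settings["api_keys"])
```

For example, load_settings({"FIRECRAWL_API_KEY": "fc-a, fc-b"}) yields two keys and the default retry settings.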

Configure MCP Client

Add the following server configuration in the MCP client configuration file:

{
  "mcpServers": {
    "firecrawl": {
      "command": "python3",
      "args": ["-m", "firecrawl-toolkit"],
      "env": {
        "FIRECRAWL_API_KEY": "<Your Firecrawl API key>"
      }
    }
  }
}

Alternatively, using uvx:

{
  "mcpServers": {
    "firecrawl": {
      "command": "uvx",
      "args": ["firecrawl-toolkit"],
      "env": {
        "FIRECRAWL_API_KEY": "<Your Firecrawl API key>"
      }
    }
  }
}

Go CLI

The Go CLI is located in the cli directory. It is a standalone command-line client named firecrawl, separate from the Python MCP server.

The CLI reads the API key only from FIRECRAWL_KEY:

export FIRECRAWL_KEY="<Your Firecrawl API key>"

Build From Source

Build for the current platform:

cd cli
go test ./...
go build -o firecrawl .

Run it directly after building:

./firecrawl --help

Build all release targets locally:

cd cli
mkdir -p dist

CGO_ENABLED=0 GOOS=windows GOARCH=amd64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_windows_amd64.exe .
CGO_ENABLED=0 GOOS=windows GOARCH=arm64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_windows_arm64.exe .
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_linux_amd64 .
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_linux_arm64 .
CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_darwin_amd64 .
CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_darwin_arm64 .

CLI Search Usage

Search commands:

firecrawl aggregated --query "AI advancements 2024" --country "United States" --search-num 5 --search-time month --timeout 120
firecrawl web --query "AI advancements 2024" --country US --search-num 5
firecrawl news --query "OpenAI news" --search-time week
firecrawl image --query "firecrawl logo" --search-num 10

Search command parameters:

  • --query (required): Search keywords.
  • --country (optional): Country or region name / ISO code. Default is US.
  • --search-num (optional): Number of results, range 1-100. Default is 20.
  • --search-time (optional): One of hour, day, week, month, year.
  • --timeout (optional): Request timeout in seconds. Must be > 0. Default is 120.

Search commands output compact single-line JSON, using the same mapped fields as the Python search tools:

{"success":true,"data":{"web":[],"news":[],"images":[]},"creditsUsed":1}

CLI Credit Usage

Check team credit usage:

firecrawl credit-usage
firecrawl credit-usage --pretty

Credit usage command parameters:

  • --json (optional): Output JSON. JSON is the default output format.
  • --pretty (optional): Pretty-print JSON output.

Default output is compact JSON:

{"success":true,"data":{"remainingCredits":1000,"planCredits":500000,"billingPeriodStart":"2025-01-01T00:00:00Z","billingPeriodEnd":"2025-01-31T23:59:59Z"}}

CLI Scrape Usage

Scrape a page and save the markdown export as example.md in the current directory:

firecrawl scrape \
  --output example \
  --url "https://www.example.com" \
  --include-tags '["article",".content"]' \
  --exclude-tags ".nav,.footer" \
  --empty-tags \
  --start-index 0 \
  --max-characters 1200 \
  --headers '{"X-Trace-Id":"abc123"}' \
  --timeout 120

Save the markdown export to a specific directory:

firecrawl scrape \
  --output example \
  --path ./exports/pages \
  --url "https://www.example.com"

--include-tags and --exclude-tags accept these input forms:

# Single selector
firecrawl scrape --output page --url "https://www.example.com" --include-tags "article"

# Comma-separated selector list
firecrawl scrape --output page --url "https://www.example.com" --exclude-tags ".nav,.footer,#sidebar"

# JSON string array, recommended when selectors contain spaces, quotes, or commas
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["main article",".post-content","#content"]'
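
The three input forms can be told apart with a small amount of parsing. The Python sketch below is one plausible approach, not the CLI's actual Go implementation; parse_selectors is a hypothetical name, and the rule "try JSON first, otherwise split on commas" is an assumption.

```python
import json


def parse_selectors(raw):
    """Turn a --include-tags / --exclude-tags style value into a list of selectors."""
    text = raw.strip()
    if not text:
        return []
    if text.startswith("["):
        # Might be a JSON string array, or an attribute selector like [id^="post-"].
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list) and all(isinstance(s, str) for s in parsed):
            return [s.strip() for s in parsed if s.strip()]
    # Single selector or comma-separated list.
    return [part.strip() for part in text.split(",") if part.strip()]
```

Under this rule a selector that contains a comma, such as article:has(h1, h2), only survives intact inside a JSON string array, which matches the recommendation above.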

Common CSS selector types:

# Tag, class, and ID selectors
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["article",".content","#main"]'

# Attribute selectors with square brackets
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["[data-testid=\"article-body\"]","[class*=\"content\"]","[id^=\"post-\"]"]'

# Descendant, child, and compound selectors
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["main article","main > article","article.post"]'

# Exclusion selectors
firecrawl scrape --output page --url "https://www.example.com" --exclude-tags '["nav[aria-label=\"Breadcrumb\"]","aside.related",".promo-banner"]'

# Clear built-in exclude selectors while keeping user-provided --exclude-tags
firecrawl scrape --output page --url "https://www.example.com" --empty-tags --exclude-tags ".nav"

# Selectors that contain commas must use a JSON string array
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["article:has(h1, h2)",".content"]'

Scrape command parameters:

  • --output (required): Export name. The CLI writes <output>.md.
  • --path (optional): Directory where the markdown export is saved. Supports absolute and relative paths. Defaults to the current directory. If the directory does not exist, the CLI tries to create it before scraping.
  • --url (required): Target webpage URL.
  • --include-tags (optional): CSS selectors to include. Accepts a single selector, comma-separated selector string, or JSON string array.
  • --exclude-tags (optional): Additional CSS selectors to exclude. Accepts a single selector, comma-separated selector string, or JSON string array.
  • --empty-tags (optional): Clear the built-in exclude selector list for this request while keeping user-provided --exclude-tags.
  • --start-index (optional): Markdown truncation start index. Must be >= 0. Default is 0.
  • --max-characters (optional): Maximum markdown characters from --start-index. Must be > 0 when provided.
  • --headers (optional): JSON object with string values, for example {"Authorization":"Bearer token","X-Trace-Id":"abc123"}.
  • --timeout (optional): Request timeout in seconds. Must be > 0. Default is 120.

Scrape output:

  • On success, stdout is true, and the CLI writes <output>.md under --path or the current directory.
  • On failure, stdout is false followed by the error reason, and no file is created or overwritten.

The generated markdown file uses this structure:

## title:
## description:
## url:
## language:
## creditsUsed:

---

markdown content
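
A file in this layout can be split back into metadata and body with a few lines of Python. This is a sketch only: parse_export is a hypothetical helper, and it assumes each value follows its "## key:" label on the same line and that the first "---" line separates the header from the content.

```python
def parse_export(text):
    """Split an exported .md file into (metadata dict, markdown body)."""
    # Assumption: the first line consisting of '---' ends the header block.
    header, separator, body = text.partition("\n---\n")
    metadata = {}
    for line in header.splitlines():
        if line.startswith("## ") and ":" in line:
            # Split on the first colon only, so URLs keep their own colons.
            key, _, value = line[3:].partition(":")
            metadata[key.strip()] = value.strip()
    return metadata, (body.strip() if separator else "")
```

Usage: call parse_export(open("example.md", encoding="utf-8").read()) and read fields such as metadata["title"] or metadata["creditsUsed"] (values are returned as strings).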

Tool Parameters and Usage Examples

Search tools (firecrawl-aggregated-search, firecrawl-web-search, firecrawl-news-search, firecrawl-image-search): Perform aggregated / web / news / image search

Parameters:

  • query (str, required): Keywords to search.
  • country (str, optional): Specify the country/region for search results. Supports Chinese names (e.g., "中国"), English names (e.g., "United States"), or ISO codes (e.g., "US"). Default is "US".
  • search_num (int, optional): Number of results to return, range 1-100. Default is 20.
  • search_time (str, optional): Filter results by time range. Available values: "hour", "day", "week", "month", "year".
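
The country normalization can be pictured as a dictionary lookup on a cleaned-up key. The sketch below uses a tiny, purely illustrative alias table; the toolkit's real dictionary is much larger. The name normalize_country and the choice to fall back to the default for unknown input are assumptions.

```python
# Tiny illustrative alias table (hypothetical); keys are casefolded.
_COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US", "美国": "US",
    "cn": "CN", "chn": "CN", "china": "CN", "中国": "CN",
    "jp": "JP", "jpn": "JP", "japan": "JP", "日本": "JP",
}


def normalize_country(value, default="US"):
    """Map a country name or ISO Alpha-2/3 code to an ISO Alpha-2 code."""
    if not isinstance(value, str) or not value.strip():
        return default
    # Collapse whitespace and ignore case before the lookup.
    key = " ".join(value.split()).casefold()
    # Assumption: unknown inputs fall back to the default rather than raising.
    return _COUNTRY_ALIASES.get(key, default)
```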

Example:

result_json = firecrawl_web_search(
    query="AI advancements 2024",
    country="United States",
    search_num=5,
    search_time="month"
)

Response (mapped):

  • Top-level fields: success, data, creditsUsed
  • data.web[]: title, description, url
  • data.news[]: title, snippet, url, date
  • data.images[]: title, imageUrl, url
  • web / news / images remain arrays and may be empty ([])
  • Missing mapped fields are preserved as null
  • Output is compact single-line JSON (no extra spaces)

Example response:

{"success":true,"data":{"web":[{"title":"Example Web","description":"Example description","url":"https://example.com"}],"news":[],"images":[]},"creditsUsed":1}
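
On the client side, a mapped search response like the one above can be consumed as follows. This is a minimal sketch: summarize_search is a hypothetical helper, and treating any response without "success": true as empty is an assumption, since the error payload shape is not documented here.

```python
import json


def summarize_search(raw_json):
    """Return (kind, title, url) tuples from a mapped search response."""
    payload = json.loads(raw_json)
    if not payload.get("success"):
        return []
    data = payload.get("data") or {}
    rows = []
    for kind in ("web", "news", "images"):
        # Each list may be empty; individual fields may be null (None).
        for item in data.get(kind) or []:
            rows.append((kind, item.get("title"), item.get("url")))
    return rows
```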

firecrawl-scrape: Scrape webpage content

Parameters:

  • url (str, required): URL of the target webpage.
  • excludeTags (list[str], optional, default []): Additional CSS selectors to exclude; merged with built-in noise-filter selectors after normalization and deduplication unless emptyTags=True.
  • includeTags (list[str], optional, default None): Additional CSS selectors to include; no built-in defaults are applied, and the cleaned list is forwarded only when this parameter is provided.
  • maxCharacters (int, optional, default None): Truncate only the returned markdown to N characters starting at startIndex. Invalid values (non-int, <= 0) are ignored and treated as not provided.
  • startIndex (int, optional, default 0): Start offset used with maxCharacters when slicing returned markdown. Invalid values (non-int, < 0) are treated as 0.
  • emptyTags (bool, optional, default False): Clear the built-in exclude selector list for this request, while still keeping any user-provided excludeTags.
  • headers (dict[str, str], optional, default None): Root-level request headers passed through to the upstream scrape request only when a non-empty object is provided.

Example:

result_json = firecrawl_scrape(
    url="https://www.example.com",
    includeTags=["article", ".content"],
    excludeTags=["[class^=\"skip\"]", "[id*=\"disqus\"]"],
    startIndex=0,
    maxCharacters=1200,
    headers={"Authorization": "Bearer token", "X-Trace-Id": "abc123"}
)

This returns at most 1200 characters in markdown, starting at character index 0.
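
The slicing rules for startIndex and maxCharacters can be sketched as below. The helper names are hypothetical, and one detail is an assumption: this sketch still applies startIndex when maxCharacters is absent or invalid, which the parameter descriptions do not state explicitly.

```python
def _valid_int(value, minimum):
    """True for real integers (not bools) that are >= minimum."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= minimum


def slice_markdown(markdown, start_index=0, max_characters=None):
    """Return the requested character window of the markdown."""
    # Invalid startIndex values (non-int, < 0) are treated as 0.
    start = start_index if _valid_int(start_index, 0) else 0
    # Invalid maxCharacters values (non-int, <= 0) are treated as not provided.
    if not _valid_int(max_characters, 1):
        return markdown[start:]
    return markdown[start:start + max_characters]
```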

To explicitly send an empty include selector list:

result_json = firecrawl_scrape(
    url="https://www.example.com",
    includeTags=[]
)

To disable only the built-in exclude selectors for one request:

result_json = firecrawl_scrape(
    url="https://www.example.com",
    emptyTags=True
)

To disable the built-in exclude selectors but keep your own:

result_json = firecrawl_scrape(
    url="https://www.example.com",
    excludeTags=[".nav"],
    emptyTags=True
)

Built-in noise filtering:

  • The tool uses an internal excludeTags selector set to suppress noisy DOM regions and prioritize main content quality.
  • includeTags has no built-in defaults and is only forwarded when explicitly provided.
  • Passing emptyTags=True clears only the built-in exclude selector set for that request.
  • If the first scrape returns data.markdown == "", the tool automatically retries once without includeTags/excludeTags as a fallback.
  • startIndex / maxCharacters slicing is applied locally in this toolkit post-processing and is not forwarded to upstream Firecrawl payloads.
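
The merge and fallback behavior described above can be sketched in a few lines. Everything named here is hypothetical: BUILTIN_EXCLUDE_TAGS is a stand-in for the toolkit's real selector set, and fetch is an injected callable that returns a dict with a "markdown" key (a simplification of the upstream data.markdown field).

```python
# Hypothetical stand-in for the toolkit's built-in selector set.
BUILTIN_EXCLUDE_TAGS = ["nav", "footer", ".ads", ".sidebar", ".comments"]


def build_exclude_tags(user_tags=None, empty_tags=False):
    """Merge built-in and user selectors, trimmed and deduplicated in order."""
    base = [] if empty_tags else BUILTIN_EXCLUDE_TAGS
    merged = []
    for selector in list(base) + list(user_tags or []):
        cleaned = selector.strip() if isinstance(selector, str) else ""
        if cleaned and cleaned not in merged:
            merged.append(cleaned)
    return merged


def scrape_with_fallback(fetch, url, include_tags=None, user_tags=None, empty_tags=False):
    """Call fetch(url, include_tags, exclude_tags); retry once unfiltered if empty."""
    result = fetch(url, include_tags, build_exclude_tags(user_tags, empty_tags))
    if result.get("markdown") == "":
        # Fallback: retry once without includeTags/excludeTags.
        result = fetch(url, None, None)
    return result
```

Injecting fetch keeps the sketch independent of any HTTP client; in practice it would wrap the upstream scrape request.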

Response (mapped):

  • Top-level fields: success, proxyUsed, title, description, language, markdown, creditsUsed
  • markdown is URL-decoded before returning to the client
  • When a valid maxCharacters is provided, markdown length is capped at that value after applying startIndex
  • Missing mapped fields are preserved as null
  • Output is compact single-line JSON (no extra spaces)

Example response:

{"success":true,"proxyUsed":"auto","title":"Example Page","description":"Example summary","language":"en","markdown":"Hello world!","creditsUsed":1}

Response Contract Notes

  • Success payloads from the search tools and firecrawl-scrape are mapped to stable minimal schemas.
  • Missing mapped fields are preserved as null (arrays remain arrays, and may be empty).
  • Both success and error responses are compact single-line JSON.

License Agreement

This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).
