Firecrawl MCP Toolkit
A high-performance, asynchronous MCP server that provides comprehensive Google search and web content scraping capabilities through the Firecrawl API, designed for LLMs to retrieve external information efficiently.
README
Firecrawl MCP Toolkit
A high-performance, asynchronous MCP server that provides comprehensive Google search and web content scraping capabilities through the Firecrawl API (excluding some rarely used interfaces).
This project is built on httpx, utilizing asynchronous clients and connection pool management to offer LLMs a stable and efficient external information retrieval tool.
PyPI Package
firecrawl-toolkit: https://pypi.org/project/firecrawl-toolkit/
Key Features
- Asynchronous Architecture: Fully based on
asyncioandhttpx, ensuring high throughput and non-blocking I/O operations. - HTTP Connection Pool: Manages and reuses TCP connections through a global
httpx.AsyncClientinstance, significantly improving performance under high concurrency. - Concurrency Control: Built-in global and per-API endpoint concurrency semaphores effectively manage API request rates to prevent exceeding rate limits.
- Automatic Retry Mechanism: Integrated request retry functionality with exponential backoff strategy automatically handles temporary network fluctuations or server errors, enhancing service stability.
- Intelligent Country Code Parsing: Includes a comprehensive country name dictionary supporting inputs in Chinese, English, ISO Alpha-2/3, and other formats, with automatic normalization.
- Response Field Mapping: Search/Scrape responses are normalized into minimal, client-facing JSON schemas instead of upstream passthrough payloads.
- Noise Reduction for Scrape: Built-in
excludeTagsselector filtering removes common non-content blocks (navigation, ads, sidebars, comments, etc.) to improve signal quality. Supports returning a specified Markdown character window withstartIndexandmaxCharacters. - Flexible Environment Variable Configuration: Supports fine-tuned service configuration via environment variables.
- The Search and Scrape Endpoints perform some request pre-processing and post-processing, which can save quite a few tokens.
Available Tools
This service provides the following tools:
| Tool Name | Description |
|---|---|
firecrawl-aggregated-search |
Aggregated Search Interface, Combining Webpage, News, And Image Search Results. |
firecrawl-web-search |
Web Search Interface. |
firecrawl-news-search |
News Search Interface. |
firecrawl-image-search |
Image Search Interface. |
firecrawl-scrape |
Scrapes and returns the content of a specified URL. |
Installation Guide
It is recommended to install using pip or uv.
# Using pip
pip install firecrawl-toolkit
# Or using uv
uv pip install firecrawl-toolkit
Quick Start
Set Environment Variables
Create a .env file in the project root directory and enter your Firecrawl API key:
| Environment Variables | Default value | Description |
|---|---|---|
FIRECRAWL_API_KEY |
fc-xxx | Your Firecrawl API key. Multiple keys can be separated by commas, and one will be selected randomly for each request. |
FIRECRAWL_HTTP2 |
0 | Disable or enable HTTP2, <0/1> |
FIRECRAWL_MAX_WORKERS |
10 | Number of processes |
FIRECRAWL_MAX_CONNECTIONS |
200 | Maximum number of connections |
FIRECRAWL_MAX_CONCURRENT_REQUESTS |
200 | Maximum number of concurrent requests |
FIRECRAWL_KEEPALIVE |
20 | Maximum number of concurrent connections |
FIRECRAWL_RETRY_COUNT |
3 | Maximum number of retries |
FIRECRAWL_RETRY_BASE_DELAY |
0.5 | Base delay time for retries in seconds |
FIRECRAWL_ENDPOINT_CONCURRENCY |
{"search":10,"scrape":2} |
Set concurrency per endpoint (JSON format) |
FIRECRAWL_ENDPOINT_RETRYABLE |
{"scrape": false} |
Set retry allowance per endpoint (JSON format) |
FIRECRAWL_MCP_ENABLE_STDIO |
0 | Disable or enable STDIO, <0/1> |
FIRECRAWL_MCP_ENABLE_HTTP |
0 | Disable or enable HTTP, <0/1> |
FIRECRAWL_MCP_ENABLE_SSE |
0 | Disable or enable SSE, <0/1> |
FIRECRAWL_MCP_HTTP_HOST |
127.0.0.1 | HTTP host address |
FIRECRAWL_MCP_HTTP_PORT |
7001 | HTTP host port |
FIRECRAWL_MCP_SSE_HOST |
127.0.0.1 | SSE host address |
FIRECRAWL_MCP_SSE_PORT |
7001 | SSE host port |
FIRECRAWL_MCP_LOCK_FILE |
/tmp/firecrawl_mcp.lock |
Lock file path |
- STDIO, HTTP, and SSE can only be used one at a time. If you need to use multiple protocols, please start separate services for each.
- When using multiple services, please specify different lock files for each.
Configure MCP Client
Add the following server configuration in the MCP client configuration file:
{
"mcpServers": {
"firecrawl": {
"command": "python3",
"args": ["-m", "firecrawl-toolkit"],
"env": {
"FIRECRAWL_API_KEY": "<Your Firecrawl API key>"
}
}
}
}
{
"mcpServers": {
"firecrawl": {
"command": "uvx",
"args": ["firecrawl-toolkit"],
"env": {
"FIRECRAWL_API_KEY": "<Your Firecrawl API key>"
}
}
}
}
Go CLI
The Go CLI is located in the cli directory. It is a standalone command-line client named firecrawl, separate from the Python MCP server.
The CLI reads the API key only from FIRECRAWL_KEY:
export FIRECRAWL_KEY="<Your Firecrawl API key>"
Build From Source
Build for the current platform:
cd cli
go test ./...
go build -o firecrawl .
Run it directly after building:
./firecrawl --help
Build all release targets locally:
cd cli
mkdir -p dist
CGO_ENABLED=0 GOOS=windows GOARCH=amd64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_windows_amd64.exe .
CGO_ENABLED=0 GOOS=windows GOARCH=arm64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_windows_arm64.exe .
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_linux_amd64 .
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_linux_arm64 .
CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_darwin_amd64 .
CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 go build -trimpath -ldflags="-s -w" -o dist/firecrawl_darwin_arm64 .
CLI Search Usage
Search commands:
firecrawl aggregated --query "AI advancements 2024" --country "United States" --search-num 5 --search-time month --timeout 120
firecrawl web --query "AI advancements 2024" --country US --search-num 5
firecrawl news --query "OpenAI news" --search-time week
firecrawl image --query "firecrawl logo" --search-num 10
Search command parameters:
--query(required): Search keywords.--country(optional): Country or region name / ISO code. Default isUS.--search-num(optional): Number of results, range1-100. Default is20.--search-time(optional): One ofhour,day,week,month,year.--timeout(optional): Request timeout in seconds. Must be> 0. Default is120.
Search commands output compact single-line JSON, using the same mapped fields as the Python search tools:
{"success":true,"data":{"web":[],"news":[],"images":[]},"creditsUsed":1}
CLI Credit Usage
Check team credit usage:
firecrawl credit-usage
firecrawl credit-usage --pretty
Credit usage command parameters:
--json(optional): Output JSON. JSON is the default output format.--pretty(optional): Pretty-print JSON output.
Default output is compact JSON:
{"success":true,"data":{"remainingCredits":1000,"planCredits":500000,"billingPeriodStart":"2025-01-01T00:00:00Z","billingPeriodEnd":"2025-01-31T23:59:59Z"}}
CLI Scrape Usage
Scrape a page and save the markdown export as example.md in the current directory:
firecrawl scrape \
--output example \
--url "https://www.example.com" \
--include-tags '["article",".content"]' \
--exclude-tags ".nav,.footer" \
--empty-tags \
--start-index 0 \
--max-characters 1200 \
--headers '{"X-Trace-Id":"abc123"}' \
--timeout 120
Save the markdown export to a specific directory:
firecrawl scrape \
--output example \
--path ./exports/pages \
--url "https://www.example.com"
--include-tags and --exclude-tags accept these input forms:
# Single selector
firecrawl scrape --output page --url "https://www.example.com" --include-tags "article"
# Comma-separated selector list
firecrawl scrape --output page --url "https://www.example.com" --exclude-tags ".nav,.footer,#sidebar"
# JSON string array, recommended when selectors contain spaces, quotes, or commas
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["main article",".post-content","#content"]'
Common CSS selector types:
# Tag, class, and ID selectors
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["article",".content","#main"]'
# Attribute selectors with square brackets
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["[data-testid=\"article-body\"]","[class*=\"content\"]","[id^=\"post-\"]"]'
# Descendant, child, and compound selectors
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["main article","main > article","article.post"]'
# Exclusion selectors
firecrawl scrape --output page --url "https://www.example.com" --exclude-tags '["nav[aria-label=\"Breadcrumb\"]","aside.related",".promo-banner"]'
# Clear built-in exclude selectors while keeping user-provided --exclude-tags
firecrawl scrape --output page --url "https://www.example.com" --empty-tags --exclude-tags ".nav"
# Selectors that contain commas must use a JSON string array
firecrawl scrape --output page --url "https://www.example.com" --include-tags '["article:has(h1, h2)",".content"]'
Scrape command parameters:
--output(required): Export name. The CLI writes<output>.md.--path(optional): Directory where the markdown export is saved. Supports absolute and relative paths. Defaults to the current directory. If the directory does not exist, the CLI tries to create it before scraping.--url(required): Target webpage URL.--include-tags(optional): CSS selectors to include. Accepts a single selector, comma-separated selector string, or JSON string array.--exclude-tags(optional): Additional CSS selectors to exclude. Accepts a single selector, comma-separated selector string, or JSON string array.--empty-tags(optional): Clear the built-in exclude selector list for this request while keeping user-provided--exclude-tags.--start-index(optional): Markdown truncation start index. Must be>= 0. Default is0.--max-characters(optional): Maximum markdown characters from--start-index. Must be> 0when provided.--headers(optional): JSON object with string values, for example{"Authorization":"Bearer token","X-Trace-Id":"abc123"}.--timeout(optional): Request timeout in seconds. Must be> 0. Default is120.
Scrape output:
- On success, stdout is
true, and the CLI writes<output>.mdunder--pathor the current directory. - On failure, stdout is
falsefollowed by the error reason, and no file is created or overwritten.
The generated markdown file uses this structure:
## title:
## description:
## url:
## language:
## creditsUsed:
---
markdown content
Tool Parameters and Usage Examples
firecrawl Search: Perform aggregated / web / news / images search
Parameters:
query(str, required): Keywords to search.country(str, optional): Specify the country/region for search results. Supports Chinese names (e.g., "China"), English names (e.g., "United States"), or ISO codes (e.g., "US"). Default is "US".search_num(int, optional): Number of results to return, range 1-100. Default is 20.search_time(str, optional): Filter results by time range. Available values: "hour", "day", "week", "month", "year".
Example:
result_json = firecrawl_web_search(
query="AI advancements 2024",
country="United States",
search_num=5,
search_time="month"
)
Response (mapped):
- Top-level fields:
success,data,creditsUsed data.web[]:title,description,urldata.news[]:title,snippet,url,datedata.images[]:title,imageUrl,urlweb/news/imagesremain arrays and may be empty ([])- Missing mapped fields are preserved as
null - Output is compact single-line JSON (no extra spaces)
Example response:
{"success":true,"data":{"web":[{"title":"Example Web","description":"Example description","url":"https://example.com"}],"news":[],"images":[]},"creditsUsed":1}
firecrawl-scrape: Scrape webpage content
Parameters:
url(str, required): URL of the target webpage.excludeTags(list[str], optional, default[]): Additional CSS selectors to exclude; merged with built-in noise-filter selectors after normalization and deduplication unlessemptyTags=True.includeTags(list[str], optional, defaultNone): Additional CSS selectors to include; no built-in defaults are applied, and the cleaned list is forwarded only when this parameter is provided.maxCharacters(int, optional, defaultNone): Truncate only the returnedmarkdownto N characters starting atstartIndex. Invalid values (non-int,<= 0) are ignored and treated as not provided.startIndex(int, optional, default0): Start offset used withmaxCharacterswhen slicing returnedmarkdown. Invalid values (non-int,< 0) are treated as0.emptyTags(bool, optional, defaultFalse): Clear the built-in exclude selector list for this request, while still keeping any user-providedexcludeTags.headers(dict[str, str], optional, defaultNone): Root-level request headers passed through to the upstream scrape request only when a non-empty object is provided.
Example:
result_json = firecrawl_scrape(
url="https://www.example.com",
includeTags=["article", ".content"],
excludeTags=["[class^=\"skip\"]", "[id*=\"disqus\"]"],
startIndex=0,
maxCharacters=1200,
headers={"Authorization": "Bearer token", "X-Trace-Id": "abc123"}
)
This returns at most 1200 characters in markdown, starting at character index 0.
To explicitly send an empty include selector list:
result_json = firecrawl_scrape(
url="https://www.example.com",
includeTags=[]
)
To disable only the built-in exclude selectors for one request:
result_json = firecrawl_scrape(
url="https://www.example.com",
emptyTags=True
)
To disable the built-in exclude selectors but keep your own:
result_json = firecrawl_scrape(
url="https://www.example.com",
excludeTags=[".nav"],
emptyTags=True
)
Built-in noise filtering:
- The tool uses an internal
excludeTagsselector set to suppress noisy DOM regions and prioritize main content quality. includeTagshas no built-in defaults and is only forwarded when explicitly provided.- Passing
emptyTags=Trueclears only the built-in exclude selector set for that request. - If the first scrape returns
data.markdown == "", the tool automatically retries once withoutincludeTags/excludeTagsas a fallback. startIndex/maxCharactersslicing is applied locally in this toolkit post-processing and is not forwarded to upstream Firecrawl payloads.
Response (mapped):
- Top-level fields:
success,proxyUsed,title,description,language,markdown,creditsUsed markdownis URL-decoded before returning to the client- When a valid
maxCharactersis provided,markdownlength is capped at that value after applyingstartIndex - Missing mapped fields are preserved as
null - Output is compact single-line JSON (no extra spaces)
Example response:
{"success":true,"proxyUsed":"auto","title":"Example Page","description":"Example summary","language":"en","markdown":"Hello world!","creditsUsed":1}
Response Contract Notes
firecrawl-searchandfirecrawl-scrapesuccess payloads are mapped to stable minimal schemas.- Missing mapped fields are preserved as
null(arrays remain arrays, and may be empty). - Both success and error responses are compact single-line JSON.
License Agreement
This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.