searxNcrawl
MCP server for web search and crawling, integrating SearXNG metasearch and Crawl4AI for privacy-respecting search and content extraction.
MCP server and CLI toolkit for web search and crawling, built on Crawl4AI and SearXNG.
Published at github.com/DasDigitaleMomentum/searxNcrawl — maintained by DDM – Das Digitale Momentum GmbH & Co KG. Successor to searxng-mcp.
Quick Start
Pick your setup:
Docker Compose (everything included)
SearXNG, Playwright, and the MCP server — one command.
cp .env.example .env # edit SEARXNG_URL if needed
docker compose up --build
➜ MCP server at http://localhost:9555/mcp
pip (standalone)
CLI tools, Python API, and MCP server. SearXNG required for search.
python -m venv .venv && source .venv/bin/activate
pip install -e .
playwright install chromium
uv (standalone)
Same capabilities as pip.
uv sync
uv run playwright install chromium
What you get
| Feature | Docker Compose | pip / uv |
|---|---|---|
| MCP Server (STDIO) | — | ✅ |
| MCP Server (HTTP) | ✅ | ✅ |
| Web Crawl | ✅ | ✅ |
| Web Search | ✅ (included) | ✅¹ |
| CLI Tools | via exec² | ✅ |
| Python API | — | ✅ |
| CORS (HTTP) | ✅ | ✅ |
¹ Requires a SearXNG instance. ² docker compose exec searxncrawl crawl ...
Features
Crawling
- Single page, multi-page, and site crawling (DFS with depth/page limits)
- Production-tested extraction config optimized for documentation sites
- Configurable timeouts with graceful error handling
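To illustrate what a depth-first site crawl with depth and page limits does, here is a minimal sketch over an in-memory link graph. The function `dfs_crawl_order`, its `get_links` callback, and the `SITE` dictionary are hypothetical and only model the traversal order; this is not the project's crawler, which fetches real pages through Crawl4AI.

```python
from typing import Callable, Iterable


def dfs_crawl_order(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 10,
) -> list:
    """Return URLs in the order a depth-first crawl would visit them."""
    visited = []
    seen = {start}
    stack = [(start, 0)]  # (url, depth) pairs still to visit

    while stack and len(visited) < max_pages:
        url, depth = stack.pop()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this page's links
        # Push in reverse so the first link on the page is visited first.
        for link in reversed(list(get_links(url))):
            if link not in seen:
                seen.add(link)
                stack.append((link, depth + 1))
    return visited


# Hypothetical link graph standing in for real pages.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}

order = dfs_crawl_order("/", lambda u: SITE.get(u, []), max_depth=2, max_pages=4)
print(order)
```

With `max_pages=4` the crawl stops before ever reaching `/b`, which shows why a low page limit on a deep site can leave whole branches unvisited.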
Content Quality
- Markdown deduplication: `exact` (default) removes repeated blocks, `off` disables it
- Link removal: strip all links for cleaner LLM context (`--remove-links`)
- Dedup guardrails: non-destructive metadata signals when removal is unusually aggressive
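The idea behind exact deduplication with a guardrail can be sketched as follows. This is an assumption-laden illustration, not the project's implementation: the function `dedup_exact`, the `warn_ratio` threshold, and the metadata keys are all hypothetical, and blocks are assumed to be separated by blank lines.

```python
def dedup_exact(markdown: str, warn_ratio: float = 0.5):
    """Drop repeated blank-line-separated blocks, keeping first occurrences."""
    blocks = [b for b in markdown.split("\n\n") if b.strip()]
    seen = set()
    kept = []
    for block in blocks:
        key = block.strip()
        if key in seen:
            continue  # exact repeat of an earlier block
        seen.add(key)
        kept.append(block)

    removed = len(blocks) - len(kept)
    ratio = removed / len(blocks) if blocks else 0.0
    # Guardrail: report, but never undo, an unusually aggressive removal.
    meta = {
        "blocks_total": len(blocks),
        "blocks_removed": removed,
        "aggressive": ratio > warn_ratio,
    }
    return "\n\n".join(kept), meta


text = "# Nav\n\nIntro text.\n\n# Nav\n\nDetails.\n\n# Nav"
clean, meta = dedup_exact(text)
print(clean)
print(meta)
```

The guardrail only annotates the result, so the caller decides whether to retry with deduplication turned off.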
Web Search
- SearXNG metasearch integration (privacy-respecting)
- Configurable language, time range, categories, engines, safe search
MCP Server
- STDIO transport — for MCP harnesses (Zed, opencode, VS Code, Claude Code, etc.)
- HTTP transport — for remote access and browser clients
- CORS support — configurable origins for browser-based MCP clients
- Noise-free startup with UTF-8 encoding (cross-platform, incl. Windows)
CLI Tools
- `crawl`: crawl pages from the command line
- `search`: search the web via SearXNG
- `crawl-capture`: session capture for authenticated crawling
Installation
Docker Compose
The Compose stack includes searxNcrawl + SearXNG + Playwright/Chromium.
cp .env.example .env
# Edit .env: set SEARXNG_URL if using external SearXNG, or keep default
docker compose up --build
| Variable | Default | Description |
|---|---|---|
| `MCP_PORT` | `9555` | MCP server HTTP port |
The MCP server is available at http://localhost:9555/mcp.
pip
cd searxNcrawl
python -m venv .venv
source .venv/bin/activate
pip install -e .
playwright install chromium
uv
cd searxNcrawl
uv sync
uv run playwright install chromium
SearXNG (search feature)
The search tool and CLI command require a SearXNG instance with JSON output enabled (`json` listed under `search.formats` in `settings.yml`). Docker Compose includes one automatically. For pip/uv, you need your own instance; self-hosting is recommended over public instances, which are typically rate-limited.
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `SEARXNG_URL` | `http://localhost:8888` | SearXNG instance URL |
| `SEARXNG_USERNAME` | (none) | Optional basic auth user |
| `SEARXNG_PASSWORD` | (none) | Optional basic auth pass |
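As a rough sketch of how these variables could translate into a SearXNG JSON request, the snippet below builds the request URL and optional basic-auth pair. The helper `build_search_request` is hypothetical and is not taken from the project's code; it relies only on SearXNG's `/search` endpoint and its `format=json` parameter, and assumes that auth is used only when both username and password are set.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlencode


def build_search_request(
    query: str,
    env: Optional[Mapping[str, str]] = None,
    **options: str,
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (url, auth) for a SearXNG JSON search request."""
    env = os.environ if env is None else env
    base = env.get("SEARXNG_URL", "http://localhost:8888").rstrip("/")
    # format=json only works if the instance enables JSON output.
    params = {"q": query, "format": "json", **options}
    user = env.get("SEARXNG_USERNAME")
    password = env.get("SEARXNG_PASSWORD")
    auth = (user, password) if user and password else None
    return f"{base}/search?{urlencode(params)}", auth


url, auth = build_search_request("python tutorials", env={}, language="de")
print(url, auth)
```

The returned pair can be passed to any HTTP client, for example as the URL and `auth` argument of an `httpx` GET request.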
Config file search order (CLI tools only):
1. `./.env`: current directory
2. `~/.config/searxncrawl/.env`: user config
If no .env exists, .env.example is auto-copied to the user config path.
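The lookup described above might be sketched like this. The function `resolve_env_file` and its parameters are hypothetical; the real CLI may resolve paths differently, and the demo uses a temporary directory in place of an actual home directory.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def resolve_env_file(cwd: Path, home: Path, example: Optional[Path] = None) -> Optional[Path]:
    """Return the first .env found, copying the example file as a fallback."""
    local = cwd / ".env"
    user = home / ".config" / "searxncrawl" / ".env"
    for candidate in (local, user):  # search order: current dir, then user config
        if candidate.is_file():
            return candidate
    if example is not None and example.is_file():
        # No .env anywhere: seed the user config path from the example file.
        user.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(example, user)
        return user
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "project").mkdir()
    example = root / "project" / ".env.example"
    example.write_text("SEARXNG_URL=http://localhost:8888\n")
    found = resolve_env_file(root / "project", root / "home", example)
    relative = found.relative_to(root).as_posix()
    copied_text = found.read_text()

print(relative)
```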
Usage
MCP Server
Start the server
# STDIO transport (for MCP harnesses)
python -m crawler.mcp_server
# HTTP transport
python -m crawler.mcp_server --transport http --port 8000
# HTTP with CORS
python -m crawler.mcp_server --transport http --cors-origins "http://localhost:3000"
# Docker (HTTP only)
docker compose up --build
MCP client configuration
Python with venv:
{
  "mcpServers": {
    "crawler": {
      "command": "python",
      "args": ["-m", "crawler.mcp_server"],
      "cwd": "/path/to/searxNcrawl",
      "env": { "SEARXNG_URL": "http://your-searxng:8888" }
    }
  }
}
With uv (no manual venv):
{
  "mcpServers": {
    "crawler": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/searxNcrawl", "python", "-m", "crawler.mcp_server"],
      "env": { "SEARXNG_URL": "http://your-searxng:8888" }
    }
  }
}
Docker (HTTP endpoint):
{
  "mcpServers": {
    "crawler": {
      "url": "http://localhost:9555/mcp"
    }
  }
}
CORS
HTTP transport can emit CORS headers for browser-based MCP clients:
crawl-mcp --transport http --cors-origins "http://localhost:3000,https://myapp.com"
crawl-mcp --transport http --cors-origins "*" # all origins — local dev only
Without --cors-origins, no CORS headers are sent (browsers will block cross-origin requests).
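To make the behaviour of a comma-separated origin list concrete, here is a simplified sketch of how such a value could be parsed and applied. Both helpers are hypothetical and do not come from the server's code; real CORS middleware also handles preflight requests, allowed methods, and allowed headers.

```python
from typing import Dict, List, Optional


def parse_cors_origins(value: Optional[str]) -> List[str]:
    """Split a comma-separated origins string, ignoring blanks."""
    if not value:
        return []
    return [origin.strip() for origin in value.split(",") if origin.strip()]


def cors_headers(request_origin: str, allowed: List[str]) -> Dict[str, str]:
    """Return the CORS response headers for one request origin."""
    if "*" in allowed:
        return {"Access-Control-Allow-Origin": "*"}
    if request_origin in allowed:
        # Echo the origin and tell caches the response varies by it.
        return {"Access-Control-Allow-Origin": request_origin, "Vary": "Origin"}
    return {}  # no headers: the browser blocks the cross-origin response


allowed = parse_cors_origins("http://localhost:3000, https://myapp.com")
print(cors_headers("https://myapp.com", allowed))
print(cors_headers("https://evil.example", allowed))
```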
CLI Tools
After pip install -e . (or uv sync), the following commands are available:
# Crawl a page
crawl https://docs.example.com
# Site crawl with depth limit
crawl https://docs.example.com --site --max-depth 2 --max-pages 10 -o docs/
# Clean output (no links)
crawl https://example.com --remove-links
# Search
search "python tutorials"
search "Rezepte" --language de --max-results 5
# Session capture for authenticated crawling
crawl-capture --start-url https://example.com/login \
--completion-url 'https://example.com/dashboard.*' \
--output ./state.json
See Session Capture for the full crawl-capture guide.
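The `--completion-url` value in the example looks like a regular expression. Assuming it is matched against the whole URL (an assumption; the Session Capture guide is the authority on the actual semantics), the sketch below shows which URLs such a pattern accepts and why escaping the literal dots gives a stricter match. The helper `is_complete` is hypothetical.

```python
import re

# Pattern taken from the crawl-capture example above.
LOOSE = r"https://example.com/dashboard.*"
# Same intent, but with the literal dots in the host name escaped.
STRICT = re.escape("https://example.com/dashboard") + r".*"


def is_complete(pattern: str, url: str) -> bool:
    """Assumed semantics: the pattern must match the entire URL."""
    return re.fullmatch(pattern, url) is not None


for candidate in (
    "https://example.com/dashboard/home",
    "https://example.com/login",
    "https://exampleXcom/dashboard",
):
    print(candidate, is_complete(LOOSE, candidate), is_complete(STRICT, candidate))
```

An unescaped `.` matches any character, so the loose pattern also accepts a look-alike host such as `exampleXcom`.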
Python API
from crawler import crawl_page, crawl_page_async, crawl_site, crawl_site_async
# Single page
doc = await crawl_page_async("https://docs.example.com/intro", dedup_mode="exact")
print(doc.markdown)
# Site crawl
result = crawl_site("https://docs.example.com", max_depth=2, max_pages=10)
for doc in result.documents:
    print(f"{doc.status}: {doc.final_url}")
# Authenticated crawl
doc = await crawl_page_async(
    "https://example.com/private",
    auth={"storage_state": "/path/to/state.json"},
)
Reference
- MCP Tools: full parameter reference for `crawl`, `crawl_site`, `search`
- Output Formats: Markdown and JSON output structure, including `CrawledDocument`
- Session Capture: manual login flow and CDP session export
Configuration
Default config is optimized for documentation sites. Customize via overrides:
from crawler import build_markdown_run_config, RunConfigOverrides, crawl_page_async
config = build_markdown_run_config(
    RunConfigOverrides(
        delay_before_return_html=1.0,
        mean_delay=1.0,
        scan_full_page=True,
    )
)
doc = await crawl_page_async("https://example.com", config=config)
Dependencies
- `crawl4ai>=0.7.4`: crawler engine
- `playwright>=1.40.0`: browser automation
- `fastmcp>=2.0.0`: MCP server framework
- `httpx>=0.27.0`: HTTP client for SearXNG
- `tldextract>=5.1.2`: domain parsing for site crawls
License
MIT — © 2026 DDM – Das Digitale Momentum GmbH & Co KG