searxNcrawl

MCP server for web search and crawling, integrating SearXNG metasearch and Crawl4AI for privacy-respecting search and content extraction.

MCP server and CLI toolkit for web search and crawling, built on Crawl4AI and SearXNG.

Published at github.com/DasDigitaleMomentum/searxNcrawl — maintained by DDM – Das Digitale Momentum GmbH & Co KG. Successor to searxng-mcp.

Quick Start

Pick your setup:

Docker Compose (everything included)

SearXNG, Playwright, and the MCP server — one command.

cp .env.example .env          # edit SEARXNG_URL if needed
docker compose up --build

➜ MCP server at http://localhost:9555/mcp

pip (standalone)

CLI tools, Python API, and MCP server. SearXNG required for search.

python -m venv .venv && source .venv/bin/activate
pip install -e .
playwright install chromium

uv (standalone)

Same capabilities as pip.

uv sync
uv run playwright install chromium

What you get

Feature              Docker Compose    pip / uv
MCP Server (STDIO)
MCP Server (HTTP)
Web Crawl
Web Search           ✅ (included)     ✅¹
CLI Tools            via exec²
Python API
CORS (HTTP)

¹ Requires a SearXNG instance.
² docker compose exec searxncrawl crawl ...

Features

Crawling

  • Single page, multi-page, and site crawling (DFS with depth/page limits)
  • Production-tested extraction config optimized for documentation sites
  • Configurable timeouts with graceful error handling
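
To make the depth and page limits concrete, here is a minimal sketch of a depth-first traversal that honors both. It is an illustration only, not the project's crawler: `dfs_crawl`, `get_links`, and the in-memory `SITE` graph are hypothetical names, and real crawling would fetch pages over HTTP instead of reading a dictionary.

```python
from typing import Callable, Iterable


def dfs_crawl(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 10,
) -> list[str]:
    """Visit pages depth-first, stopping at max_depth and max_pages."""
    visited: list[str] = []
    seen: set[str] = {start}
    stack: list[tuple[str, int]] = [(start, 0)]

    while stack and len(visited) < max_pages:
        url, depth = stack.pop()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this page's links
        # Push in reverse so the first link on the page is visited first.
        for link in reversed(list(get_links(url))):
            if link not in seen:
                seen.add(link)
                stack.append((link, depth + 1))
    return visited


# Hypothetical in-memory link graph standing in for real HTTP fetches.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}

order = dfs_crawl("/", lambda u: SITE.get(u, []), max_depth=2, max_pages=5)
print(order)
```

With these limits the traversal stops after five pages and never reaches `/a/1/x`, which sits at depth 3.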

Content Quality

  • Markdown deduplication: exact (default) removes repeated blocks, off disables it
  • Link removal — strip all links for cleaner LLM context (--remove-links)
  • Dedup guardrails — non-destructive metadata signals when removal is unusually aggressive
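
The idea behind exact deduplication with a guardrail can be sketched as follows. This is a simplified illustration under assumptions, not the project's implementation: the function `dedup_exact`, the blank-line block splitting, the metadata keys, and the 50% threshold are all hypothetical.

```python
def dedup_exact(markdown: str) -> tuple[str, dict]:
    """Drop blocks that repeat an earlier block verbatim; report what was removed."""
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    seen: set[str] = set()
    kept: list[str] = []
    for block in blocks:
        if block in seen:
            continue  # exact repeat of an earlier block
        seen.add(block)
        kept.append(block)

    removed = len(blocks) - len(kept)
    ratio = removed / len(blocks) if blocks else 0.0
    meta = {
        "blocks_total": len(blocks),
        "blocks_removed": removed,
        # Hypothetical guardrail: flag heavy removal, but do not undo it.
        "aggressive": ratio > 0.5,
    }
    return "\n\n".join(kept), meta


page = "# Title\n\nNav: Home | Docs\n\nIntro text.\n\nNav: Home | Docs\n\nMore text."
clean, meta = dedup_exact(page)
print(clean)
print(meta)
```

The guardrail here only annotates the result, which matches the "non-destructive metadata signal" described above: the caller decides what to do when removal looks excessive.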

Web Search

  • SearXNG metasearch integration (privacy-respecting)
  • Configurable language, time range, categories, engines, safe search

MCP Server

  • STDIO transport — for MCP harnesses (Zed, opencode, VS Code, Claude Code, etc.)
  • HTTP transport — for remote access and browser clients
  • CORS support — configurable origins for browser-based MCP clients
  • Noise-free startup with UTF-8 encoding (cross-platform, incl. Windows)

CLI Tools

  • crawl — crawl pages from the command line
  • search — search the web via SearXNG
  • crawl-capture — session capture for authenticated crawling

Installation

Docker Compose

The Compose stack includes searxNcrawl + SearXNG + Playwright/Chromium.

cp .env.example .env
# Edit .env: set SEARXNG_URL if using external SearXNG, or keep default
docker compose up --build

Variable    Default    Description
MCP_PORT    9555       MCP server HTTP port

The MCP server is available at http://localhost:9555/mcp.

pip

cd searxNcrawl
python -m venv .venv
source .venv/bin/activate
pip install -e .
playwright install chromium

uv

cd searxNcrawl
uv sync
uv run playwright install chromium

SearXNG (search feature)

The search tool and CLI command require a SearXNG instance with JSON output enabled (search.formats in settings.yml). Docker Compose includes one automatically. For pip/uv, you need your own; self-hosting is recommended because public instances are often rate-limited.
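
To check what "JSON output" means in practice, the sketch below builds a SearXNG search URL with `format=json` and reads titles and URLs from a response body. The helper names `build_search_url` and `extract_results` are hypothetical, and the sample payload is hand-written for illustration rather than real search output.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, language: Optional[str] = None) -> str:
    """Build a SearXNG search URL that requests JSON output."""
    params = {"q": query, "format": "json"}
    if language:
        params["language"] = language
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


def extract_results(payload: str, max_results: int = 5) -> list[dict]:
    """Pick title and URL from a SearXNG JSON response body."""
    data = json.loads(payload)
    return [
        {"title": r.get("title", ""), "url": r.get("url", "")}
        for r in data.get("results", [])[:max_results]
    ]


url = build_search_url("http://localhost:8888/", "python tutorials", language="de")
print(url)

# Hand-written sample payload, not real search output.
sample = '{"query": "python tutorials", "results": [{"title": "Docs", "url": "https://docs.python.org/"}]}'
print(extract_results(sample))
```

Opening the printed URL in a browser or with an HTTP client is a quick way to confirm that the instance actually serves JSON before pointing the tools at it.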

Environment variables:

Variable            Default                  Description
SEARXNG_URL         http://localhost:8888    SearXNG instance URL
SEARXNG_USERNAME    (none)                   Optional basic auth username
SEARXNG_PASSWORD    (none)                   Optional basic auth password

Config file search order (CLI tools only):

  1. ./.env — current directory
  2. ~/.config/searxncrawl/.env — user config

If no .env exists, .env.example is auto-copied to the user config path.
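
The search order above can be sketched as a small lookup function. This is an illustration of the documented order only; `find_env_file` is a hypothetical helper, and the auto-copy of .env.example is deliberately left out.

```python
from pathlib import Path
from typing import Optional


def find_env_file(cwd: Path, home: Path) -> Optional[Path]:
    """Return the first existing .env in the documented search order."""
    candidates = [
        cwd / ".env",                               # 1. current directory
        home / ".config" / "searxncrawl" / ".env",  # 2. user config
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None


print(find_env_file(Path.cwd(), Path.home()))
```

A project-local .env therefore takes precedence over the user-level file, which is convenient for pointing a single checkout at a different SearXNG instance.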

Usage

MCP Server

Start the server

# STDIO transport (for MCP harnesses)
python -m crawler.mcp_server

# HTTP transport
python -m crawler.mcp_server --transport http --port 8000

# HTTP with CORS
python -m crawler.mcp_server --transport http --cors-origins "http://localhost:3000"

# Docker (HTTP only)
docker compose up --build

MCP client configuration

Python with venv:

{
  "mcpServers": {
    "crawler": {
      "command": "python",
      "args": ["-m", "crawler.mcp_server"],
      "cwd": "/path/to/searxNcrawl",
      "env": { "SEARXNG_URL": "http://your-searxng:8888" }
    }
  }
}

With uv (no manual venv):

{
  "mcpServers": {
    "crawler": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/searxNcrawl", "python", "-m", "crawler.mcp_server"],
      "env": { "SEARXNG_URL": "http://your-searxng:8888" }
    }
  }
}

Docker (HTTP endpoint):

{
  "mcpServers": {
    "crawler": {
      "url": "http://localhost:9555/mcp"
    }
  }
}

CORS

HTTP transport can emit CORS headers for browser-based MCP clients:

crawl-mcp --transport http --cors-origins "http://localhost:3000,https://myapp.com"
crawl-mcp --transport http --cors-origins "*"   # all origins — local dev only

Without --cors-origins, no CORS headers are sent (browsers will block cross-origin requests).
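
The effect of an origin allowlist can be illustrated with a short sketch. It models general CORS semantics, not the server's actual code: `parse_cors_origins` and `cors_headers` are hypothetical names, and the comma-splitting simply mirrors the flag format shown above.

```python
def parse_cors_origins(value: str) -> list[str]:
    """Split a comma-separated --cors-origins value into a clean list."""
    return [origin.strip() for origin in value.split(",") if origin.strip()]


def cors_headers(request_origin: str, allowed: list[str]) -> dict[str, str]:
    """Return the CORS headers for a request origin, or none if it is not allowed."""
    if "*" in allowed:
        return {"Access-Control-Allow-Origin": "*"}
    if request_origin in allowed:
        # Echo the origin and tell caches the response varies by Origin.
        return {"Access-Control-Allow-Origin": request_origin, "Vary": "Origin"}
    return {}  # no header: the browser blocks the cross-origin response


allowed = parse_cors_origins("http://localhost:3000, https://myapp.com")
print(cors_headers("https://myapp.com", allowed))
print(cors_headers("https://evil.example", allowed))
```

An empty allowlist yields no headers for any origin, which corresponds to running the server without --cors-origins.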

CLI Tools

After pip install -e . (or uv sync), the following commands are available:

# Crawl a page
crawl https://docs.example.com

# Site crawl with depth limit
crawl https://docs.example.com --site --max-depth 2 --max-pages 10 -o docs/

# Clean output (no links)
crawl https://example.com --remove-links

# Search
search "python tutorials"
search "Rezepte" --language de --max-results 5

# Session capture for authenticated crawling
crawl-capture --start-url https://example.com/login \
    --completion-url 'https://example.com/dashboard.*' \
    --output ./state.json

See Session Capture for the full crawl-capture guide.
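
Before reusing a captured session, it can help to check whether its cookies are still valid. The sketch below assumes the file written by crawl-capture follows Playwright's storage state layout (top-level "cookies" and "origins" keys, with cookie "expires" as a Unix timestamp or -1 for session cookies); that is an assumption, not something this README confirms. `summarize_storage_state` and the sample data are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


def summarize_storage_state(path: Path, now: Optional[float] = None) -> dict:
    """Count cookies and list expired ones in a Playwright-style storage state file."""
    now = time.time() if now is None else now
    state = json.loads(path.read_text(encoding="utf-8"))
    cookies = state.get("cookies", [])
    expired = [
        c["name"]
        for c in cookies
        # expires == -1 marks a session cookie with no fixed expiry.
        if c.get("expires", -1) != -1 and c["expires"] < now
    ]
    return {
        "cookies": len(cookies),
        "expired": expired,
        "origins": [o.get("origin") for o in state.get("origins", [])],
    }


# Hand-written sample state, written to a temporary file for the demo.
sample = {
    "cookies": [
        {"name": "sid", "value": "x", "domain": "example.com", "path": "/", "expires": 100},
        {"name": "theme", "value": "dark", "domain": "example.com", "path": "/", "expires": -1},
    ],
    "origins": [{"origin": "https://example.com", "localStorage": []}],
}
with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    state_path.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_storage_state(state_path, now=200)
print(summary)
```

If the summary lists expired cookies, re-running crawl-capture to refresh the session is likely simpler than debugging a failed authenticated crawl.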

Python API

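import asyncio

from crawler import crawl_page, crawl_page_async, crawl_site, crawl_site_async

# Site crawl (synchronous API, called outside any event loop)
result = crawl_site("https://docs.example.com", max_depth=2, max_pages=10)
for doc in result.documents:
    print(f"{doc.status}: {doc.final_url}")


async def main():
    # Single page
    doc = await crawl_page_async("https://docs.example.com/intro", dedup_mode="exact")
    print(doc.markdown)

    # Authenticated crawl, reusing a session file saved by crawl-capture
    doc = await crawl_page_async(
        "https://example.com/private",
        auth={"storage_state": "/path/to/state.json"},
    )
    print(doc.markdown)


# The *_async functions must be awaited, so run them inside an event loop
asyncio.run(main())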

Reference

  • MCP Tools — full parameter reference for crawl, crawl_site, search
  • Output Formats — Markdown and JSON output structure, including CrawledDocument
  • Session Capture — manual login flow and CDP session export

Configuration

Default config is optimized for documentation sites. Customize via overrides:

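import asyncio

from crawler import RunConfigOverrides, build_markdown_run_config, crawl_page_async

# Override selected defaults of the run config
config = build_markdown_run_config(
    RunConfigOverrides(
        delay_before_return_html=1.0,  # wait (seconds) before capturing the HTML
        mean_delay=1.0,                # average delay (seconds) between requests
        scan_full_page=True,           # scroll the page to load lazy content
    )
)


async def main():
    doc = await crawl_page_async("https://example.com", config=config)
    print(doc.markdown)


asyncio.run(main())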

Dependencies

  • crawl4ai>=0.7.4 — crawler engine
  • playwright>=1.40.0 — browser automation
  • fastmcp>=2.0.0 — MCP server framework
  • httpx>=0.27.0 — HTTP client for SearXNG
  • tldextract>=5.1.2 — domain parsing for site crawls

License

MIT — © 2026 DDM – Das Digitale Momentum GmbH & Co KG
