web-search-mcp
Provides web search and page fetch capabilities using a browser-based approach, enabling LLMs to search DuckDuckGo, Google, or Yandex and retrieve rendered HTML from URLs.
Crawly-MCP
Browser-backed web search and page fetch for local LLMs, exposed as MCP tools and a CLI.
The design history is tracked in docs/IMPLEMENTATION_PLAN.md.
Naming
- Python distribution: crawly-mcp
- Import package: crawly_mcp
- CLI executable: crawly-cli
- MCP server executable: crawly-mcp
Tools
- search(provider, context) runs a browser-backed search on duckduckgo (default), google, or yandex and returns up to 5 organic result URLs.
- fetch(urls) fetches 1..5 URLs and returns browser-rendered HTML with per-URL pages, errors, and truncated fields.
context is intentionally the search query string for caller compatibility.
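Over MCP, each tool invocation is a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and its arguments. The sketch below builds those request bodies and applies the documented limits on the client side. `search_request` and `fetch_request` are hypothetical helpers, not part of crawly, and the server remains the authority on what it accepts.

```python
import json
from typing import Any

# Limits taken from the tool descriptions above.
SEARCH_PROVIDERS = ("duckduckgo", "google", "yandex")
MAX_FETCH_URLS = 5


def _tools_call(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` JSON-RPC request."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


def search_request(query: str, provider: str = "duckduckgo", request_id: int = 1) -> str:
    # Hypothetical client-side check mirroring the documented provider list.
    if provider not in SEARCH_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    # `context` carries the search query string.
    return _tools_call(request_id, "search", {"provider": provider, "context": query})


def fetch_request(urls: list[str], request_id: int = 1) -> str:
    # fetch accepts between 1 and 5 URLs per call.
    if not 1 <= len(urls) <= MAX_FETCH_URLS:
        raise ValueError(f"fetch takes 1..{MAX_FETCH_URLS} URLs, got {len(urls)}")
    return _tools_call(request_id, "fetch", {"urls": list(urls)})


if __name__ == "__main__":
    print(search_request("python async playwright"))
    print(fetch_request(["https://example.com"], request_id=2))
```

A real client sends these only after the MCP `initialize` handshake; SDK clients such as the official `mcp` Python package handle that handshake and expose the same call as `ClientSession.call_tool(name, arguments)`.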
Setup
uv sync
chromium --version
For host usage, crawly defaults to launching a system Chromium binary. If Chromium is installed in a non-standard location, set:
PLAYWRIGHT_CHROMIUM_EXECUTABLE=/path/to/chromium
To force Playwright-managed Chromium instead of a host browser:
PLAYWRIGHT_BROWSER_SOURCE=bundled
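The choice between the two browser sources can be pictured as a small function that turns the environment into keyword arguments for Playwright's `chromium.launch()`, whose `executable_path` parameter points it at a custom binary. This is a sketch under assumptions: the name `resolve_launch_kwargs`, the candidate binary names searched on `PATH`, and the error behavior are illustrative, not crawly's actual code.

```python
import os
import shutil
from collections.abc import Mapping

# Hypothetical binary names to look for on PATH when no explicit path is set.
CANDIDATE_BINARIES = ("chromium", "chromium-browser", "google-chrome")


def resolve_launch_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Map PLAYWRIGHT_* variables to kwargs for `chromium.launch()`."""
    source = env.get("PLAYWRIGHT_BROWSER_SOURCE", "system")  # host browser by default
    if source == "bundled":
        # Playwright-managed Chromium: no executable_path needed.
        return {}
    if source != "system":
        raise ValueError(f"unknown PLAYWRIGHT_BROWSER_SOURCE: {source!r}")
    # An explicit path wins over the PATH lookup.
    explicit = env.get("PLAYWRIGHT_CHROMIUM_EXECUTABLE")
    if explicit:
        return {"executable_path": explicit}
    for name in CANDIDATE_BINARIES:
        found = shutil.which(name)
        if found:
            return {"executable_path": found}
    raise RuntimeError("no system Chromium found; set PLAYWRIGHT_CHROMIUM_EXECUTABLE")
```

With Playwright's async API this would be used as `browser = await p.chromium.launch(**resolve_launch_kwargs())` inside `async with async_playwright() as p:`.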
Usage
Run the CLI directly:
uv run crawly-cli search --context "python async playwright"
uv run crawly-cli fetch https://example.com
Run the MCP server over stdio:
uv run crawly-mcp
Expose HTTP transport instead of stdio:
uv run crawly-mcp --transport streamable-http --host 127.0.0.1 --port 8000
The MCP server also reads:
CRAWLY_HOST
CRAWLY_PORT
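One plausible way these variables combine with the flags shown above is flag first, then environment variable, then a built-in fallback. The sketch below implements that precedence with `argparse`; the precedence itself, the `parse_server_args` name, and the `127.0.0.1`/`8000`/`stdio` fallbacks are assumptions drawn from the examples in this README, not documented crawly behavior.

```python
from __future__ import annotations

import argparse
import os
from collections.abc import Mapping, Sequence

# Assumed fallbacks, taken from the host examples in this README.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8000


def parse_server_args(
    argv: Sequence[str] | None = None,
    env: Mapping[str, str] = os.environ,
) -> argparse.Namespace:
    """Resolve transport/host/port: flag > CRAWLY_* variable > fallback."""
    parser = argparse.ArgumentParser(prog="crawly-mcp")
    parser.add_argument("--transport", default="stdio")
    # Environment values become argparse defaults, so explicit flags still win.
    parser.add_argument("--host", default=env.get("CRAWLY_HOST", DEFAULT_HOST))
    parser.add_argument(
        "--port", type=int, default=int(env.get("CRAWLY_PORT", DEFAULT_PORT))
    )
    return parser.parse_args(argv)
```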
Container
The container image uses Playwright-managed Chromium and defaults to HTTP MCP on port 8000.
Build locally:
docker build -t crawly-mcp:local .
Run locally:
docker run --rm --init -p 8000:8000 crawly-mcp:local
Override the transport to stdio:
docker run --rm --init -i crawly-mcp:local crawly-mcp --transport stdio
The container defaults to:
PLAYWRIGHT_BROWSER_SOURCE=bundled
CRAWLY_HOST=0.0.0.0
CRAWLY_PORT=8000
The HTTP MCP endpoint is unauthenticated in v1. Deploy it behind localhost, a private network, or an auth/TLS reverse proxy.
Published images are intended to be:
ghcr.io/<owner>/crawly-mcp
<dockerhub-namespace>/crawly-mcp
The first GHCR publish may need a one-time manual visibility change to make the package public.
Design Notes
- One shared browser per process, with a fresh incognito context per request.
- PLAYWRIGHT_BROWSER_SOURCE=system uses a host Chromium binary.
- PLAYWRIGHT_BROWSER_SOURCE=bundled uses Playwright-managed Chromium.
- Global navigation concurrency cap of 3.
- Timeouts: 15s per page, 20s total for search, 35s total for fetch.
- SSRF guard: http/https only, no embedded credentials, blocks loopback/private/link-local/reserved IPs before navigation and on browser subrequests.
- JavaScript challenge pages get a bounded 10s settle window; there is no CAPTCHA solving, stealth fingerprinting, or site-specific bypass logic.
- HTML is capped at 1 MiB per URL; oversized responses are truncated and reported in truncated.
- robots.txt is not consulted in v1.
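The concurrency cap and timeouts combine naturally in asyncio: a process-wide `asyncio.Semaphore(3)` gates navigations, `asyncio.wait_for` enforces the per-page budget, and `asyncio.wait(..., timeout=...)` bounds the whole call. This is a sketch, not crawly's implementation; `fetch_all` and the injected `navigate` coroutine (which in practice would wrap a Playwright page load) are hypothetical, and the `pages`/`errors` dicts only mirror the shape of the fetch result described under Tools. The same pattern would apply to search with a 20s total.

```python
import asyncio
from collections.abc import Awaitable, Callable

NAV_CONCURRENCY = 3    # global navigation cap from the design notes
PAGE_TIMEOUT_S = 15.0  # per-page budget
FETCH_TOTAL_S = 35.0   # budget for one whole fetch call (search would use 20.0)

# Hypothetical hook: navigate to a URL and return its rendered HTML.
Navigate = Callable[[str], Awaitable[str]]


async def fetch_all(
    urls: list[str],
    navigate: Navigate,
    slots: asyncio.Semaphore,
    page_timeout: float = PAGE_TIMEOUT_S,
    total_timeout: float = FETCH_TOTAL_S,
) -> tuple[dict[str, str], dict[str, str]]:
    """Fetch URLs concurrently; return (pages, errors) keyed by URL."""
    pages: dict[str, str] = {}
    errors: dict[str, str] = {}
    if not urls:
        return pages, errors

    async def one(url: str) -> None:
        async with slots:  # shared semaphore caps in-flight navigations
            try:
                pages[url] = await asyncio.wait_for(navigate(url), page_timeout)
            except asyncio.TimeoutError:
                errors[url] = f"page timeout after {page_timeout:g}s"
            except Exception as exc:  # one bad URL must not fail the batch
                errors[url] = f"{type(exc).__name__}: {exc}"

    tasks = {asyncio.ensure_future(one(url)): url for url in urls}
    _, pending = await asyncio.wait(set(tasks), timeout=total_timeout)
    for task in pending:
        task.cancel()
    # Let cancelled tasks unwind; their CancelledError is collected, not raised.
    await asyncio.gather(*pending, return_exceptions=True)
    for task in pending:
        url = tasks[task]
        if url not in pages:
            errors.setdefault(url, f"total timeout after {total_timeout:g}s")
    return pages, errors
```

The semaphore would be created once at startup, e.g. `asyncio.Semaphore(NAV_CONCURRENCY)`, and shared by every request. Because the per-page timer starts only after a slot is acquired, time spent queueing counts against the total budget but not the page budget.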
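The SSRF rules map onto the standard library: `urllib.parse.urlsplit` exposes the scheme, credentials, and host, and `ipaddress` classifies addresses. The following sketch rejects a URL before navigation. `check_url` and its injectable `resolve` parameter are hypothetical names, and blocking multicast and unspecified addresses goes slightly beyond the categories the notes list.

```python
from __future__ import annotations

import ipaddress
import socket
from collections.abc import Callable, Iterable
from urllib.parse import urlsplit

Resolver = Callable[[str], Iterable[str]]


def _system_resolve(host: str) -> list[str]:
    # Every address the host resolves to must pass, not just the first.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def _is_blocked(ip: ipaddress.IPv4Address | ipaddress.IPv6Address) -> bool:
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its IPv4 address
    return (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def check_url(url: str, resolve: Resolver = _system_resolve) -> None:
    """Raise ValueError if `url` must not be navigated to."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if parts.username is not None or parts.password is not None:
        raise ValueError("embedded credentials are not allowed")
    host = parts.hostname
    if not host:
        raise ValueError("missing host")
    try:
        addresses = [ipaddress.ip_address(host)]  # literal IP in the URL
    except ValueError:
        addresses = [ipaddress.ip_address(a) for a in resolve(host)]
    if not addresses:
        raise ValueError(f"{host!r} did not resolve")
    for ip in addresses:
        if _is_blocked(ip):
            raise ValueError(f"blocked address {ip} for host {host!r}")
```

Because the browser performs its own DNS lookup, a pre-navigation check like this cannot by itself rule out DNS rebinding between the check and the request. For subrequests, one option is a Playwright `context.route("**/*", handler)` callback that calls `route.abort()` on rejection and `route.continue_()` otherwise; this README does not say how crawly hooks its guard in.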
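One way to enforce the 1 MiB cap is to measure the UTF-8 encoding, cut at the byte limit, and decode while dropping any character the cut split in half. The sketch assumes the cap is counted in UTF-8 bytes rather than characters, which the notes do not specify; `cap_html` is a hypothetical name.

```python
MAX_HTML_BYTES = 1024 * 1024  # 1 MiB


def cap_html(html: str, limit: int = MAX_HTML_BYTES) -> tuple[str, bool]:
    """Return (possibly truncated HTML, truncated flag)."""
    raw = html.encode("utf-8")
    if len(raw) <= limit:
        return html, False
    # Cut at the byte limit; errors="ignore" drops a trailing partial character.
    return raw[:limit].decode("utf-8", errors="ignore"), True
```

The returned flag is what would populate the per-URL truncated field.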
Development
source .venv/bin/activate
ruff check .
pytest
Smoke checks:
rg -n "web-search|web_search_mcp" README.md AGENTS.md CHANGELOG.md pyproject.toml src tests
.venv/bin/python scripts/http_mcp_smoke.py --url http://127.0.0.1:8000/mcp
Parser tests run against saved HTML fixtures; selector drift is an expected maintenance cost.
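A fixture-driven parser test might look like the following. The parser extracts result links by CSS class with the standard-library `HTMLParser`. The class name `result__a` and the `uddg` redirect parameter are assumptions about DuckDuckGo's HTML results markup (exactly the kind of selector that drifts), `parse_results` is a hypothetical name, and the inline `FIXTURE` string stands in for a saved HTML file.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

MAX_RESULTS = 5  # search returns up to 5 organic URLs


class ResultLinkParser(HTMLParser):
    """Collect hrefs of <a> tags carrying a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        href = attr.get("href")
        if self.css_class in classes and href:
            self.links.append(href)


def unwrap_redirect(href: str) -> str:
    # Assumed DuckDuckGo-style redirect: //duckduckgo.com/l/?uddg=<encoded target>
    target = parse_qs(urlsplit(href).query).get("uddg")
    return target[0] if target else href


def parse_results(html: str, css_class: str = "result__a") -> list[str]:
    parser = ResultLinkParser(css_class)
    parser.feed(html)
    parser.close()
    results: list[str] = []
    for href in parser.links:
        url = unwrap_redirect(href)
        if url.startswith(("http://", "https://")) and url not in results:
            results.append(url)
        if len(results) == MAX_RESULTS:
            break
    return results


FIXTURE = """
<div class="result"><a class="result__a" href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2F&amp;rut=x">Example</a></div>
<div class="result"><a rel="nofollow" class="result__a" href="https://docs.python.org/3/">Docs</a></div>
<div class="result"><a class="result__snippet" href="https://ignored.test/">snippet</a></div>
"""


def test_parse_results_fixture() -> None:
    assert parse_results(FIXTURE) == ["https://example.com/", "https://docs.python.org/3/"]
```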