web-search-mcp

Provides web search and page fetch capabilities using a browser-based approach, enabling LLMs to search DuckDuckGo, Google, or Yandex and retrieve rendered HTML from URLs.

Crawly-MCP

Browser-backed web search and page fetch for local LLMs, exposed as MCP tools and a CLI.

The design history is tracked in docs/IMPLEMENTATION_PLAN.md.

Naming

  • Python distribution: crawly-mcp
  • Import package: crawly_mcp
  • CLI executable: crawly-cli
  • MCP server executable: crawly-mcp

Tools

  • search(provider, context) runs a browser-backed search on duckduckgo (default), google, or yandex and returns up to 5 organic result URLs.
  • fetch(urls) fetches 1..5 URLs and returns browser-rendered HTML with per-URL pages, errors, and truncated fields.

The context parameter carries the search query string; the name is kept intentionally for caller compatibility.
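As an illustration of the two tool signatures above, the sketch below builds the JSON-RPC 2.0 tools/call messages an MCP client would send. The helper names (build_tool_call, search_request, fetch_request) are hypothetical and not part of crawly_mcp; the argument names provider, context, and urls come from the tool descriptions, and the client-side checks simply mirror the documented limits.

```python
import json

# Providers listed in the tool description; duckduckgo is the documented default.
ALLOWED_PROVIDERS = ("duckduckgo", "google", "yandex")


def build_tool_call(request_id, name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message (plain dict)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def search_request(request_id, query, provider="duckduckgo"):
    """Hypothetical helper: the query string goes into the `context` argument."""
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return build_tool_call(
        request_id, "search", {"provider": provider, "context": query}
    )


def fetch_request(request_id, urls):
    """Hypothetical helper: fetch accepts between 1 and 5 URLs per call."""
    urls = list(urls)
    if not 1 <= len(urls) <= 5:
        raise ValueError("fetch expects between 1 and 5 URLs")
    return build_tool_call(request_id, "fetch", {"urls": urls})


if __name__ == "__main__":
    print(json.dumps(search_request(1, "python async playwright"), indent=2))
    print(json.dumps(fetch_request(2, ["https://example.com"]), indent=2))
```

In practice an MCP client library would construct and send these messages; the sketch only shows how the arguments map onto the request.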

Setup

uv sync
chromium --version

For host usage, crawly defaults to launching a system Chromium binary. If Chromium is installed in a non-standard location, set:

PLAYWRIGHT_CHROMIUM_EXECUTABLE=/path/to/chromium

To force Playwright-managed Chromium instead of a host browser:

PLAYWRIGHT_BROWSER_SOURCE=bundled
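How the two variables might interact can be sketched as follows. This is a minimal sketch, not crawly's actual resolution logic: resolve_browser is a hypothetical name, and it assumes that an executable override is ignored when the source is bundled and that an unset or empty override means "locate Chromium on the host".

```python
import os


def resolve_browser(env=None):
    """Return (source, executable_path) from the environment.

    Assumptions (not confirmed by the README): `system` is the default source
    on a host, and PLAYWRIGHT_CHROMIUM_EXECUTABLE only applies to `system`.
    """
    env = os.environ if env is None else env
    source = env.get("PLAYWRIGHT_BROWSER_SOURCE", "system")
    if source not in ("system", "bundled"):
        raise ValueError(f"unknown PLAYWRIGHT_BROWSER_SOURCE: {source!r}")
    if source == "bundled":
        # Playwright-managed Chromium: no host executable path is needed.
        return source, None
    # Host Chromium: use the override if set, else None (launcher decides).
    return source, env.get("PLAYWRIGHT_CHROMIUM_EXECUTABLE") or None


if __name__ == "__main__":
    print(resolve_browser({"PLAYWRIGHT_CHROMIUM_EXECUTABLE": "/path/to/chromium"}))
```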

Usage

Run the CLI directly:

uv run crawly-cli search --context "python async playwright"
uv run crawly-cli fetch https://example.com

Run the MCP server over stdio:

uv run crawly-mcp

Expose HTTP transport instead of stdio:

uv run crawly-mcp --transport streamable-http --host 127.0.0.1 --port 8000

The MCP server also reads these environment variables:

  • CRAWLY_HOST
  • CRAWLY_PORT

Container

The container image uses Playwright-managed Chromium and defaults to HTTP MCP on port 8000.

Build locally:

docker build -t crawly-mcp:local .

Run locally:

docker run --rm --init -p 8000:8000 crawly-mcp:local

Override the transport to stdio:

docker run --rm --init -i crawly-mcp:local crawly-mcp --transport stdio

The container defaults to:

  • PLAYWRIGHT_BROWSER_SOURCE=bundled
  • CRAWLY_HOST=0.0.0.0
  • CRAWLY_PORT=8000

The HTTP MCP endpoint is unauthenticated in v1. Bind it to localhost, or deploy it on a private network or behind a reverse proxy that adds authentication and TLS.

Published images are intended to be:

  • ghcr.io/<owner>/crawly-mcp
  • <dockerhub-namespace>/crawly-mcp

The first GHCR publish may need a one-time manual visibility change to make the package public.

Design Notes

  • One shared browser per process, with a fresh incognito context per request.
  • PLAYWRIGHT_BROWSER_SOURCE=system uses a host Chromium binary.
  • PLAYWRIGHT_BROWSER_SOURCE=bundled uses Playwright-managed Chromium.
  • Global navigation concurrency cap of 3.
  • Timeouts: 15s per page, 20s total for search, 35s total for fetch.
  • SSRF guard: http/https only, no embedded credentials, blocks loopback/private/link-local/reserved IPs before navigation and on browser subrequests.
  • JavaScript challenge pages get a bounded 10s settle window; there is no CAPTCHA solving, stealth fingerprinting, or site-specific bypass logic.
  • HTML is capped at 1 MiB per URL; oversized responses are truncated and flagged in the truncated field.
  • robots.txt is not consulted in v1.
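The SSRF guard and the HTML cap from the notes above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the project's implementation: check_url, is_ip_blocked, and cap_html are hypothetical names, the cap is assumed to be measured in UTF-8 bytes, and only literal IP hosts are checked here (a real guard would also resolve hostnames and check every resolved address, and would apply the same test to browser subrequests).

```python
import ipaddress
from urllib.parse import urlsplit

MAX_HTML_BYTES = 1024 * 1024  # 1 MiB cap per URL, as stated in the notes


def is_ip_blocked(ip_text):
    """True for loopback, private, link-local, or reserved addresses."""
    ip = ipaddress.ip_address(ip_text)  # raises ValueError if not an IP literal
    return ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_reserved


def check_url(url):
    """Return None if the URL passes the checks, else a short reason string."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return "malformed URL"
    if parts.scheme not in ("http", "https"):
        return "scheme not allowed"
    if parts.username is not None or parts.password is not None:
        return "embedded credentials"
    host = parts.hostname
    if not host:
        return "missing host"
    try:
        blocked = is_ip_blocked(host)
    except ValueError:
        # Not an IP literal. Assumption: DNS resolution and a check of each
        # resolved address would happen here in a complete guard.
        return None
    return "blocked address" if blocked else None


def cap_html(html, limit=MAX_HTML_BYTES):
    """Return (html, truncated), cutting at `limit` UTF-8 bytes."""
    data = html.encode("utf-8")
    if len(data) <= limit:
        return html, False
    # errors="ignore" drops a multibyte character split by the cut.
    return data[:limit].decode("utf-8", errors="ignore"), True


if __name__ == "__main__":
    for candidate in ("https://example.com/", "http://127.0.0.1:8000/", "ftp://example.com/"):
        print(candidate, "->", check_url(candidate))
```

Checking only before navigation would miss redirects and page subresources that point at internal addresses, which is presumably why the notes mention browser subrequests as well.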

Development

source .venv/bin/activate
ruff check .
pytest

Smoke checks:

rg -n "web-search|web_search_mcp" README.md AGENTS.md CHANGELOG.md pyproject.toml src tests
.venv/bin/python scripts/http_mcp_smoke.py --url http://127.0.0.1:8000/mcp

Parser tests run against saved HTML fixtures; selector drift is an expected maintenance cost.
