web-search-mcp

Provides web search and page fetch capabilities using a browser-based approach, enabling LLMs to search DuckDuckGo, Google, or Yandex and retrieve rendered HTML from URLs.

Crawly-MCP

Browser-backed web search and page fetch for local LLMs, exposed as MCP tools and a CLI.

The design history is tracked in docs/IMPLEMENTATION_PLAN.md.

Naming

Python distribution: crawly-mcp
Import package: crawly_mcp
CLI executable: crawly-cli
MCP server executable: crawly-mcp

Tools

search(provider, context) runs a browser-backed search on duckduckgo (default), google, or yandex and returns up to 5 organic result URLs.
fetch(urls) fetches 1 to 5 URLs and returns their browser-rendered HTML in a result with per-URL pages, errors, and truncated fields.

The context parameter intentionally carries the search query string. The name is kept for compatibility with existing callers.

Setup

uv sync
chromium --version

For host usage, crawly defaults to launching a system Chromium binary. If Chromium is installed in a non-standard location, set:

PLAYWRIGHT_CHROMIUM_EXECUTABLE=/path/to/chromium

To force Playwright-managed Chromium instead of a host browser:

PLAYWRIGHT_BROWSER_SOURCE=bundled

Usage

Run the CLI directly:

uv run crawly-cli search --context "python async playwright"
uv run crawly-cli fetch https://example.com

Run the MCP server over stdio:

uv run crawly-mcp

Expose HTTP transport instead of stdio:

uv run crawly-mcp --transport streamable-http --host 127.0.0.1 --port 8000

The MCP server also reads its HTTP host and port from these environment variables:

CRAWLY_HOST
CRAWLY_PORT

Container

The container image uses Playwright-managed Chromium and defaults to HTTP MCP on port 8000.

Build locally:

docker build -t crawly-mcp:local .

Run locally:

docker run --rm --init -p 8000:8000 crawly-mcp:local

Override the transport to stdio:

docker run --rm --init -i crawly-mcp:local crawly-mcp --transport stdio

The container defaults to:

PLAYWRIGHT_BROWSER_SOURCE=bundled
CRAWLY_HOST=0.0.0.0
CRAWLY_PORT=8000

The HTTP MCP endpoint is unauthenticated in v1. Expose it only on localhost or a private network, or place it behind a reverse proxy that adds authentication and TLS.

Published images are intended to be:

ghcr.io/<owner>/crawly-mcp
<dockerhub-namespace>/crawly-mcp

The first GHCR publish may need a one-time manual visibility change to make the package public.

Design Notes

One shared browser per process, with a fresh incognito context per request.
PLAYWRIGHT_BROWSER_SOURCE=system uses a host Chromium binary.
PLAYWRIGHT_BROWSER_SOURCE=bundled uses Playwright-managed Chromium.
Global navigation concurrency cap of 3.
Timeouts: 15s per page, 20s total for search, 35s total for fetch.
SSRF guard: http/https only, no embedded credentials, blocks loopback/private/link-local/reserved IPs before navigation and on browser subrequests.
JavaScript challenge pages get a bounded 10s settle window; there is no CAPTCHA solving, stealth fingerprinting, or site-specific bypass logic.
HTML is capped at 1 MiB per URL; oversized responses are truncated and flagged in the truncated field.
robots.txt is not consulted in v1.
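The SSRF guard described above can be approximated with the standard library alone. This is a minimal sketch assuming a resolve-then-check approach. is_public_ip and check_url are hypothetical names, and the real guard may classify addresses differently.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(ip: str) -> bool:
    """True unless the address is loopback/private/link-local/reserved etc."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before classifying.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped:
        addr = addr.ipv4_mapped
    return not (
        addr.is_loopback or addr.is_private or addr.is_link_local
        or addr.is_reserved or addr.is_multicast or addr.is_unspecified
    )


def check_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise ValueError if `url` must not be navigated to."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if parts.username is not None or parts.password is not None:
        raise ValueError("embedded credentials are not allowed")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    try:
        infos = resolve(host, parts.port or None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {host!r}") from exc
    # Every resolved address must be public, not just the first one.
    for info in infos:
        ip = info[4][0]
        if not is_public_ip(ip):
            raise ValueError(f"{host!r} resolves to blocked address {ip}")
```

For browser subrequests, the same check could run inside a Playwright route handler registered with context.route("**/*", ...), which aborts blocked requests and continues the rest. Resolving before navigating still leaves a small DNS-rebinding window between the check and the browser's own lookup.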

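The concurrency cap and timeout budgets map naturally onto asyncio primitives. In this sketch, a shared asyncio.Semaphore(3) limits navigations, asyncio.wait_for enforces the per-page budget, and asyncio.wait with a timeout enforces the 35s fetch budget. A search call would use the same pattern with a 20s total. fetch_all and the load_page callable (standing in for a Playwright page load) are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

MAX_NAVIGATIONS = 3           # global navigation concurrency cap
PAGE_TIMEOUT_S = 15.0         # budget for a single page load
FETCH_TOTAL_TIMEOUT_S = 35.0  # budget for a whole fetch() call


async def fetch_all(
    urls: list[str],
    load_page: Callable[[str], Awaitable[str]],
    slots: asyncio.Semaphore,
    page_timeout: float = PAGE_TIMEOUT_S,
    total_timeout: float = FETCH_TOTAL_TIMEOUT_S,
) -> tuple[dict[str, str], dict[str, str]]:
    """Load URLs concurrently under a shared cap, returning (pages, errors)."""
    pages: dict[str, str] = {}
    errors: dict[str, str] = {}

    async def load_one(url: str) -> None:
        async with slots:  # wait for one of the shared navigation slots
            try:
                pages[url] = await asyncio.wait_for(load_page(url), page_timeout)
            except asyncio.TimeoutError:
                errors[url] = "page timeout"
            except Exception as exc:  # one bad URL must not fail the batch
                errors[url] = f"{type(exc).__name__}: {exc}"

    tasks = {asyncio.create_task(load_one(url)): url for url in urls}
    _, pending = await asyncio.wait(list(tasks), timeout=total_timeout)
    for task in pending:
        task.cancel()  # over the total budget: stop whatever is still running
        errors[tasks[task]] = "total timeout"
    await asyncio.gather(*pending, return_exceptions=True)
    return pages, errors
```

The semaphore is passed in rather than created per call so that one instance can be shared by every request in the process, which is what makes the cap global.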
Development

source .venv/bin/activate
ruff check .
pytest

Smoke checks:

rg -n "web-search|web_search_mcp" README.md AGENTS.md CHANGELOG.md pyproject.toml src tests
.venv/bin/python scripts/http_mcp_smoke.py --url http://127.0.0.1:8000/mcp
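An HTTP smoke check typically begins with the MCP initialize request. The sketch below only builds that request with the standard library and does not reproduce scripts/http_mcp_smoke.py. The protocol version string and client name are assumptions. The Accept header lists both JSON and SSE because a Streamable HTTP server may answer with either.

```python
import json
import urllib.request

# Assumption: use whatever protocol revision the server's MCP SDK supports.
PROTOCOL_VERSION = "2025-03-26"


def build_initialize_request(
    url: str = "http://127.0.0.1:8000/mcp",
) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC initialize request."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "crawly-smoke-sketch", "version": "0.0.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )
```

Against a running server, `urllib.request.urlopen(build_initialize_request(), timeout=10)` should return HTTP 200. The response may include an Mcp-Session-Id header for follow-up requests.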

Parser tests run against saved HTML fixtures; selector drift is an expected maintenance cost.
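A fixture-driven parser test keeps selector drift visible: when a provider changes its markup, the saved HTML and the parser are updated together. The sketch below uses html.parser and a hypothetical result__a anchor class. Real selectors differ per provider, and the crawly parsers may be built differently. ResultLinkParser and parse_results are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical selector for organic result links; real ones drift over time.
RESULT_LINK_CLASS = "result__a"
MAX_RESULTS = 5


class ResultLinkParser(HTMLParser):
    """Collect absolute hrefs from anchors carrying RESULT_LINK_CLASS."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        href = attr.get("href")
        if RESULT_LINK_CLASS in classes and href and href.startswith(("http://", "https://")):
            if href not in self.urls:  # keep first occurrence, drop duplicates
                self.urls.append(href)


def parse_results(html: str, limit: int = MAX_RESULTS) -> list[str]:
    """Return up to `limit` organic result URLs from a results page."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.urls[:limit]
```

In a pytest test, the HTML would be read from a saved fixture file rather than built inline. The assertion would pin the expected URL list so that a selector change fails loudly.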
