web-browser-mcp
Enables AI agents to perform query-driven web searches and fetch page content via Bing and DuckDuckGo engines, with automatic fallback and no API keys needed.
Query-driven web search + page fetch for AI agents. The agent gives it a natural-language question and gets back ranked results with snippets (and optionally full page content), ready to cite.
A small MCP server with two tools that compose cleanly:
| Tool | Takes | Returns |
|---|---|---|
| web_search(query, ...) | A search query in plain English | Ranked results from real search engines, with title, URL, snippet, and (optionally) full extracted page content |
| get_page_content(url, ...) | A URL the agent already knows | Cleaned main text of that page |
Multi-engine fallback: Bing via headless Chromium → DuckDuckGo HTML via httpx. No API keys required.
This is the shape you want when the agent is answering a question like "what's the latest model context protocol spec?" or "find a good tutorial on asyncio". The agent doesn't need to know which engine to use, what URL to fetch, or how to render the SERP — it just calls web_search and gets back ready-to-inject context.
Install
git clone https://github.com/ptrken01/web-browser-mcp
cd web-browser-mcp
uv venv .venv --python 3.11
uv pip install --python .venv/bin/python -e ".[dev]"
.venv/bin/playwright install chromium
(If you have an existing venv from elsewhere, just uv pip install -e . in it
and run playwright install chromium once. The .[dev] extra adds pytest,
mypy, and ruff.)
Run
stdio (Claude Desktop, Cursor)
.venv/bin/web-browser-mcp
In your client's MCP config:
{
  "mcpServers": {
    "web-search": {
      "command": "/absolute/path/to/web-browser-mcp/.venv/bin/web-browser-mcp"
    }
  }
}
streamable-http (llama-ui, Open WebUI, browser clients)
.venv/bin/web-browser-mcp --transport streamable-http
# Default: http://127.0.0.1:8766/mcp
In llama-ui's MCP server settings, add an HTTP transport pointing at
http://127.0.0.1:8766/mcp. CORS is open for localhost / 127.0.0.1 by
default.
Tools
web_search(query, limit=5, include_content=False, ...)
Give it a natural-language question or topic. Returns ranked results with title, URL, and snippet, ready to be cited in your answer.
{
  "name": "web_search",
  "arguments": {
    "query": "model context protocol specification",
    "limit": 5,
    "include_content": false
  }
}
Parameters:
| Param | Type | Default | Notes |
|---|---|---|---|
| query | str | (none) | Required. Natural-language search query (1-2000 chars). |
| limit | int | 5 | Max results (1-10). |
| include_content | bool | False | When True, follows each result URL and extracts the main text of the page. Adds latency. |
| engine_order | list[str] | ["bing", "duckduckgo"] | Override the engine priority. Subset of ["bing", "duckduckgo"]. |
| timeout_s | float | 15 | Per-engine timeout in seconds. |
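The documented limits can also be checked on the client before a call is sent. The following is a minimal sketch, not part of this server: `build_web_search_args` is a hypothetical helper that mirrors the table above (query length 1-2000, limit 1-10, engine_order a subset of the two known engines). Stripping whitespace before measuring the query is an assumption; the server's own validation may differ in detail.

```python
from __future__ import annotations

from typing import Any

KNOWN_ENGINES = ("bing", "duckduckgo")  # engine names taken from the parameter table


def build_web_search_args(
    query: str,
    limit: int = 5,
    include_content: bool = False,
    engine_order: list[str] | None = None,
    timeout_s: float = 15.0,
) -> dict[str, Any]:
    """Hypothetical client-side helper: validate and assemble a web_search call."""
    query = query.strip()  # assumption: whitespace-only queries count as empty
    if not 1 <= len(query) <= 2000:
        raise ValueError("query must be 1-2000 characters")
    if not 1 <= limit <= 10:
        raise ValueError("limit must be between 1 and 10")
    order = list(KNOWN_ENGINES) if engine_order is None else list(engine_order)
    if not order or any(name not in KNOWN_ENGINES for name in order):
        raise ValueError(f"engine_order must be a non-empty subset of {KNOWN_ENGINES}")
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    return {
        "name": "web_search",
        "arguments": {
            "query": query,
            "limit": limit,
            "include_content": include_content,
            "engine_order": order,
            "timeout_s": timeout_s,
        },
    }
```

Rejecting bad input locally saves a round trip, but the server remains the source of truth for what it accepts.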
Response:
{
  "query": "model context protocol specification",
  "engine": "duckduckgo",
  "count": 5,
  "results": [
    {
      "title": "Official site",
      "url": "https://modelcontextprotocol.io",
      "snippet": "Model Context Protocol",
      "engine": "duckduckgo"
    },
    {
      "title": "What is the Model Context Protocol (MCP)?",
      "url": "https://modelcontextprotocol.io/docs/getting-started/intro",
      "snippet": "MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems...",
      "engine": "duckduckgo"
    }
  ],
  "duration_s": 1.234
}
When include_content=True, each result additionally has:
{
  "content": "Main text extracted from the page...",
  "content_chars": 3421
}
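One way an agent harness might turn this response into citable context is sketched below. `format_results_for_prompt` and its `max_content_chars` parameter are hypothetical names, not part of this server; the sketch only assumes the response fields shown above.

```python
from typing import Any


def format_results_for_prompt(response: dict[str, Any], max_content_chars: int = 500) -> str:
    """Render a web_search response as a numbered list the model can cite by index."""
    lines = []
    for index, result in enumerate(response.get("results", []), start=1):
        lines.append(f"[{index}] {result.get('title', '')} ({result.get('url', '')})")
        lines.append(f"    {result.get('snippet', '')}")
        # "content" is only present when the call used include_content=True.
        content = result.get("content")
        if content:
            lines.append(f"    {content[:max_content_chars]}")
    return "\n".join(lines)
```

Numbering the entries lets the model refer to a source as "[2]" while the harness keeps the URL mapping.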
get_page_content(url, max_chars=10000)
For the "I have a specific URL and want its content" case.
{
  "name": "get_page_content",
  "arguments": {
    "url": "https://example.com/article",
    "max_chars": 5000
  }
}
Response:
{
  "url": "https://example.com/article",
  "final_url": "https://example.com/article",
  "title": "Example Article",
  "text": "Main content of the page...",
  "text_chars": 3421,
  "duration_s": 0.5
}
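The server rejects non-http URLs and URLs without a host (the invalid_url error code). A caller can apply the same kind of check first. This is a rough sketch using only the standard library; `is_fetchable_url` is a hypothetical helper, and treating both http and https as acceptable is an assumption based on the https example above.

```python
from urllib.parse import urlsplit


def is_fetchable_url(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a host component."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs, e.g. broken IPv6 literals.
        return False
    return parts.scheme in {"http", "https"} and bool(parts.hostname)
```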
Engines and fallback
The tool tries engines in engine_order. The first one to return ≥1 result
wins. If all engines fail, the response has an error field with a stable
string code.
| Engine | How | Strengths | Weaknesses |
|---|---|---|---|
| Bing | Playwright headless Chromium | Full SERP, rich snippets, related questions | Can hit captcha on shared IPs; slower |
| DuckDuckGo HTML | httpx (no JS) | Reliable, fast, no browser overhead | Sometimes rate-limited under heavy use |
In practice, DDG HTML is the workhorse — it's the engine that succeeds most often in test runs. Bing is the upgrade path for richer SERP data when the agent has a fresh IP and the captcha doesn't trip.
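The fallback rule (try engines in order, first one with at least one result wins, otherwise return an error code) can be illustrated with stub engines. This is a sketch of the idea only, not the server's implementation: `search_with_fallback`, the stub functions, and the exact shape of the failure response are assumptions.

```python
from typing import Callable

# Hypothetical engine signature: take a query, return a list of result dicts.
SearchFn = Callable[[str], list]


def search_with_fallback(query: str, engines: dict, engine_order: list) -> dict:
    """Try each engine in order; the first one returning >= 1 result wins."""
    last_error = "search_engine_error"
    for name in engine_order:
        try:
            results = engines[name](query)
        except TimeoutError:
            last_error = "search_timeout"
            continue
        except Exception:
            last_error = "search_engine_error"
            continue
        if results:
            return {"query": query, "engine": name, "count": len(results), "results": results}
    # Assumed failure shape: empty results plus a stable string code.
    return {"query": query, "count": 0, "results": [], "error": last_error}


def blocked_bing(query: str) -> list:
    """Stub that simulates Bing failing, e.g. on a captcha."""
    raise RuntimeError("captcha")


def stub_duckduckgo(query: str) -> list:
    """Stub that returns one canned result."""
    return [{"title": "Official site", "url": "https://modelcontextprotocol.io",
             "snippet": "Model Context Protocol", "engine": "duckduckgo"}]


response = search_with_fallback(
    "model context protocol specification",
    {"bing": blocked_bing, "duckduckgo": stub_duckduckgo},
    ["bing", "duckduckgo"],
)
print(response["engine"])  # duckduckgo
```

Passing `["duckduckgo"]` as the order skips the browser-backed engine entirely, which matches the engine_order override described in the parameter table.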
Error handling
All errors come back as structured error fields, not exceptions:
| Error code | When |
|---|---|
| invalid_query | Empty query or query too long (>2000 chars). |
| invalid_url | get_page_content got a non-http URL or URL with no host. |
| search_engine_error | The active engine returned an error. Try a different engine_order. |
| search_timeout | The active engine timed out. |
| browser_not_initialized | An engine that needs Playwright was called without a browser. |
| fetch_failed | get_page_content got a non-2xx response or connection error. |
| extraction_failed | get_page_content got a response but trafilatura couldn't extract text. |
| lifespan did not initialize | Server lifespan never ran; see the searxng-mcp-scraper pitfall. Should not happen with this server. |
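Because errors arrive as data rather than exceptions, a caller has to inspect the response before using it. The sketch below shows one possible policy. It assumes the code sits in a top-level `error` key as a plain string; `next_action` and the action names it returns are hypothetical.

```python
RETRYABLE = {"search_engine_error", "search_timeout"}
BAD_INPUT = {"invalid_query", "invalid_url"}


def next_action(response: dict) -> str:
    """Map a tool response to a coarse follow-up action for the caller."""
    error = response.get("error")  # assumption: absent or None on success
    if error is None:
        return "use_results"
    if error in RETRYABLE:
        # The table suggests trying a different engine_order for engine errors.
        return "retry_other_engine"
    if error in BAD_INPUT:
        return "fix_input"
    # fetch_failed, extraction_failed, browser_not_initialized, and anything unknown.
    return "report_failure"
```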
Configuration (env vars)
| Env var | Default | Notes |
|---|---|---|
| HEADLESS | True | Set to False to see the browser. |
| BROWSER | chromium | chromium / firefox / webkit. |
| BROWSER_TIMEOUT_S | 30.0 | Default per-step timeout. |
| NAVIGATION_TIMEOUT_S | 30.0 | Default web_search engine timeout. |
| USER_AGENT | (Chrome UA) | Override if you get blocked. |
| LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR. |
| MCP_HOST | 127.0.0.1 | Bind host (streamable-http transport). |
| MCP_PORT | 8766 | Bind port (streamable-http transport). |
| MCP_CORS_ORIGINS | localhost,127.0.0.1 | Comma-separated CORS allow-list. |
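To make the defaults concrete, here is a sketch of how a process could read these variables. It is not the server's actual settings code: the `Settings` class, `load_settings`, and the set of strings treated as true are assumptions. USER_AGENT is left out because its default value is not spelled out above.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Assumed truthy spellings; the server may accept a different set."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    headless: bool
    browser: str
    browser_timeout_s: float
    navigation_timeout_s: float
    log_level: str
    host: str
    port: int
    cors_origins: list


def load_settings(env=None) -> Settings:
    """Read the documented env vars, falling back to the documented defaults."""
    env = os.environ if env is None else env
    origins = env.get("MCP_CORS_ORIGINS", "localhost,127.0.0.1")
    return Settings(
        headless=_as_bool(env.get("HEADLESS", "True")),
        browser=env.get("BROWSER", "chromium"),
        browser_timeout_s=float(env.get("BROWSER_TIMEOUT_S", "30.0")),
        navigation_timeout_s=float(env.get("NAVIGATION_TIMEOUT_S", "30.0")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        host=env.get("MCP_HOST", "127.0.0.1"),
        port=int(env.get("MCP_PORT", "8766")),
        cors_origins=[item.strip() for item in origins.split(",") if item.strip()],
    )
```

Accepting a plain mapping instead of always reading `os.environ` keeps the function easy to exercise in tests.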
Test
.venv/bin/python -m pytest
14 tests pass. The end-to-end test (tests/test_server_e2e.py) boots
real uvicorn + real FastMCP + a real Playwright browser, performs the MCP
initialize handshake, and calls web_search against real search
engines. This is the test that catches lifespan-bug classes from
searxng-mcp-scraper — unit tests that call tool functions directly with
hand-built state would miss them.
How it fits with the rest of the toolkit
| Tool | Source | When to use |
|---|---|---|
| web-browser-mcp.web_search | Bing / DDG (this repo) | Agent has a question, needs relevant web results. |
| web-browser-mcp.get_page_content | httpx + trafilatura | Agent has a specific URL, wants its content. |
| searxng-mcp-scraper.search | SearXNG metasearch | When you want to control the engines, categories, language. |
| searxng-mcp-scraper.fetch | trafilatura over HTTP | Fast static-HTML extraction. |
| searxng-mcp-scraper.scrape_blog | RSS + parallel fetch | Whole blog → one .md. |
| searxng-mcp-scraper.deep_scrape | scrape_blog + docs | Blog + linked PDFs / docs. |
Use web_search first for general questions. Drop to get_page_content or
searxng-mcp-scraper.fetch when you have a URL. Use scrape_blog for
"read this whole blog" use cases.
License
MIT