websearch-mcp
A self-hosted MCP server that gives AI agents deep internet research capabilities — no API keys required, powered by SearxNG, Playwright, and Docker.
Powered by SearxNG (meta search engine), Playwright (web scraping), and Docker (Python sandbox).
Tools
| Tool | Description |
|---|---|
| `search_internet` | Meta search via SearxNG with speed/balanced/quality modes, domain dedup, and trust scoring |
| `fetch_page` | Scrape web pages with Playwright (stealth mode) and extract content with Readability |
| `execute_python` | Run Python code in an isolated Docker sandbox (no network, resource limits) |
| `get_search_suggestions` | Get autocomplete suggestions for partial queries |
Prerequisites
- Node.js >= 20
- Docker (for SearxNG and Python sandbox)
- Playwright Chromium (auto-installed on first run)
Quick Start
1. Start SearxNG
```bash
docker compose up -d searxng
```
Wait for it to be healthy:
```bash
curl -sf http://localhost:8080/healthz
```
2. Install dependencies and build
```bash
npm install
npx playwright install chromium
npm run build
```
3. Run the server
```bash
node dist/index.js
```
The server communicates over stdio (MCP protocol). Connect it to any MCP-compatible client.
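Over stdio, MCP messages are JSON-RPC 2.0 objects written one per line. The sketch below shows roughly what a client sends to invoke `search_internet`. It assumes the client has already completed the MCP `initialize` handshake; the request `id` and the argument values are illustrative, and `buildToolCall`/`frameForStdio` are hypothetical helper names.

```typescript
// Minimal sketch of one MCP tools/call request framed for the stdio transport.
// Assumes the initialize handshake is done; id and arguments are illustrative.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport is newline-delimited: each message occupies exactly one line.
// JSON.stringify without an indent argument never emits raw newline characters.
function frameForStdio(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

const request = buildToolCall(1, "search_internet", { query: "SearxNG JSON API", mode: "speed" });
const framed = frameForStdio(request);
console.log(framed.trimEnd());
```

A real client would write `framed` to the server process's stdin and read responses line by line from its stdout.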
MCP Client Configuration
Claude Desktop / Claude Code
Add to your MCP settings:
```json
{
  "mcpServers": {
    "websearch": {
      "type": "stdio",
      "command": "node",
      "args": ["/absolute/path/to/websearch-mcp/dist/index.js"],
      "env": {
        "SEARXNG_URL": "http://localhost:8080"
      }
    }
  }
}
```
Tool Reference
search_internet
Search the internet using the SearxNG meta search engine.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | required | Search query (1-500 chars) |
| `category` | enum | `general` | One of `general`, `news`, `science`, `images` |
| `max_results` | int | 10 | Max results (1-50) |
| `mode` | enum | `balanced` | `speed` (1 page, 5 results), `balanced` (1 page), `quality` (3 pages, dedup) |
| `language` | string | `""` | Language code filter (e.g. `en`, `es`, `fr`) |
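The mode table implies one SearxNG request per page. The sketch below turns the parameters into SearxNG `/search` URLs. It assumes the server queries SearxNG's JSON API (`format=json`, which must be enabled in `settings.yml`) and that `speed` mode caps results at 5. `buildSearchUrls` and `resultLimit` are hypothetical names, not the project's actual code.

```typescript
// Hypothetical sketch: mapping search_internet parameters onto SearxNG /search requests.
// Assumes the JSON output format is enabled on the SearxNG instance.

type Mode = "speed" | "balanced" | "quality";

interface SearchParams {
  query: string;
  category?: "general" | "news" | "science" | "images";
  max_results?: number;
  mode?: Mode;
  language?: string;
}

// Pages fetched per mode, following the parameter table.
const PAGES_PER_MODE: Record<Mode, number> = { speed: 1, balanced: 1, quality: 3 };

function buildSearchUrls(baseUrl: string, params: SearchParams): string[] {
  if (params.query.length < 1 || params.query.length > 500) {
    throw new Error("query must be 1-500 characters");
  }
  const mode = params.mode ?? "balanced";
  const urls: string[] = [];
  for (let page = 1; page <= PAGES_PER_MODE[mode]; page++) {
    const url = new URL("/search", baseUrl);
    url.searchParams.set("q", params.query);
    url.searchParams.set("format", "json");
    url.searchParams.set("categories", params.category ?? "general");
    url.searchParams.set("pageno", String(page));
    // An empty language string means "no filter", so the parameter is omitted.
    if (params.language) url.searchParams.set("language", params.language);
    urls.push(url.toString());
  }
  return urls;
}

// Assumption: speed mode caps results at 5; other modes use max_results (default 10).
function resultLimit(params: SearchParams): number {
  const requested = params.max_results ?? 10;
  return params.mode === "speed" ? Math.min(requested, 5) : requested;
}
```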
Features:
- Deduplication by exact URL
- Max 2 results per domain (ignoring the `www.` prefix)
- Trust scoring via `config/website-weight-config.json`
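The first two filters can be sketched in a single pass over the results. The `SearchResult` shape is an assumption, and the sketch assumes SearxNG returns absolute URLs (`new URL` throws on anything else).

```typescript
// Sketch of exact-URL deduplication plus a per-domain cap (www. ignored).
// The SearchResult shape is assumed, not taken from the project's source.

interface SearchResult {
  url: string;
  title: string;
}

function hostnameWithoutWww(rawUrl: string): string {
  const host = new URL(rawUrl).hostname.toLowerCase();
  return host.startsWith("www.") ? host.slice(4) : host;
}

function dedupeResults(results: SearchResult[], maxPerDomain = 2): SearchResult[] {
  const seenUrls = new Set<string>();
  const perDomain = new Map<string, number>();
  const kept: SearchResult[] = [];
  for (const result of results) {
    // Exact match only: query strings and trailing slashes are not normalized.
    if (seenUrls.has(result.url)) continue;
    seenUrls.add(result.url);
    const domain = hostnameWithoutWww(result.url);
    const count = perDomain.get(domain) ?? 0;
    if (count >= maxPerDomain) continue; // domain already has its quota
    perDomain.set(domain, count + 1);
    kept.push(result);
  }
  return kept;
}
```

Because results are processed in order, the cap keeps the first two hits per domain, which are typically the highest-ranked ones SearxNG returned.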
fetch_page
Fetch and extract content from a web page.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | URL to fetch (http/https only) |
| `extract_mode` | enum | `readable` | `snippet` (meta description or first 200 chars), `readable` (Readability), `full` (raw HTML) |
| `timeout` | int | 10000 | Page load timeout in ms (1000-30000) |
| `wait_for` | string | - | CSS selector to wait for before extracting |
Features:
- Stealth mode (custom user-agent, webdriver bypass)
- Auto-managed browser pool with idle timeout
- Content truncated to 50,000 characters
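The parameter limits and the 50,000-character cap can be enforced before and after the Playwright step. This is a hypothetical sketch (`validateFetchArgs` and `truncateContent` are illustrative names); it does not cover the browser pool or the Readability extraction itself.

```typescript
// Hypothetical sketch of fetch_page argument checks and output truncation.
// Limits mirror the parameter table; names are illustrative only.

type ExtractMode = "snippet" | "readable" | "full";

interface FetchArgs {
  url: string;
  extract_mode?: ExtractMode;
  timeout?: number;
  wait_for?: string;
}

interface ValidatedFetchArgs {
  url: URL;
  extractMode: ExtractMode;
  timeoutMs: number;
  waitFor?: string;
}

const MAX_CONTENT_LENGTH = 50000;

function validateFetchArgs(args: FetchArgs): ValidatedFetchArgs {
  const url = new URL(args.url); // throws TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  const timeoutMs = args.timeout ?? 10000;
  if (!Number.isInteger(timeoutMs) || timeoutMs < 1000 || timeoutMs > 30000) {
    throw new Error("timeout must be an integer between 1000 and 30000 ms");
  }
  return { url, extractMode: args.extract_mode ?? "readable", timeoutMs, waitFor: args.wait_for };
}

// Cut extracted text at the configured maximum number of characters.
function truncateContent(text: string, max = MAX_CONTENT_LENGTH): string {
  return text.length <= max ? text : text.slice(0, max);
}
```

Rejecting non-HTTP schemes up front matters here because a headless browser would otherwise happily open `file://` URLs on the host.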
execute_python
Execute Python code in an isolated Docker container.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `code` | string | required | Python code to execute (max 100,000 chars) |
| `timeout` | int | 60000 | Execution timeout in ms (1000-300000) |
| `memory_limit_mb` | int | 512 | Memory limit in MB (128-4096) |
| `save_artifacts` | bool | false | Save output files as artifacts |
Security:
- Network disabled during code execution
- All Linux capabilities dropped
- Resource limits enforced (RAM + CPU)
- Auto-detects and installs missing pip packages in a separate network-enabled stage
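The execution-stage restrictions map naturally onto standard `docker run` flags. The sketch below assumes the server shells out to the Docker CLI (it may use the Docker Engine API instead); the `python:3.12-slim` image and `buildSandboxArgs` are placeholders, and the separate pip-install stage is not shown.

```typescript
// Sketch: docker CLI arguments for the no-network execution stage.
// Assumptions: Docker CLI rather than the Engine API; placeholder image name.

interface SandboxOptions {
  memoryLimitMb: number; // memory_limit_mb, 128-4096
  cpus: number;          // PYTHON_SANDBOX_CPU
  image?: string;
}

function buildSandboxArgs(opts: SandboxOptions): string[] {
  if (opts.memoryLimitMb < 128 || opts.memoryLimitMb > 4096) {
    throw new Error("memory_limit_mb must be between 128 and 4096");
  }
  return [
    "run",
    "--rm",               // remove the container after it exits
    "-i",                 // keep stdin open so the script can be piped in
    "--network", "none",  // no network during execution
    "--cap-drop", "ALL",  // drop every Linux capability
    "--memory", `${opts.memoryLimitMb}m`,
    "--cpus", String(opts.cpus),
    opts.image ?? "python:3.12-slim",
    "python", "-",        // read the script from stdin instead of argv
  ];
}
```

Piping the code through stdin avoids per-argument length limits for scripts near the 100,000-character cap. If these arguments were passed to `child_process.spawn("docker", …)`, the timeout would also need to stop the container itself (for example via `docker kill`), since killing only the CLI client can leave the container running.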
get_search_suggestions
Get autocomplete suggestions for a partial query.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | required | Partial query (min 2 chars) |
| `max_suggestions` | int | 5 | Max suggestions (1-10) |
When SearxNG autocomplete is unavailable, the tool falls back to simple query expansion, appending common terms to the partial query (e.g. `<query> tutorial`, `<query> examples`).
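A minimal sketch of that fallback follows. Only the two suffixes named in this README are used; the real suffix list may be longer, and `fallbackSuggestions` is a hypothetical name.

```typescript
// Hypothetical fallback used when SearxNG autocomplete is unavailable.
// Only the documented "tutorial" and "examples" suffixes are included here.

const FALLBACK_SUFFIXES = ["tutorial", "examples"] as const;

function fallbackSuggestions(partial: string, maxSuggestions = 5): string[] {
  const query = partial.trim();
  if (query.length < 2) {
    throw new Error("query must be at least 2 characters");
  }
  if (!Number.isInteger(maxSuggestions) || maxSuggestions < 1 || maxSuggestions > 10) {
    throw new Error("max_suggestions must be an integer between 1 and 10");
  }
  // Append each suffix to the partial query, then honor the requested count.
  return FALLBACK_SUFFIXES.map((suffix) => `${query} ${suffix}`).slice(0, maxSuggestions);
}
```

With only two suffixes, the fallback returns at most two suggestions even when `max_suggestions` is larger.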
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `SEARXNG_URL` | `http://localhost:8080` | SearxNG instance URL |
| `SEARXNG_TIMEOUT_MS` | 10000 | Search request timeout (ms) |
| `SEARXNG_MAX_RETRIES` | 3 | Max retries for failed requests |
| `SEARXNG_RETRY_DELAY_MS` | 1000 | Base retry delay in ms (exponential backoff) |
| `FETCH_TIMEOUT_MS` | 10000 | Page fetch timeout (ms) |
| `FETCH_MAX_CONTENT_LENGTH` | 50000 | Max content length for fetched pages (characters) |
| `PYTHON_SANDBOX_MEMORY_MB` | 512 | Default Python sandbox memory limit (MB) |
| `PYTHON_SANDBOX_CPU` | 1 | Default Python sandbox CPU limit |
| `PYTHON_SANDBOX_TIMEOUT_MS` | 60000 | Default Python sandbox timeout (ms) |
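The sketch below reads these variables with their documented defaults and computes an exponential backoff delay. The `ServerConfig` field names and the exact backoff formula (`base * 2^attempt`) are assumptions; the project's `config.ts` may differ.

```typescript
// Sketch: environment-driven configuration with the documented defaults,
// plus an assumed exponential backoff formula for SearxNG retries.

type Env = Record<string, string | undefined>;

interface ServerConfig {
  searxngUrl: string;
  searxngTimeoutMs: number;
  searxngMaxRetries: number;
  searxngRetryDelayMs: number;
  fetchTimeoutMs: number;
  fetchMaxContentLength: number;
  sandboxMemoryMb: number;
  sandboxCpu: number;
  sandboxTimeoutMs: number;
}

// Unset or blank variables fall back to the default; non-numeric values are rejected.
function numberFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) throw new Error(`${key} must be a number, got "${raw}"`);
  return value;
}

function loadConfig(env: Env): ServerConfig {
  return {
    searxngUrl: env["SEARXNG_URL"] ?? "http://localhost:8080",
    searxngTimeoutMs: numberFromEnv(env, "SEARXNG_TIMEOUT_MS", 10000),
    searxngMaxRetries: numberFromEnv(env, "SEARXNG_MAX_RETRIES", 3),
    searxngRetryDelayMs: numberFromEnv(env, "SEARXNG_RETRY_DELAY_MS", 1000),
    fetchTimeoutMs: numberFromEnv(env, "FETCH_TIMEOUT_MS", 10000),
    fetchMaxContentLength: numberFromEnv(env, "FETCH_MAX_CONTENT_LENGTH", 50000),
    sandboxMemoryMb: numberFromEnv(env, "PYTHON_SANDBOX_MEMORY_MB", 512),
    sandboxCpu: numberFromEnv(env, "PYTHON_SANDBOX_CPU", 1),
    sandboxTimeoutMs: numberFromEnv(env, "PYTHON_SANDBOX_TIMEOUT_MS", 60000),
  };
}

// attempt is 0-based, so the default base gives 1000, 2000, 4000 ms for three retries.
function retryDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}
```

In Node, `loadConfig(process.env)` would produce the running configuration; passing the environment in as a plain object keeps the loader easy to test.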
Trust Weights
`config/website-weight-config.json` defines domain trust scores from -1.0 to 1.0. Results from domains with negative scores are filtered out; the remaining results are reordered by score (highest trust first).
```json
{
  "wikipedia.org": 1.0,
  "arxiv.org": 0.95,
  "github.com": 0.9,
  "stackoverflow.com": 0.85,
  "medium.com": 0.5,
  "reddit.com": 0.3,
  "twitter.com": 0.2
}
```
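A sketch of applying these weights to search results follows. Three behaviors are assumptions not stated above: domains missing from the file score 0, a weight for `wikipedia.org` also applies to subdomains such as `en.wikipedia.org`, and ties keep their original order.

```typescript
// Sketch of the trust filter: drop negative-score domains, sort the rest by score.
// Assumed: unlisted domains score 0, parent-domain weights cover subdomains.

type TrustWeights = Record<string, number>;

interface RankedResult {
  url: string;
  title: string;
}

function trustScore(rawUrl: string, weights: TrustWeights): number {
  let host = new URL(rawUrl).hostname.toLowerCase();
  // Walk up the labels: en.wikipedia.org -> wikipedia.org -> org.
  while (host.length > 0) {
    if (Object.prototype.hasOwnProperty.call(weights, host)) return weights[host];
    const dot = host.indexOf(".");
    if (dot === -1) break;
    host = host.slice(dot + 1);
  }
  return 0; // assumed neutral score for unlisted domains
}

function applyTrustFilter(results: RankedResult[], weights: TrustWeights): RankedResult[] {
  return results
    .map((result, index) => ({ result, index, score: trustScore(result.url, weights) }))
    .filter((entry) => entry.score >= 0)
    // Highest score first; the original index breaks ties deterministically.
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .map((entry) => entry.result);
}
```

Using `hasOwnProperty` rather than the `in` operator avoids accidentally matching inherited object keys such as `constructor`.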
Development
```bash
# Build
npm run build

# Run in dev mode (auto-reload)
npm run dev

# Run tests
npm test

# Lint
npm run lint

# Type check
npm run typecheck
```
Project Structure
```
websearch-mcp/
├── src/
│   ├── index.ts                  # MCP server entry point
│   ├── lib/
│   │   ├── browser-pool.ts       # Playwright browser pool with stealth mode
│   │   ├── config.ts             # Configuration loader
│   │   ├── docker-sandbox.ts     # Docker-based Python sandbox
│   │   ├── searxng-client.ts     # SearxNG API client with retry
│   │   └── trust-filter.ts       # Domain trust scoring and filtering
│   └── tools/
│       ├── fetch.ts              # fetch_page tool
│       ├── python.ts             # execute_python tool
│       ├── search.ts             # search_internet tool
│       └── suggestions.ts        # get_search_suggestions tool
├── tests/
├── config/
│   └── website-weight-config.json
├── infra/
│   └── searxng/settings.yml      # SearxNG configuration
├── docker-compose.yml            # SearxNG container
├── package.json
├── tsconfig.json
└── README.md
```
License
MIT