Web Search MCP

A high-performance search service that converts results from Google, Bing, and DuckDuckGo into structured JSON or Markdown. It supports multi-layer depth crawling, uses the Camoufox anti-detection browser for content extraction, and falls back to an alternate engine when the primary one returns no results.

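A high-performance web search service built on Camoufox + FastAPI that converts search engine results into structured JSON / Markdown output. It supports multi-layer depth crawling and concurrent execution.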

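Features

  • Three search engines: Google, Bing, DuckDuckGo
  • Multi-layer depth crawling: SERP parsing → body extraction → external link fetching
  • Two output formats: JSON / Markdown
  • Anti-detection browser: Camoufox with realistic browser fingerprints (geoip, humanize, locale)
  • Concurrent execution: browser pool + asyncio semaphore control
  • Automatic engine fallback: switches to a backup engine when the primary engine returns no results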
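
The concurrency feature above pairs a browser pool with an asyncio semaphore. The semaphore half can be sketched as follows. This is a minimal illustration, not the project's code: `fetch_page` and `fetch_all` are hypothetical names, and the fetch is simulated with a short sleep instead of a real browser.

```python
import asyncio


async def fetch_page(url: str) -> str:
    # Hypothetical stand-in for a real browser fetch.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def fetch_all(urls: list[str], limit: int = 5) -> list[str]:
    # At most `limit` fetches run at once; the rest wait on the semaphore.
    sem = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with sem:
            return await fetch_page(url)

    # gather returns results in the same order as `urls`.
    return await asyncio.gather(*(bounded(u) for u in urls))


pages = asyncio.run(fetch_all(["https://a.example", "https://b.example"], limit=1))
```

With `limit=1` the two fetches run one after the other. A larger limit would presumably be chosen to match the browser pool size.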

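Search Depth

| depth | Behavior | Description |
|-------|----------|-------------|
| 1 | SERP parsing | Default. Extracts title, link, and snippet |
| 2 | SERP + body | Visits each result link and extracts the page body |
| 3 | SERP + body + external links | Also fetches the content of external links found in the body |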
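
The three depth levels build on each other, which can be sketched as a single function that enriches SERP results step by step. This is an illustrative sketch only: `apply_depth`, `fetch_body`, and `extract_links` are hypothetical names, and the shape of each `sub_links` entry is an assumption, since the sample responses only show an empty list.

```python
from typing import Callable


def apply_depth(
    results: list[dict],
    depth: int,
    fetch_body: Callable[[str], str],
    extract_links: Callable[[str], list[str]],
) -> list[dict]:
    """Enrich SERP results in place according to the depth level."""
    if depth not in (1, 2, 3):
        raise ValueError("depth must be 1, 2, or 3")
    for item in results:
        if depth >= 2:
            # Level 2: visit the result link and keep the page body.
            item["content"] = fetch_body(item["url"])
        if depth >= 3:
            # Level 3: also fetch each external link found in the body.
            # The entry shape below is assumed, not taken from the project.
            item["sub_links"] = [
                {"url": link, "content": fetch_body(link)}
                for link in extract_links(item["content"])
            ]
    return results
```

At depth 1 the results pass through unchanged, which matches the empty `content` and `sub_links` fields in a depth=1 response.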

Quick Start

Installation

# Clone the project
git clone <repo-url> && cd web-search-mcp

# Install dependencies
pip install -e ".[dev]"

# Install the Camoufox browser
python -m camoufox fetch

Starting the Service

# Development mode (auto-reload)
uvicorn src.main:app --reload --port 8000

# Production mode
uvicorn src.main:app --host 0.0.0.0 --port 8000

Once the service is running, check http://localhost:8000/health to confirm its status:

curl http://localhost:8000/health
# {"status":"ok","pool_ready":true}

API Usage

GET /search

# Basic search (defaults: Google, depth=1, JSON output)
curl 'http://localhost:8000/search?q=python+asyncio'

# Specify engine and depth
curl 'http://localhost:8000/search?q=firsh.me+blog&engine=duckduckgo&depth=2&max_results=3'

# Markdown output
curl 'http://localhost:8000/search?q=firsh.me+blog&engine=duckduckgo&format=markdown'

# Bing search
curl 'http://localhost:8000/search?q=fastapi+tutorial&engine=bing&max_results=5'

# Three-level depth crawl (SERP + body + external links)
curl 'http://localhost:8000/search?q=web+scraping&engine=duckduckgo&depth=3&max_results=3'

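Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| q | string | required | Search keywords (1-500 characters) |
| engine | string | google | Search engine: google / bing / duckduckgo |
| depth | int | 1 | Crawl depth: 1-3 |
| format | string | json | Output format: json / markdown |
| max_results | int | 10 | Maximum number of results (1-50) |
| timeout | int | 30 | Timeout in seconds (5-120) |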
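
The parameter ranges above can be checked on the client side before a request is sent. The following is a minimal sketch using only the standard library. `build_search_url` is a hypothetical helper, not part of the project, and the base URL is taken from the uvicorn examples.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumed from the uvicorn examples
ENGINES = {"google", "bing", "duckduckgo"}
FORMATS = {"json", "markdown"}


def build_search_url(q: str, engine: str = "google", depth: int = 1,
                     fmt: str = "json", max_results: int = 10,
                     timeout: int = 30) -> str:
    """Validate the documented ranges and return a GET /search URL."""
    if not 1 <= len(q) <= 500:
        raise ValueError("q must be 1-500 characters")
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {sorted(ENGINES)}")
    if not 1 <= depth <= 3:
        raise ValueError("depth must be 1-3")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be 1-50")
    if not 5 <= timeout <= 120:
        raise ValueError("timeout must be 5-120 seconds")
    # `fmt` is sent as the `format` query parameter.
    params = {"q": q, "engine": engine, "depth": depth, "format": fmt,
              "max_results": max_results, "timeout": timeout}
    return f"{BASE_URL}/search?{urlencode(params)}"


url = build_search_url("python asyncio", engine="duckduckgo", depth=2)
```

`urlencode` encodes the space in the query as `+`, the same form used in the curl examples.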

POST /search

curl -X POST http://localhost:8000/search \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "firsh.me blog",
    "engine": "duckduckgo",
    "depth": 2,
    "format": "json",
    "max_results": 5,
    "timeout": 30
  }'

Response Examples

JSON format (depth=1):

{
  "query": "firsh.me blog",
  "engine": "duckduckgo",
  "depth": 1,
  "total": 3,
  "results": [
    {
      "title": "NeoJ's Web Page [下水鱼的Blog]",
      "url": "https://firsh.me/",
      "snippet": "这是一个关于下水鱼的个人网站的博客页面。",
      "content": "",
      "sub_links": []
    }
  ],
  "metadata": {
    "elapsed_ms": 2824,
    "timestamp": "2026-02-10T17:02:20.891421+00:00",
    "engine": "duckduckgo",
    "depth": 1
  }
}

JSON format (depth=2, including page body content):

{
  "query": "firsh.me blog",
  "engine": "duckduckgo",
  "depth": 2,
  "total": 3,
  "results": [
    {
      "title": "NeoJ's Web Page [下水鱼的Blog]",
      "url": "https://firsh.me/",
      "snippet": "这是一个关于下水鱼的个人网站的博客页面。",
      "content": "blog/2026\n2026-02-02\n关闭Chrome 自动更新...",
      "sub_links": []
    }
  ],
  "metadata": {
    "elapsed_ms": 5224,
    "timestamp": "2026-02-10T17:06:36.644148+00:00",
    "engine": "duckduckgo",
    "depth": 2
  }
}

Markdown format:

# Search Results: firsh.me blog

**Engine:** duckduckgo | **Depth:** 1 | **Results:** 3
**Time:** 1792ms

---

## 1. NeoJ's Web Page [下水鱼的Blog]
**URL:** https://firsh.me/

> 这是一个关于下水鱼的个人网站的博客页面。

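The Markdown sample above can be produced from the JSON response by a small formatter. The sketch below reproduces the layout of that sample from a response dictionary. `to_markdown` is a hypothetical function written for illustration, it is not the project's `markdown_fmt.py`, and it only covers the fields shown in the depth=1 sample.

```python
def to_markdown(resp: dict) -> str:
    """Render a search response dict in the layout of the Markdown sample."""
    meta = resp["metadata"]
    lines = [
        f"# Search Results: {resp['query']}",
        "",
        f"**Engine:** {resp['engine']} | **Depth:** {resp['depth']} "
        f"| **Results:** {resp['total']}",
        f"**Time:** {meta['elapsed_ms']}ms",
        "",
        "---",
    ]
    # One numbered section per result, with the snippet as a blockquote.
    for index, item in enumerate(resp["results"], start=1):
        lines += [
            "",
            f"## {index}. {item['title']}",
            f"**URL:** {item['url']}",
            "",
            f"> {item['snippet']}",
        ]
    return "\n".join(lines)


sample = {
    "query": "firsh.me blog",
    "engine": "duckduckgo",
    "depth": 1,
    "total": 1,
    "results": [{"title": "Example title", "url": "https://firsh.me/",
                 "snippet": "Example snippet."}],
    "metadata": {"elapsed_ms": 1792},
}
markdown = to_markdown(sample)
```
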
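Engine Status

| Engine | Status | Notes |
|--------|--------|-------|
| DuckDuckGo | Stable | Recommended; high search quality, no regional restrictions |
| Google | Limited | Some IPs trigger a CAPTCHA; falls back to DuckDuckGo automatically |
| Bing | Available | Uses global.bing.com to avoid regional redirects; result relevance is lower on some IPs |

When Google is blocked, the service falls back automatically in the order DuckDuckGo → Bing. The engine field in the response identifies the engine that was actually used.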
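
The fallback behavior described above can be sketched as a loop over an ordered candidate list. This is an illustration under stated assumptions: `search_with_fallback` and `run_search` are hypothetical names, only the Google order (DuckDuckGo, then Bing) comes from the text, and the empty fallback lists for the other engines are an assumption.

```python
from typing import Callable

# Only the Google order is documented; the other entries are assumptions.
FALLBACK_ORDER = {
    "google": ["duckduckgo", "bing"],
    "duckduckgo": [],
    "bing": [],
}


def search_with_fallback(
    query: str,
    engine: str,
    run_search: Callable[[str, str], list[dict]],
) -> dict:
    """Try the requested engine first, then each fallback in order."""
    for candidate in [engine, *FALLBACK_ORDER.get(engine, [])]:
        results = run_search(candidate, query)
        if results:
            # Report the engine that actually produced the results.
            return {"engine": candidate, "results": results}
    # Nothing found anywhere: report the originally requested engine.
    return {"engine": engine, "results": []}
```

A caller can compare the returned `engine` value with the one it requested to detect that a fallback took place.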

MCP Mode (curl Examples)

The MCP service listens on http://127.0.0.1:8897 by default and uses the Streamable HTTP transport.

Starting the MCP Service

# Start locally (HTTP mode)
python -m src.mcp_server --transport http --host 127.0.0.1 --port 8897

# Start with Docker
docker compose up -d

Initializing an MCP Session

# Send the initialize request
curl -s -X POST http://127.0.0.1:8897/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-03-26",
      "capabilities": {},
      "clientInfo": {"name": "curl-demo", "version": "1.0"}
    }
  }' | jq .

Calling the web_search Tool

# Search (depth=1, fast SERP results)
curl -s -X POST http://127.0.0.1:8897/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "web_search",
      "arguments": {
        "query": "firsh.me",
        "engine": "duckduckgo",
        "max_results": 5,
        "depth": 1
      }
    }
  }' | jq .

# Search (depth=2, including page body content)
curl -s -X POST http://127.0.0.1:8897/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "web_search",
      "arguments": {
        "query": "python asyncio tutorial",
        "engine": "google",
        "max_results": 3,
        "depth": 2
      }
    }
  }' | jq .

Calling the get_page_content Tool

# Fetch the content of a single page
curl -s -X POST http://127.0.0.1:8897/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
      "name": "get_page_content",
      "arguments": {
        "url": "https://firsh.me/"
      }
    }
  }' | jq .

Listing Available Search Engines

curl -s -X POST http://127.0.0.1:8897/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{
    "jsonrpc": "2.0",
    "id": 5,
    "method": "tools/call",
    "params": {
      "name": "list_search_engines",
      "arguments": {}
    }
  }' | jq .

API Key Authentication

# If API key authentication is configured, add an Authorization header to the request
curl -s -X POST http://127.0.0.1:8897/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "web_search",
      "arguments": {"query": "hello world", "engine": "duckduckgo"}
    }
  }' | jq .
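
The curl calls above all share the same JSON-RPC envelope and headers, so they can be built by one helper. The sketch below constructs the request with the standard library and does not send it. `build_tool_call` is a hypothetical name, and the endpoint is the default address used in the curl examples.

```python
import json
import urllib.request
from typing import Optional

MCP_URL = "http://127.0.0.1:8897/mcp"  # default address from the curl examples


def build_tool_call(name: str, arguments: dict, request_id: int,
                    api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC tools/call request."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if api_key:
        # Only needed when the server is configured with API key auth.
        headers["Authorization"] = f"Bearer {api_key}"
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_tool_call(
    "web_search", {"query": "hello world", "engine": "duckduckgo"}, request_id=2
)
```

Passing the result to `urllib.request.urlopen` would send it. Reading the reply is left out here because the response may arrive as plain JSON or as an event stream, per the Accept header.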

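Camoufox Fingerprint Browser Configuration

Advanced Camoufox features are configured through environment variables:

| Environment variable | Description | Example |
|----------------------|-------------|---------|
| BROWSER_POOL_SIZE | Number of concurrent browsers | 5 |
| BROWSER_PROXY | Proxy server | socks5://127.0.0.1:1080 |
| BROWSER_OS | Target OS fingerprint | windows / macos / linux |
| BROWSER_FONTS | Custom font list | Arial,Helvetica,Times New Roman |
| BROWSER_BLOCK_WEBGL | Block WebGL fingerprinting | true |
| BROWSER_ADDONS | Firefox add-on paths | /path/to/addon1.xpi,/path/to/addon2.xpi |

Built-in features (enabled by default):

  • GeoIP spoofing: matches the geolocation fingerprint to the real IP automatically
  • Humanized interaction: simulates realistic mouse movement and clicks
  • Image blocking: speeds up page loading
  • Locale matching: keeps the browser language consistent with the region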
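
The environment variables above can be read into a typed settings object. The sketch below shows one way to do that. It is not the project's `config.py`: `BrowserSettings` and `load_browser_settings` are hypothetical names, and the defaults used when a variable is unset are assumptions (the table lists example values, not defaults).

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class BrowserSettings:
    pool_size: int = 5                # assumed default
    proxy: Optional[str] = None
    os_name: Optional[str] = None
    fonts: list = field(default_factory=list)
    block_webgl: bool = False         # assumed default
    addons: list = field(default_factory=list)


def _split(value: str) -> list:
    # Comma-separated list; surrounding spaces and empty entries are dropped.
    return [part.strip() for part in value.split(",") if part.strip()]


def load_browser_settings(env: Mapping[str, str]) -> BrowserSettings:
    """Build settings from a mapping such as os.environ."""
    return BrowserSettings(
        pool_size=int(env.get("BROWSER_POOL_SIZE", "5")),
        proxy=env.get("BROWSER_PROXY") or None,
        os_name=env.get("BROWSER_OS") or None,
        fonts=_split(env.get("BROWSER_FONTS", "")),
        block_webgl=env.get("BROWSER_BLOCK_WEBGL", "").lower() == "true",
        addons=_split(env.get("BROWSER_ADDONS", "")),
    )


settings = load_browser_settings(os.environ)
```

Taking a mapping instead of reading `os.environ` directly keeps the loader easy to exercise with a plain dictionary.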

Testing

# Unit tests (26 tests)
pytest tests/ -v

# Integration tests (starts the service automatically, runs real searches)
python scripts/test_live.py

# Integration tests with custom parameters
python scripts/test_live.py --query "python asyncio" --engines duckduckgo --max-depth 2

# Integration tests when the service is already running
python scripts/test_live.py --no-server --engines duckduckgo google --max-depth 3

Project Structure

web-search-mcp/
├── src/
│   ├── main.py                 # FastAPI entry point + browser pool lifecycle
│   ├── config.py               # Configuration (BrowserConfig / AppConfig)
│   ├── api/
│   │   ├── routes.py           # API routes + engine fallback logic
│   │   └── schemas.py          # Pydantic request/response models
│   ├── engine/
│   │   ├── base.py             # Abstract base class for search engines
│   │   ├── google.py           # Google (homepage warm-up + CAPTCHA detection)
│   │   ├── bing.py             # Bing (global.bing.com + URL decoding)
│   │   └── duckduckgo.py       # DuckDuckGo
│   ├── scraper/
│   │   ├── browser.py          # Camoufox browser pool
│   │   ├── parser.py           # HTML content parsing
│   │   └── depth.py            # Multi-layer depth crawl scheduling
│   └── formatter/
│       ├── json_fmt.py         # JSON formatting
│       └── markdown_fmt.py     # Markdown formatting
├── tests/                      # Unit tests
├── scripts/
│   └── test_live.py            # Integration test script
└── pyproject.toml

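Tech Stack

| Component | Technology |
|-----------|------------|
| Web framework | FastAPI + Uvicorn |
| Browser engine | Camoufox (anti-detection Firefox, driven by Playwright) |
| Async runtime | asyncio + Semaphore concurrency control |
| HTML parsing | BeautifulSoup4 + lxml |
| Content conversion | markdownify (HTML → Markdown) |
| Data validation | Pydantic v2 |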

License

MIT
