
Vertical Search MCP

A scalable MCP server for searching multiple platforms (WeChat, Zhihu, Google) with async API, content fetching, compression, and real URL resolution.


A scalable vertical search MCP (Model Context Protocol) server supporting multiple platforms (WeChat, Zhihu, etc.).

Features

Multi-platform support: Default support for WeChat, Google Custom Search, and Zhihu, easily extensible to other platforms
High performance: A persistent browser pool is reused across searches, making subsequent searches about 5x faster than the first
Real URL resolution: Automatically resolves redirect links to get final destination URLs (e.g., mp.weixin.qq.com for WeChat, zhihu.com for Zhihu)
Content fetching and compression: Fetch full article content with intelligent compression using DeepSeek API
High extensibility: New platform integration takes only 1-2 hours
Production-ready stability: Comprehensive error handling and monitoring

Architecture

graph TD
    Client["MCP Client<br/>(Claude)"] -->|"JSON-RPC 2.0"| Server["MCP Server<br/>(mcp_server.py)"]
    Server --> Manager["UnifiedSearchManager<br/>- Platform routing<br/>- Browser pool management<br/>- Cache management"]
    Manager --> Weixin["WeixinSearcher"]
    Manager --> Google["GoogleSearcher"]
    Manager --> Zhihu["ZhihuSearcher"]
    Weixin --> BrowserPool["Playwright Browser Pool<br/>(Persistent)"]
    Google --> BrowserPool
    Zhihu --> BrowserPool
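The key performance idea in this architecture is the persistent browser pool. Browsers are launched once and reused across searches instead of being started and closed for every request. Below is a minimal sketch of that reuse pattern using only asyncio. `ResourcePool` is a hypothetical name, not the API of the project's `core/browser_pool.py`. A real pool would hand out Playwright browsers or contexts created by the `factory` coroutine.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable, Generic, TypeVar

T = TypeVar("T")


class ResourcePool(Generic[T]):
    """Hypothetical pool that creates resources lazily and reuses them."""

    def __init__(self, factory: Callable[[], Awaitable[T]], max_size: int = 3) -> None:
        self._factory = factory          # e.g. a coroutine that launches a browser
        self._max_size = max_size
        self._idle: asyncio.Queue[T] = asyncio.Queue()
        self._created = 0
        self._lock = asyncio.Lock()

    @property
    def created(self) -> int:
        """How many resources have been created so far."""
        return self._created

    async def _get(self) -> T:
        # Reuse an idle resource if one is available
        if not self._idle.empty():
            return self._idle.get_nowait()
        async with self._lock:
            # Create a new resource only while under the size limit
            if self._created < self._max_size:
                self._created += 1
                return await self._factory()
        # Pool is full: wait until another search returns a resource
        return await self._idle.get()

    @asynccontextmanager
    async def acquire(self) -> AsyncIterator[T]:
        resource = await self._get()
        try:
            yield resource
        finally:
            # Hand the resource back instead of closing it
            self._idle.put_nowait(resource)
```

A finished search returns its resource to the idle queue, so only the first acquisition pays the creation cost. This is the gap the Performance table reports between the first search and later ones.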

Requirements

Python 3.12+
Playwright 1.40.0+
PyYAML 6.0+
OpenAI SDK 1.0.0+ (for DeepSeek API compatibility)

Installation

Clone the repository:

git clone <repository-url>
cd vertical-search-mcp

Create a virtual environment:

python3.12 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt
pip install -r requirements-dev.txt

Install Playwright Chromium:

playwright install chromium

Usage

MCP Server

The MCP server provides async search tools (start_vertical_search, get_search_status, and the optional cancel_search). AI clients that support the MCP protocol, such as Claude Desktop, can call them.

Configuration

Configure the MCP server in your AI client's settings file. For example, in Claude Desktop:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "vertical-search": {
      "command": "/path/to/vertical-search-mcp/.venv/bin/python",
      "args": ["/path/to/vertical-search-mcp/mcp_server.py"],
      "env": {
        "APIKEY_DEEPSEEK": "your-deepseek-api-key-here",
        "APIKEY_GOOGLE_CUSTOM_SEARCH": "your-google-api-key-here",
        "APIKEY_GOOGLE_SEARCH_ID": "your-google-search-engine-id-here"
      }
    }
  }
}

Important Notes:

Replace /path/to/vertical-search-mcp with the actual path to your project directory
Use the Python interpreter from your virtual environment (.venv/bin/python)
If you're using a system-wide Python, you can use python3 or the full path to your Python interpreter
API Keys:
- Content Compression (APIKEY_DEEPSEEK): Optional; enables intelligent content compression
  - Get your API key from: https://platform.deepseek.com/
  - Long articles (more than 3000 tokens): The system uses the DeepSeek API to compress content while preserving key information. Without the key, long articles are truncated instead, which may lose important content.
  - Short articles (3000 tokens or fewer): No compression is needed, so the key has no effect.
- Google Custom Search (APIKEY_GOOGLE_CUSTOM_SEARCH, APIKEY_GOOGLE_SEARCH_ID): Optional; enables the Google search platform
  - Get your API key and Search Engine ID from: https://developers.google.com/custom-search/v1/overview
  - With credentials: The Google platform is registered automatically and is available for search
  - Without credentials: The Google platform is unavailable, but the other platforms (WeChat, Zhihu) continue to work normally
After updating the config, restart your AI client (e.g., Claude Desktop)
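The credential rules above reduce to a small decision. WeChat and Zhihu need no keys (an assumption based on the config listing only DeepSeek and Google keys). Google is registered only when both Google variables are set. DeepSeek compression is active only when APIKEY_DEEPSEEK is set. Below is a minimal sketch using hypothetical `available_platforms` and `compression_enabled` helpers; the server's actual registration logic may differ.

```python
import os
from typing import Mapping


def available_platforms(env: Mapping[str, str] = os.environ) -> list[str]:
    """Hypothetical helper: platform names that can be registered with this environment."""
    platforms = ["weixin", "zhihu"]  # assumed to need no credentials
    # Google needs both the API key and the search engine ID
    if env.get("APIKEY_GOOGLE_CUSTOM_SEARCH") and env.get("APIKEY_GOOGLE_SEARCH_ID"):
        platforms.append("google")
    return platforms


def compression_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: long articles are compressed only when the DeepSeek key is set."""
    return bool(env.get("APIKEY_DEEPSEEK"))
```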

Tools: Async Search API

The MCP server provides async search tools that support long-running searches without timeout issues. All searches use the unified async mode.

Tool 1: start_vertical_search

Start an async search task. Returns task_id immediately (< 1 second), allowing the search to run in the background.

Parameters:

platform (optional): Platform(s) to search. Supports:
- "all" (default): Search all registered platforms
- Single platform: "weixin", "google", or "zhihu"
- Multiple platforms: "weixin,google" (comma-separated)
- Spaces are handled: "weixin, google" works the same
query (required): Search query string (1-100 characters)
max_results (optional): Maximum number of results (total across all platforms, default: 10, maximum: sum of all platform limits)
include_content (optional): Whether to include full article content (default: true)

Response:

If task completes quickly (< 1 second): Returns results directly with status: "completed"
Otherwise: Returns task_id and status: "started" for polling

Tool 2: get_search_status

Get the status and results of an async search task. Use this to poll for progress and retrieve results when ready.

Parameters:

task_id (required): Task ID from start_vertical_search

Response:

status: "running": Task is still executing, includes progress information
status: "completed": Task finished, includes content with results
status: "failed": Task failed, includes error message
status: "not_found": Task expired or doesn't exist (tasks expire after 30 minutes)

Tool 3: cancel_search (optional)

Cancel a running search task.

Parameters:

task_id (required): Task ID to cancel
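Together, the three tools behave like a small task registry:
- start_vertical_search schedules the search in the background and returns a task_id.
- get_search_status looks the task up.
- cancel_search cancels it.
- Entries expire after 30 minutes.

The sketch below models that behavior with asyncio. `TaskRegistry` and its methods are hypothetical. Reporting a cancelled task as "failed" is an assumption, since the status list above does not name a cancelled state.

```python
import asyncio
import time
import uuid
from typing import Any, Awaitable, Callable

TASK_TTL_SECONDS = 30 * 60  # tasks expire after 30 minutes
FAST_PATH_SECONDS = 1.0     # results are returned directly if done within 1 second


class TaskRegistry:
    """Hypothetical in-memory store behind start/status/cancel tools."""

    def __init__(self, ttl: float = TASK_TTL_SECONDS) -> None:
        self._ttl = ttl
        self._tasks: dict[str, tuple[asyncio.Task[Any], float]] = {}

    def _purge_expired(self) -> None:
        now = time.monotonic()
        for task_id, (task, created) in list(self._tasks.items()):
            if now - created > self._ttl:
                task.cancel()
                del self._tasks[task_id]

    async def start(self, job: Callable[[], Awaitable[Any]]) -> dict[str, Any]:
        self._purge_expired()
        task_id = uuid.uuid4().hex
        task = asyncio.create_task(job())
        self._tasks[task_id] = (task, time.monotonic())
        # Wait briefly so quick searches skip polling entirely
        done, _ = await asyncio.wait({task}, timeout=FAST_PATH_SECONDS)
        if done:
            return self.status(task_id)
        return {"task_id": task_id, "status": "started"}

    def status(self, task_id: str) -> dict[str, Any]:
        self._purge_expired()
        entry = self._tasks.get(task_id)
        if entry is None:
            return {"task_id": task_id, "status": "not_found"}
        task, _ = entry
        if not task.done():
            return {"task_id": task_id, "status": "running"}
        if task.cancelled():
            return {"task_id": task_id, "status": "failed", "error": "cancelled"}
        if task.exception() is not None:
            return {"task_id": task_id, "status": "failed", "error": str(task.exception())}
        return {"task_id": task_id, "status": "completed", "content": task.result()}

    def cancel(self, task_id: str) -> bool:
        entry = self._tasks.get(task_id)
        if entry is None:
            return False
        return entry[0].cancel()
```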

Note on include_content parameter:

When include_content=true (default): Fetches full article content and intelligently compresses it to stay within token limits
- Short articles (≤3000 tokens): No compression needed, works without APIKEY_DEEPSEEK
- Long articles (>3000 tokens): DeepSeek API key recommended
  - With API key (APIKEY_DEEPSEEK): Uses DeepSeek to compress while preserving key information, conclusions, and important details
  - Without API key: Falls back to safe truncation strategy, potentially losing tail content
When include_content=false: Returns only titles, URLs, snippets
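The include_content rules reduce to a three-way decision on the article's estimated token count. Below is a minimal sketch using a hypothetical `choose_content_strategy` helper that returns the content_status value the article would end up with. The real pipeline also has a "batch_compressed" path and a "fetch_failed" outcome, which this sketch leaves out.

```python
TOKEN_LIMIT = 3000  # threshold quoted in this README


def choose_content_strategy(estimated_tokens: int, has_deepseek_key: bool) -> str:
    """Hypothetical helper: pick how a fetched article's content is handled."""
    if estimated_tokens <= TOKEN_LIMIT:
        return "fetched"      # short enough to keep as-is, no API key needed
    if has_deepseek_key:
        return "compressed"   # DeepSeek keeps key information and conclusions
    return "truncated"        # safe fallback; tail content may be lost
```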

Multi-Platform Search:

Use platform="all" or omit the parameter to search all registered platforms
Use platform="weixin,google" to search specific platforms
Results are automatically deduplicated by URL
Each result includes a platform field indicating its source
Progress reports show platform-level information (e.g., "Platform 1/2 (weixin): Searching...")
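Merging multi-platform results comes down to tagging each result with its platform and skipping URLs that have already been seen. This sketch rests on several assumptions:
- `merge_platform_results` is a hypothetical helper.
- Platforms are consumed in the order given.
- The total is capped at max_results.

The project may interleave platforms or rank results differently.

```python
from typing import Any


def merge_platform_results(
    per_platform: dict[str, list[dict[str, Any]]], max_results: int
) -> list[dict[str, Any]]:
    """Hypothetical helper: combine results, add a platform field, drop duplicate URLs."""
    seen: set[str] = set()
    merged: list[dict[str, Any]] = []
    for platform, results in per_platform.items():
        for result in results:
            url = result.get("url", "")
            if url:
                if url in seen:
                    continue  # same article already returned by an earlier platform
                seen.add(url)
            merged.append({**result, "platform": platform})
            if len(merged) >= max_results:
                return merged  # max_results is a total across platforms
    return merged
```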

Example Usage in Claude:

Search for Python articles on WeChat, limit to 5 results.
Search for Python articles across all platforms, limit to 10 results.
Search for Python articles on WeChat and Google, limit to 15 results.

IMPORTANT: Polling required. The AI MUST follow this workflow:

Call start_vertical_search to start the search. If the response already has status: "completed", return the results directly; no polling is needed.
Otherwise, call get_search_status every 10-15 seconds until a final state is reached:
- status == "completed" → Return results to user
- status == "failed" → Report error to user
- status == "not_found" → Report that the task expired or does not exist
- Do NOT stop polling until one of these final states is reached
Display progress updates to the user while polling
Return results when the task completes

Example AI Workflow:

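import asyncio

# start_vertical_search and get_search_status stand for the MCP tool calls
# made by your client; they are not importable Python functions.
async def search_with_polling():
    # Step 1: Start search
    response = start_vertical_search(platform="weixin", query="Python", max_results=5)
    if response["status"] == "completed":
        # Fast completion: results came back directly, no polling needed
        # (assumed to carry results in "content", like get_search_status)
        return response["content"]
    task_id = response["task_id"]

    # Step 2: Poll until a final state is reached (IMPORTANT: keep polling!)
    while True:
        status = get_search_status(task_id=task_id)

        if status["status"] == "completed":
            # Got results, return to user
            return status["content"]
        elif status["status"] == "failed":
            # Task failed, report error
            return f"Search failed: {status['error']}"
        elif status["status"] == "not_found":
            # Task expired (after 30 minutes) or the ID is wrong
            return "Search task expired or not found"
        else:
            # Still running: show progress, then wait 10-15 seconds
            print(f"Progress: {status['progress']['percentage']}%")
            await asyncio.sleep(12)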

Fast Completion Detection:

If a task completes in < 1 second, start_vertical_search returns results directly
This avoids unnecessary polling for quick searches
For longer searches, use get_search_status to poll for results

Progress Updates: The search progress is reported through get_search_status with stages:

searching: Finding articles on the platform
fetching_content: Downloading article content
compressing: Compressing content to fit token limits

For multi-platform searches, progress messages include platform context:

"Platform 1/2 (weixin): Searching..." - Shows which platform is being searched
"Platform 1/2 (weixin): Completed (5 results)" - Shows platform completion
"Multi-platform search completed: 2/2 platforms, 10 total results" - Final summary

Example Response (Single Platform):

Found 5 result(s) for 'Python' on WeChat:

1. **Python变量命名规范详解**
   Source: 微信公众号
   Date: 2024-01-15
   Summary: 本文详细介绍了Python变量命名的最佳实践...
   Link: https://mp.weixin.qq.com/s?src=11&timestamp=...

2. **Python异步编程指南**
   Source: 微信公众号
   Date: 2024-01-14
   Summary: 深入理解Python的asyncio模块...
   Link: https://mp.weixin.qq.com/s?src=11&timestamp=...

...

Example Response (Multi-Platform):

Found 10 result(s) for 'Python' across WeChat, Google:

1. **Python变量命名规范详解**
   Platform: WeChat
   Source: 微信公众号
   Date: 2024-01-15
   Summary: 本文详细介绍了Python变量命名的最佳实践...
   Link: https://mp.weixin.qq.com/s?src=11&timestamp=...

2. **Python Async Programming Guide**
   Platform: Google
   Source: example.com
   Date: 2024-01-14
   Summary: A comprehensive guide to Python async programming...
   Link: https://example.com/python-async

...

Note: The returned URLs are real links (e.g., mp.weixin.qq.com for WeChat, zhihu.com for Zhihu), not redirect links from Sogou.

Testing the MCP Server

You can test the MCP server manually:

# Activate virtual environment
source .venv/bin/activate

# Run the server (it will read from stdin)
python mcp_server.py

Then send JSON-RPC messages to test:

{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05"}}
{"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
{"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {"name": "start_vertical_search", "arguments": {"platform": "weixin", "query": "Python", "max_results": 3}}}
{"jsonrpc": "2.0", "id": 4, "method": "tools/call", "params": {"name": "get_search_status", "arguments": {"task_id": "<task_id_from_previous_response>"}}}
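Typing these messages by hand works, but a short script can drive the same session. The MCP stdio transport sends one JSON-RPC message per line on stdin and reads responses from stdout. The sketch below therefore writes each request as a single line and reads one response line per request. `build_requests` and `run_session` are hypothetical helpers. The script makes three assumptions:
- It runs from the project root with the virtual environment active.
- The server writes only JSON-RPC messages to stdout.
- The server sends no notifications between the responses.

```python
import json
import subprocess
import sys
from typing import Any


def build_requests(query: str, platform: str = "weixin", max_results: int = 3) -> list[dict[str, Any]]:
    """Hypothetical helper: the first three test messages above, as Python dicts."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": "2024-11-05"}},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
         "params": {"name": "start_vertical_search",
                    "arguments": {"platform": platform, "query": query,
                                  "max_results": max_results}}},
    ]


def run_session(server_path: str = "mcp_server.py", query: str = "Python") -> list[dict[str, Any]]:
    """Hypothetical helper: send each request on its own line, read one response line each."""
    proc = subprocess.Popen(
        [sys.executable, server_path],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    assert proc.stdin is not None and proc.stdout is not None
    responses: list[dict[str, Any]] = []
    try:
        for request in build_requests(query):
            proc.stdin.write(json.dumps(request) + "\n")  # one message per line
            proc.stdin.flush()
            line = proc.stdout.readline()
            if not line:
                raise RuntimeError("server closed stdout before responding")
            responses.append(json.loads(line))
    finally:
        proc.stdin.close()  # EOF should make the server exit
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
    return responses
```

For example, `for response in run_session(): print(response)` prints the initialize result, the tool list, and the start_vertical_search response. You can then poll the task_id from that last response with get_search_status.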

Quick Test Scripts

You can use the quick test scripts to test individual platforms:

# Activate virtual environment
source .venv/bin/activate

# Test Weixin search
python scripts/test_weixin.py "Python 异步编程" 10

# Test Zhihu search
python scripts/test_zhihu.py "机器学习" 15

# Test Google search (requires API credentials)
export APIKEY_GOOGLE_CUSTOM_SEARCH="your-api-key"
export APIKEY_GOOGLE_SEARCH_ID="your-search-engine-id"
python scripts/test_google.py "Python async programming" 10

These scripts provide colored output and show search progress in real-time.

Example Screenshots

The examples/ folder contains actual example screenshots of using the MCP server:

example_claude.jpg: Example of Claude Desktop using the MCP server for search
example_roocode.jpg: Example of RooCode using the MCP server for search

These screenshots demonstrate how to use the Vertical Search MCP server in different AI clients.

CLI (Command-Line Interface)

The project also provides a command-line interface for direct searching from the terminal.

Installation

After installing the project, the CLI is available as vertical-search:

# Install the project in development mode
pip install -e .

# Or use directly with Python
python -m cli.cli --help

Usage Examples

# Basic search
vertical-search "Python 异步编程"

# Search specific platform
vertical-search "Python" --platform weixin

# Search multiple platforms
vertical-search "Python" --platform weixin,google

# Search all platforms
vertical-search "Python" --platform all

# Without content (faster)
vertical-search "Python" --no-content

# Limit results
vertical-search "Python" --max-results 20

# List available platforms
vertical-search --list-platforms

# Verbose mode
vertical-search "Python" --verbose

CLI Features

Text output: Clean, colored text output suitable for terminal
Progress display: Concise progress updates that don't flood the terminal
Cache indication: Shows [Cache hit] when results are from cache
Multi-platform support: Search across multiple platforms in one command
Error handling: Consistent error handling with MCP server

Output Format

The CLI outputs results in a readable text format with:

Colored titles, URLs, and metadata
Platform indication for multi-platform searches
Content status indicators (fetched, compressed, truncated)
Cache hit notifications

Direct Usage

import asyncio
from core.search_manager import UnifiedSearchManager
from platforms.weixin_searcher import WeixinSearcher
from core.browser_pool import BrowserPool

async def main():
    # Initialize browser pool and search manager
    pool = BrowserPool()
    manager = UnifiedSearchManager()
    
    # Register platform searcher
    # platform: Platform name (e.g., 'weixin', 'zhihu')
    # searcher: Platform-specific searcher instance
    manager.register_platform('weixin', WeixinSearcher(pool))
    
    try:
        # Basic search
        # platform: Platform name (must be registered)
        # query: Search query string
        # max_results: Maximum number of results to return (default: 10, max: 30)
        #   Note: If max_results > 10, pagination will be used automatically
        # use_cache: Whether to use cache (default: True)
        results = await manager.search(
            platform='weixin',
            query='Python',
            max_results=10,
            use_cache=True
        )
        
        # Process results
        for result in results:
            print(f"Title: {result['title']}")
            print(f"URL: {result['url']}")
            print(f"Source: {result['source']}")
            print(f"Date: {result['date']}")
            print(f"Snippet: {result['snippet']}")
            print("---")
        
        # Another search with a different query and a smaller result limit
        recent_results = await manager.search(
            platform='weixin',
            query='Machine Learning',
            max_results=5,
            use_cache=True
        )
        
        # Search with pagination (max_results > 10)
        # Pagination is automatic when requesting more than 10 results
        many_results = await manager.search(
            platform='weixin',
            query='Python',
            max_results=20,  # Will automatically paginate to get 20 results
            use_cache=True
        )
        
        # Get registered platforms
        platforms = manager.get_registered_platforms()
        print(f"Available platforms: {platforms}")
        
    finally:
        # Always close the manager to clean up browser resources
        await manager.close()

# Run the async function
asyncio.run(main())
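With use_cache=True, repeating an identical search returns stored results instead of launching a new browser search. The sketch below shows one way such a cache can work. It assumes a time-to-live and a key built from platform, query, and max_results. `SearchCache` is hypothetical, and the project's `core/cache.py` may use a different key or expiry policy.

```python
import time
from typing import Any, Callable, Optional

CacheKey = tuple[str, str, int]


class SearchCache:
    """Hypothetical TTL cache for search results, keyed by the search arguments."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: dict[CacheKey, tuple[float, list[dict[str, Any]]]] = {}

    @staticmethod
    def _key(platform: str, query: str, max_results: int) -> CacheKey:
        # Normalize whitespace so " Python " and "Python" share an entry
        return (platform, " ".join(query.split()), max_results)

    def get(self, platform: str, query: str, max_results: int) -> Optional[list[dict[str, Any]]]:
        key = self._key(platform, query, max_results)
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss
        stored_at, results = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired
            return None
        return results  # cache hit

    def set(self, platform: str, query: str, max_results: int,
            results: list[dict[str, Any]]) -> None:
        self._entries[self._key(platform, query, max_results)] = (self._clock(), results)
```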

Example Output

# Example result structure
[
    {
        'title': 'Python变量命名规范详解',
        'url': 'https://mp.weixin.qq.com/s?src=11&timestamp=1767670509&ver=6463&signature=...',
        'source': '微信公众号',
        'date': '2024-01-15',  # May be empty if not available
        'snippet': '本文详细介绍了Python变量命名的最佳实践...'  # May be empty if not available
    },
    {
        'title': 'Python异步编程指南',
        'url': 'https://mp.weixin.qq.com/s?src=11&timestamp=1767670509&ver=6463&signature=...',
        'source': '微信公众号',
        'date': '2024-01-14',
        'snippet': '深入理解Python的asyncio模块...'
    },
    # ... more results
]

# Example for Zhihu platform
[
    {
        'title': 'Python 麦该愉么喜?',
        'url': 'https://www.zhihu.com/question/353341563',
        'source': '知乎',
        'date': '',  # May be empty if not available
        'snippet': ''  # May be empty if not available
    },
    # ... more results
]

Note: The returned URLs are real links directly to the target platforms:

WeChat articles: https://mp.weixin.qq.com/s?...
Zhihu questions/articles: https://www.zhihu.com/question/... or https://www.zhihu.com/...

The system automatically resolves redirect links from Sogou to get the final destination URLs.

Result Fields

Each result dictionary contains the following fields:

title (str): Article title
url (str): Article URL (real link to the target platform, not a redirect link)
- WeChat: https://mp.weixin.qq.com/s?...
- Zhihu: https://www.zhihu.com/question/... or https://www.zhihu.com/...
source (str): Source platform name (e.g., "微信公众号", "知乎")
date (str): Publication date (may be empty if not available)
snippet (str): Article snippet/description (may be empty if not available)
platform (str, optional): Platform the result came from (included in multi-platform searches)
content (str, optional): Full article content (only when include_content=true)
- May be compressed if it exceeds token limits
- Status indicated by content_status field
content_status (str, optional): Content processing status
- "fetched": Content successfully fetched
- "compressed": Content was compressed using DeepSeek API
- "batch_compressed": Content was batch compressed with other articles
- "truncated": Content was truncated (compression failed or too large)
- "fetch_failed": Failed to fetch content
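Because the project type-checks with mypy --strict, the result fields above can be written down as a TypedDict. This is a sketch, not a type the project exports. `SearchResult` and `validate_result` are hypothetical names, and the optional platform field follows the multi-platform description earlier in this README.

```python
from typing import Any, Literal, TypedDict, get_args

ContentStatus = Literal["fetched", "compressed", "batch_compressed", "truncated", "fetch_failed"]


class _RequiredResultFields(TypedDict):
    title: str
    url: str
    source: str
    date: str     # may be empty
    snippet: str  # may be empty


class SearchResult(_RequiredResultFields, total=False):
    content: str                   # only when include_content=true
    content_status: ContentStatus
    platform: str                  # set in multi-platform searches


REQUIRED_FIELDS = ("title", "url", "source", "date", "snippet")
VALID_CONTENT_STATUSES = frozenset(get_args(ContentStatus))


def validate_result(result: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if the dict looks valid)."""
    problems = [f"missing or non-string field: {field}"
                for field in REQUIRED_FIELDS if not isinstance(result.get(field), str)]
    status = result.get("content_status")
    if status is not None and status not in VALID_CONTENT_STATUSES:
        problems.append(f"unknown content_status: {status}")
    return problems
```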

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

# Run specific test
pytest tests/unit/test_browser_pool.py -v

# Run Google Custom Search tests
pytest tests/unit/test_google_searcher.py -v
pytest tests/integration/test_google_search.py -v

# Run all Google tests
pytest tests/unit/test_google_searcher.py tests/integration/test_google_search.py -v

Note: Integration tests for Google Custom Search require API credentials:

APIKEY_GOOGLE_CUSTOM_SEARCH
APIKEY_GOOGLE_SEARCH_ID

If credentials are not set, integration tests will be automatically skipped.

Code Quality

# Type checking
mypy core/ platforms/ --strict

# Code formatting
black core/ platforms/ tests/

# Linting
flake8 core/ platforms/ --max-line-length=100

Project Structure

vertical-search-mcp/
├── mcp_server.py              # MCP protocol layer
├── core/                      # Core modules
│   ├── browser_pool.py        # Browser pool
│   ├── cache.py               # Cache layer
│   ├── base_searcher.py       # Base searcher class
│   ├── search_manager.py      # Unified manager
│   ├── url_resolver.py        # URL resolver (redirect to real links)
│   ├── content_fetcher.py     # Article content fetcher
│   ├── content_compressor.py  # Content compressor (DeepSeek API)
│   ├── content_processor.py   # Content processing coordinator
│   └── token_estimator.py     # Token estimation
├── platforms/                 # Platform adapters
│   ├── weixin_searcher.py     # WeChat searcher
│   ├── google_searcher.py     # Google Custom Search searcher
│   └── zhihu_searcher.py      # Zhihu searcher
├── config/                    # Configuration files
│   ├── platforms.yaml         # Platform configurations
│   ├── compression.yaml       # Content compression settings
│   └── anti_crawler.yaml      # Anti-crawler protection settings
├── examples/                  # Example screenshots
│   ├── example_claude.jpg     # Claude Desktop usage example
│   └── example_roocode.jpg    # RooCode usage example
├── tests/                     # Test files
│   ├── unit/                  # Unit tests
│   ├── integration/           # Integration tests
│   ├── benchmark/             # Performance tests
│   └── load/                  # Load tests
└── docs/                      # Documentation

Performance

Operation	Time	Speedup
First search (cold browser start)	5.1s	- (baseline)
Subsequent searches	0.9s	5.7x
3 concurrent searches	3.2s	4.8x
Cache hit	0.01s	≈510x
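The speedups appear to be measured against the 5.1s cold first search. For the concurrent case, the baseline appears to be three cold searches run back to back (3 × 5.1s = 15.3s). This reading is inferred from the numbers rather than stated by the project. The arithmetic below reproduces the column under that assumption.

```python
FIRST_SEARCH_S = 5.1  # cold first search, from the table


def speedup(baseline_s: float, measured_s: float) -> float:
    """Ratio of baseline time to measured time."""
    return baseline_s / measured_s


subsequent = speedup(FIRST_SEARCH_S, 0.9)      # ≈ 5.67
concurrent = speedup(3 * FIRST_SEARCH_S, 3.2)  # 15.3s sequential vs 3.2s concurrent
cache_hit = speedup(FIRST_SEARCH_S, 0.01)      # ≈ 510

print(f"subsequent {subsequent:.1f}x, concurrent {concurrent:.1f}x, cache hit {cache_hit:.0f}x")
```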

License

MIT License

Contributing

See CONTRIBUTING.md for details.
