MCPSearch
AI-powered multi-source research and crawling platform with MCP integration, enabling web search, crawling, social source aggregation, and structured research workflows.
Overview
MCPSearch is a self-hosted research stack for agents and developers. It combines:
- parallel web search across multiple engines
- HTTP + browser + stealth crawling
- social and developer-source collection
- structured content extraction
- MCP-native tool exposure
- higher-level research workflows via investigate, compare, and trending
The project has grown beyond a simple crawler. The current shape is:
- 29 MCP tools in mcp_server/server.py
- a unified mcpsearch / mcpsearch_multi interface
- shared action routing in mcp_server/handlers.py
- a flagship orchestration layer in agents/research_agent.py
Current Capabilities
- Web search: DuckDuckGo, Google, and Bing aggregation
- Crawling modes: fast via HTTP only, hybrid via HTTP + Playwright, stealth via anti-bot fallback
- Extraction: markdown/text extraction, tables, code blocks, images, metadata, JSON-LD/OpenGraph/Microdata via extruct
- Fast parsing: selectolax on hot search parsing paths with BeautifulSoup fallback
- Social sources: Reddit, Twitter/X, YouTube, GitHub
- HTTP caching: shared async client factory with optional Hishel-backed caching on request-heavy paths
- Research workflows: research_agent, investigate, compare, trending
- Tool discovery: list_tools, describe_tools, get_crawl_stats
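The README does not describe how results from several engines are combined, so the following is only a minimal sketch of one plausible aggregation step: merge per-engine result lists and deduplicate by a normalized URL. The function names and the result shape (dicts with a url key) are assumptions for illustration, not the project's API.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Build a dedup key: lowercase scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(per_engine: dict) -> list:
    """Merge {engine: [result, ...]} into one list with one entry per normalized URL.

    The first engine to return a URL supplies the title; later engines are
    only recorded in the "engines" list.
    """
    merged = {}
    for engine, results in per_engine.items():
        for item in results:
            key = normalize_url(item["url"])
            if key not in merged:
                merged[key] = {**item, "engines": [engine]}
            elif engine not in merged[key]["engines"]:
                merged[key]["engines"].append(engine)
    # Results confirmed by more engines sort first; the sort is stable,
    # so ties keep their original order.
    return sorted(merged.values(), key=lambda r: -len(r["engines"]))


results = merge_results({
    "duckduckgo": [
        {"url": "https://Example.com/a/", "title": "A"},
        {"url": "https://example.com/b", "title": "B"},
    ],
    "bing": [{"url": "https://example.com/a#top", "title": "A (Bing)"}],
})
for r in results:
    print(r["url"], r["engines"])
```

Ranking by the number of engines that agree is one simple choice; a real aggregator may weight engines or positions differently.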
Install
Basic install
git clone https://github.com/JonusNattapong/MCPSearch.git
cd MCPSearch
pip install -e .
playwright install chromium
Development install
make dev
or:
pip install -e ".[dev]"
playwright install chromium
Optional stealth dependency
crawler/stealth.py can use Camoufox when it is installed. If Camoufox is not available, MCPSearch falls back to Playwright-based stealth behavior.
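A minimal sketch of how an optional dependency like this can be detected at runtime. It assumes the package is importable under the module name camoufox; the function name is hypothetical and this is not the code in crawler/stealth.py.

```python
import importlib.util


def pick_stealth_backend() -> str:
    """Return "camoufox" when the optional package is importable, else "playwright"."""
    # Assumption: the optional dependency installs a top-level module named "camoufox".
    if importlib.util.find_spec("camoufox") is not None:
        return "camoufox"
    return "playwright"


print(pick_stealth_backend())
```

Checking with find_spec avoids importing the package (and launching anything) just to learn whether it is installed.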
Environment variables
- OPENAI_API_KEY: Optional. Used by summarization flows when AI summaries are enabled.
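Because the key is optional, calling code may want to check for it before requesting a summary. The helper below is a hypothetical sketch of such a check; how MCPSearch itself gates summarization is not documented here.

```python
import os


def summaries_available(env=None) -> bool:
    """True when a non-empty OPENAI_API_KEY is present in the given environment."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


print(summaries_available())
```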
Quick Start
CLI
# Search
mcpsearch search -q "AI agents"
# Crawl a page
mcpsearch crawl -u "https://example.com"
# Read a page in terminal-friendly format
mcpsearch read -u "https://example.com"
# Research workflow
mcpsearch research --query "browser fingerprinting" --depth deep --summarize
# Compare topics
mcpsearch compare --compare "React" "Vue" "Svelte" --depth medium
# Trending view
mcpsearch trending --max-results 10
# Run MCP server
mcpsearch server
Python / MCP-facing examples
# Unified tool
mcpsearch(action="search", query="LLM agents", limit=5)
mcpsearch(action="crawl", url="https://example.com", mode="hybrid")
mcpsearch(action="reddit", query="python", subreddit="learnpython")
mcpsearch(action="github", query="browser automation", sort="stars")
# Multi-action orchestration
mcpsearch_multi(actions='''[
  {"action":"search","query":"agent memory patterns"},
  {"action":"reddit","query":"LocalLLaMA"},
  {"action":"github","query":"llm agents","sort":"stars"}
]''')
# Flagship research tools
investigate(topic="Python async scraping", depth="deep", include_social=True)
compare(topics="React,Vue,Svelte", depth="medium", max_sources=3)
trending(platforms="reddit,github", limit=10)
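In the examples above, mcpsearch_multi takes its actions as a JSON string. When calling from Python, building that string with json.dumps avoids quoting mistakes. This is a small sketch; build_actions is a hypothetical helper, and the only validation it performs (each entry needs an action key) is an assumption based on the examples.

```python
import json


def build_actions(*actions: dict) -> str:
    """Serialize action dicts into the JSON array string used by mcpsearch_multi."""
    for index, action in enumerate(actions):
        if "action" not in action:
            raise ValueError(f"entry {index} is missing the 'action' key")
    return json.dumps(list(actions))


payload = build_actions(
    {"action": "search", "query": "agent memory patterns"},
    {"action": "reddit", "query": "LocalLLaMA"},
    {"action": "github", "query": "llm agents", "sort": "stars"},
)
print(payload)
```

The resulting string can then be passed as the actions argument.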
MCP Integration
Claude Desktop
{
  "mcpServers": {
    "mcpsearch": {
      "command": "python",
      "args": ["-m", "mcp_server"],
      "cwd": "/path/to/MCPSearch",
      "env": {
        "OPENAI_API_KEY": ""
      }
    }
  }
}
Cursor
{
  "mcpServers": {
    "mcpsearch": {
      "command": "python",
      "args": ["-m", "mcp_server"],
      "cwd": "/path/to/MCPSearch"
    }
  }
}
Custom MCP client
{
  "command": "python",
  "args": ["-m", "mcp_server"],
  "transport": "stdio"
}
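The Claude Desktop and Cursor entries above differ only in the optional env block, so they can be generated from one helper. The sketch below uses only the standard library; the function name is hypothetical, and where each client stores its configuration file is left to that client's documentation.

```python
import json
from typing import Optional


def mcpsearch_config(repo_path: str, openai_api_key: Optional[str] = None) -> dict:
    """Build an mcpServers config dict that launches MCPSearch via python -m mcp_server."""
    entry = {
        "command": "python",
        "args": ["-m", "mcp_server"],
        "cwd": repo_path,
    }
    # Only add env when a key is supplied; the key is optional.
    if openai_api_key:
        entry["env"] = {"OPENAI_API_KEY": openai_api_key}
    return {"mcpServers": {"mcpsearch": entry}}


config = mcpsearch_config("/path/to/MCPSearch")
print(json.dumps(config, indent=2))
```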
Tool Map
Unified tools
- mcpsearch
- mcpsearch_multi
Search and crawl tools
- web_search
- search_and_summarize
- smart_search
- deep_search
- crawl_url
- hybrid_crawl
- crawl_recursive
- extract_content
- get_crawl_stats
Social tools
- search_reddit
- get_subreddit
- get_reddit_post
- search_twitter
- get_user_tweets
- search_youtube
- get_youtube_channel
- get_youtube_content
- search_github
- get_github_user
- get_github_repo
- get_github_readme
Research tools
- research_agent
- investigate
- compare
- trending
Discovery tools
- list_tools
- describe_tools
Recommended Entry Points
If you are integrating MCPSearch into an agent:
- start with list_tools and describe_tools
- prefer mcpsearch for simple routing
- use mcpsearch_multi when you want parallel source gathering
- use investigate for richer topic-oriented research
- use compare when the output should be side-by-side
- use trending for source discovery and early signal collection
Research Workflows
investigate
Best when you want one topic explored across search, crawl, and social sources.
investigate(
    topic="anti-bot browser strategies",
    depth="deep",
    include_social=True,
    include_summary=True,
    max_sources=5,
)
compare
Best when you want repeated shallow or medium investigations and a compact comparison result.
compare(
    topics="Playwright,Selenium,Camoufox",
    depth="medium",
    max_sources=3,
)
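The description of compare as repeated investigations with a compact result suggests a fan-out pattern. The sketch below illustrates that idea with asyncio.gather; investigate_stub is a hypothetical stand-in that returns canned data, and neither function reflects the actual code in agents/research_agent.py.

```python
import asyncio


async def investigate_stub(topic: str, depth: str, max_sources: int) -> dict:
    """Hypothetical stand-in for investigate; returns canned data, no network access."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    sources = [f"{topic}-source-{i}" for i in range(max_sources)]
    return {"topic": topic, "depth": depth, "sources": sources}


async def compare_sketch(topics: str, depth: str = "medium", max_sources: int = 3) -> dict:
    """Run one investigation per comma-separated topic concurrently and summarize each."""
    names = [t.strip() for t in topics.split(",") if t.strip()]
    reports = await asyncio.gather(
        *(investigate_stub(name, depth, max_sources) for name in names)
    )
    # gather preserves input order, so the summary keeps the caller's topic order.
    return {
        r["topic"]: {"depth": r["depth"], "source_count": len(r["sources"])}
        for r in reports
    }


result = asyncio.run(compare_sketch("Playwright, Selenium,Camoufox"))
print(result)
```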
trending
Best when you want new leads before deeper crawling.
trending(
    platforms="reddit,github",
    limit=10,
)
Architecture
Request flow
Query / URL / Topic
        |
        v
mcpsearch / direct tool
        |
        v
mcp_server/handlers.py
        |
        +--> search/aggregator.py
        +--> crawler/engine.py
        +--> crawler/hybrid.py
        +--> crawler/stealth.py
        +--> social/*.py
        +--> agents/research_agent.py
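Shared action routing of this kind is commonly implemented as a dispatch table that maps an action name to a handler. The following is a minimal sketch of that pattern only; the handler names and return values are hypothetical and do not mirror mcp_server/handlers.py.

```python
from typing import Any, Callable, Dict


def handle_search(**params: Any) -> dict:
    """Hypothetical handler; a real one would call into the search aggregator."""
    return {"handler": "search", "params": params}


def handle_crawl(**params: Any) -> dict:
    """Hypothetical handler; a real one would call into the crawler."""
    return {"handler": "crawl", "params": params}


# One table shared by the unified tool and the direct tools.
HANDLERS: Dict[str, Callable[..., dict]] = {
    "search": handle_search,
    "crawl": handle_crawl,
}


def route(action: str, **params: Any) -> dict:
    """Look up the handler for an action and call it with the remaining parameters."""
    try:
        handler = HANDLERS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None
    return handler(**params)


print(route("search", query="LLM agents", limit=5))
```

Keeping the table in one place means a unified tool and the individual tools can share the same handlers.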
Crawl strategy
fast -> HTTP only
hybrid -> HTTP first, then browser rendering when needed
stealth -> multi-browser / anti-bot fallback path
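The hybrid line above says browser rendering is used "when needed" without defining the trigger. The sketch below shows one possible heuristic (too little visible text in the HTTP response); the threshold, the function names, and the injected fetchers are all assumptions for illustration, not the logic in crawler/hybrid.py.

```python
import re
from typing import Callable, Tuple

_TAGS = re.compile(r"<script.*?</script>|<style.*?</style>|<[^>]+>", re.S | re.I)


def visible_text(html: str) -> str:
    """Crude text extraction: drop script/style blocks and tags, collapse whitespace."""
    return " ".join(_TAGS.sub(" ", html).split())


def hybrid_fetch(
    url: str,
    http_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
    min_text_chars: int = 200,  # hypothetical threshold
) -> Tuple[str, str]:
    """Try plain HTTP first; fall back to a browser when the page looks empty."""
    html = http_fetch(url)
    if html and len(visible_text(html)) >= min_text_chars:
        return "http", html
    return "browser", browser_fetch(url)


# Fake fetchers so the sketch runs without network access or a browser.
def js_shell(url: str) -> str:
    return "<html><body><div id='root'></div><script>app()</script></body></html>"


def rendered(url: str) -> str:
    return "<p>" + "content " * 50 + "</p>"


mode, html = hybrid_fetch("https://example.com", js_shell, rendered)
print(mode)
```

Passing the fetchers in as callables keeps the decision logic testable on its own.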
Current project structure
MCPSearch/
├── agents/ # Higher-level research orchestration
├── crawler/ # HTTP, hybrid, stealth, extraction logic
├── mcp_server/ # MCP server, unified tools, shared handlers
├── search/ # Search aggregation
├── social/ # Reddit, Twitter/X, YouTube, GitHub scrapers
├── summarizer/ # AI summarization helpers
├── tests/ # Workflow and unit tests
├── utils/ # Cache, dedup, rate limiting
├── cli.py # CLI entry point
├── Makefile # Dev/test/release commands
└── pyproject.toml # Package metadata and dependencies
Development
Useful commands
make install
make dev
make test
make test-cov
make lint
make lint-fix
make format
make server
python3 scripts/benchmark_search_and_crawl.py
Focused test commands
make test-hybrid
make test-rate-limiter
pytest tests/test_extractor.py -v
pytest tests/test_search_parsers.py -v
pytest tests/test_mcp_integration.py -v
pytest tests/test_mcp_tools.py -v
Release
make patch
make minor
make major
Version is sourced from mcpspider/version.py.
Project Status Notes
- The README now reflects mcpsearch / mcpsearch_multi, not the older scout naming.
- Playwright is part of declared dependencies.
- Camoufox support exists in code, but is optional at install time.
- The main research direction is now orchestration, attribution, and multi-source analysis, not just single-page crawling.
Practical Next Improvements
See docs/USEFUL_LIBS.md for a curated list of libraries and implementation tricks that fit the current architecture.
Legal and Ethical Usage
Use MCPSearch responsibly.
- Respect target site policies and applicable law.
- Use rate limiting and caching to reduce load.
- Review platform terms before large-scale scraping.
- Avoid collecting or redistributing restricted personal data.
Contributing
Contribution guidance lives in CONTRIBUTING.md.
License
MIT. See LICENSE.