Research Powerpack MCP
Enables AI assistants to perform comprehensive research by searching Google, mining Reddit discussions, scraping web content with JS rendering, and synthesizing findings with citations into structured context.
README
MCP server that gives your AI assistant research tools. Google search, Reddit deep-dives, web scraping with LLM extraction, and multi-model deep research — all as MCP tools that chain into each other.
npx mcp-research-powerpack
five tools, zero config to start. each API key you add unlocks more capabilities.
tools
| tool | what it does | requires |
|---|---|---|
web_search |
parallel Google search across 3-100 keywords, CTR-weighted ranking, consensus detection | SERPER_API_KEY |
search_reddit |
same engine but filtered to reddit.com, 10-50 queries in parallel | SERPER_API_KEY |
get_reddit_post |
fetches 2-50 Reddit posts with full comment trees, optional LLM extraction | REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET |
scrape_links |
scrapes 1-50 URLs with JS rendering fallback, HTML-to-markdown, optional LLM extraction | SCRAPEDO_API_KEY |
deep_research |
sends questions to research-capable models (Grok, Gemini) with web search enabled, supports local file attachments | OPENROUTER_API_KEY |
tools are designed to chain: web_search suggests calling scrape_links, which suggests search_reddit, which suggests get_reddit_post, which suggests deep_research for synthesis.
install
Claude Desktop / Claude Code
add to your MCP config:
{
"mcpServers": {
"research-powerpack": {
"command": "npx",
"args": ["mcp-research-powerpack"],
"env": {
"SERPER_API_KEY": "...",
"OPENROUTER_API_KEY": "..."
}
}
}
}
from source
git clone https://github.com/yigitkonur/mcp-research-powerpack.git
cd mcp-research-powerpack
pnpm install && pnpm build
pnpm start
HTTP mode
MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-research-powerpack
exposes /mcp (POST/GET/DELETE with session headers) and /health.
API keys
each key unlocks a capability. missing keys silently disable their tools — the server never crashes.
| variable | enables | free tier |
|---|---|---|
SERPER_API_KEY |
web_search, search_reddit |
2,500 searches/mo at serper.dev |
REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET |
get_reddit_post |
unlimited (reddit.com/prefs/apps, "script" type) |
SCRAPEDO_API_KEY |
scrape_links |
1,000 credits/mo at scrape.do |
OPENROUTER_API_KEY |
deep_research, LLM extraction in scrape/reddit |
pay-per-token at openrouter.ai |
configuration
optional tuning via environment variables:
| variable | default | description |
|---|---|---|
RESEARCH_MODEL |
x-ai/grok-4-fast |
primary deep research model |
RESEARCH_FALLBACK_MODEL |
google/gemini-2.5-flash |
fallback if primary fails |
LLM_EXTRACTION_MODEL |
openai/gpt-oss-120b:nitro |
model for scrape/reddit LLM extraction |
DEFAULT_REASONING_EFFORT |
high |
research depth (low, medium, high) |
DEFAULT_MAX_URLS |
100 |
max search results per research question (10-200) |
API_TIMEOUT_MS |
1800000 |
request timeout in ms (default 30 min) |
MCP_TRANSPORT |
stdio |
stdio or http |
MCP_PORT |
3000 |
port for HTTP mode |
how it works
search ranking
results from multiple queries are deduplicated by normalized URL and scored using CTR-weighted position values (position 1 = 100.0, position 10 = 12.56). URLs appearing across multiple queries get a consensus marker. threshold tries >= 3, falls back to >= 2, then >= 1.
Reddit comment budget
global budget of 1,000 comments, max 200 per post. after the first pass, surplus from posts with fewer comments is redistributed to truncated posts in a second fetch pass.
scraping pipeline
three-mode fallback per URL: basic → JS rendering → JS + US geo-targeting. results go through HTML-to-markdown conversion (turndown), then optional LLM extraction with a 100k char input cap and 8,000 token output per URL.
deep research
32,000 token budget divided across questions (1 question = 32k, 10 questions = 3.2k each). Gemini models get google_search tool access. Grok/Perplexity get search_parameters with citations. primary model fails → automatic fallback.
file attachments
deep_research can read local files and include them as context. files over 600 lines are smart-truncated (first 500 + last 100 lines). line numbers preserved.
concurrency
| operation | parallel limit |
|---|---|
| web search keywords | 8 |
| Reddit search queries | 8 |
| Reddit post fetches per batch | 5 (batches of 10) |
| URL scraping per batch | 10 (batches of 30) |
| LLM extraction | 3 |
| deep research questions | 3 |
all clients use manual retry with exponential backoff and jitter. the OpenAI SDK's built-in retry is disabled (maxRetries: 0).
project structure
src/
index.ts — entry point, STDIO + HTTP transport, signal handling
worker.ts — Cloudflare Workers entry (Durable Objects)
config/
index.ts — env parsing (lazy Proxy objects), capability detection
loader.ts — YAML → Zod → JSON Schema pipeline, cached
yaml/tools.yaml — single source of truth for all tool definitions
schemas/
deep-research.ts — Zod validation for research questions + file attachments
scrape-links.ts — Zod validation for URLs, timeout, LLM options
web-search.ts — Zod validation for keyword arrays
tools/
registry.ts — tool lookup → capability check → validate → execute
search.ts — web_search handler
reddit.ts — search_reddit + get_reddit_post handlers
scrape.ts — scrape_links handler
research.ts — deep_research handler
clients/
search.ts — Serper API client
reddit.ts — Reddit OAuth + comment fetching
scraper.ts — scrape.do client with fallback modes
research.ts — OpenRouter client with model-specific handling
services/
llm-processor.ts — shared LLM extraction (singleton OpenAI client)
markdown-cleaner.ts — HTML → markdown via turndown
file-attachment.ts — local file reading with line ranges
utils/
concurrency.ts — bounded parallel execution (pMap, pMapSettled)
url-aggregator.ts — CTR-weighted scoring and consensus detection
errors.ts — error classification, fetchWithTimeout
logger.ts — MCP logging protocol
response.ts — standardized output formatting
deploy
Cloudflare Workers
npx wrangler deploy
uses Durable Objects with SQLite storage. YAML-based tool definitions are replaced with inline definitions in the worker entry since there's no filesystem.
license
MIT
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.