serpent

An open-source metasearch backend built for MCP / AI agent workflows.

It aggregates results from multiple search engines, returns a unified schema, and exposes both a standard HTTP API and an MCP server that LLM agents can call directly.


Why this exists

Most search aggregators are designed for human-readable output: HTML pages, result cards, pagination UIs. When an LLM agent needs to search the web, it needs something different: structured JSON, stable field names, concurrent multi-source results, and predictable error handling.

serpent is designed for that use case. It is not a SearXNG clone.

Positioning

  • Agent-friendly metasearch backend
  • MCP-first search gateway for LLM workflows
  • Structured search API designed for AI pipelines

Supported providers

Google

Google is not scraped directly, for a practical reason: Google's anti-bot measures make self-hosted scraping fragile, and keeping a scraper working against continuously evolving detection means constant breakage and high maintenance overhead. For production use, third-party providers are more reliable and cost-effective.

Currently supported Google providers:

| Provider     | Env var          | Notes                                         |
|--------------|------------------|-----------------------------------------------|
| serpbase.dev | SERPBASE_API_KEY | Pay-per-use; generally cheaper for low volume |
| serper.dev   | SERPER_API_KEY   | 2,500 free queries, then pay-per-use          |

Both are low-cost options. For casual or low-volume use, serpbase.dev tends to be cheaper per query. Either works; configure whichever you prefer, or both for fallback.
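Failing over between two configured providers could look like the following sketch. It is illustrative only: search_with_fallback, serpbase, and serper here are stand-in functions, not this project's internals.

```python
from typing import Callable

SearchFn = Callable[[str], list[dict]]

def search_with_fallback(query: str, providers: list[SearchFn]) -> list[dict]:
    """Try each provider in order; return the first non-empty result set."""
    errors: list[Exception] = []
    for provider in providers:
        try:
            results = provider(query)
            if results:
                return results
        except Exception as exc:  # network errors, quota exhaustion, etc.
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors!r}")

# Stubs standing in for real API calls (hypothetical):
def serpbase(query: str) -> list[dict]:
    raise ConnectionError("quota exceeded")

def serper(query: str) -> list[dict]:
    return [{"title": "Tokio", "url": "https://tokio.rs"}]

print(search_with_fallback("rust async runtime", [serpbase, serper]))
# → [{'title': 'Tokio', 'url': 'https://tokio.rs'}]
```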

Web search

| Provider   | name       | Method                          | Auth                              |
|------------|------------|---------------------------------|-----------------------------------|
| DuckDuckGo | duckduckgo | HTML scraping (lite endpoint)   | No                                |
| Bing       | bing       | HTML scraping                   | No                                |
| Yahoo      | yahoo      | HTML scraping                   | No                                |
| Brave      | brave      | Official Search API             | Optional (free tier: 2,000/month) |
| Ecosia     | ecosia     | HTML scraping                   | No                                |
| Mojeek     | mojeek     | HTML scraping                   | No                                |
| Startpage  | startpage  | HTML scraping (best-effort)     | No                                |
| Qwant      | qwant      | Internal JSON API (best-effort) | No                                |
| Yandex     | yandex     | HTML scraping (best-effort)     | No                                |
| Baidu      | baidu      | HTML scraping (best-effort)     | No                                |

Providers marked best-effort use undocumented endpoints or scraping targets with strong anti-bot measures. They may stop working without warning.

Knowledge / reference

| Provider         | name             | Method                       | Auth |
|------------------|------------------|------------------------------|------|
| Wikipedia        | wikipedia        | MediaWiki Action API         | No   |
| Wikidata         | wikidata         | Wikidata API (entity search) | No   |
| Internet Archive | internet_archive | Advanced Search API          | No   |

Developer

| Provider       | name          | Method             | Auth                       |
|----------------|---------------|--------------------|----------------------------|
| GitHub         | github        | GitHub REST API    | No (token raises rate limit) |
| Stack Overflow | stackoverflow | Stack Exchange API | No (key raises limit)      |
| Hacker News    | hackernews    | Algolia HN API     | No                         |
| Reddit         | reddit        | Public JSON API    | No                         |
| npm            | npm           | npm registry API   | No                         |
| PyPI           | pypi          | HTML scraping      | No                         |
| crates.io      | crates        | crates.io REST API | No                         |

Academic

| Provider         | name            | Method                  | Auth                      |
|------------------|-----------------|-------------------------|---------------------------|
| arXiv            | arxiv           | Atom API                | No                        |
| PubMed           | pubmed          | NCBI E-utilities        | No (key raises rate limit) |
| Semantic Scholar | semanticscholar | Graph API               | No (key raises rate limit) |
| CrossRef         | crossref        | REST API (145M+ DOIs)   | No                        |

Installation

# Clone the repository
git clone https://github.com/your-org/serpent
cd serpent

# Install with pip (editable)
pip install -e ".[dev]"

# Or with uv
uv pip install -e ".[dev]"

Configuration

Copy .env.example to .env and fill in your keys:

cp .env.example .env
# Required for Google search (at least one)
SERPBASE_API_KEY=your_key_here
SERPER_API_KEY=your_key_here

# Optional — omit to use unauthenticated/public access
BRAVE_API_KEY=            # free tier: 2000 req/month
GITHUB_TOKEN=             # raises rate limit from 60 to 5000 req/hour
STACKEXCHANGE_API_KEY=    # raises limit from 300 to 10,000 req/day
NCBI_API_KEY=             # PubMed; raises from 3 to 10 req/sec
SEMANTIC_SCHOLAR_API_KEY= # raises from 1 to 10 req/sec

# Server
HOST=0.0.0.0
PORT=8000

# Restrict which providers are active (comma-separated, empty = all available)
ENABLED_PROVIDERS=
ALLOW_UNSTABLE_PROVIDERS=false

# Timeouts in seconds
DEFAULT_TIMEOUT=10
AGGREGATOR_TIMEOUT=15
MAX_RESULTS_PER_PROVIDER=10

Running

HTTP API server

python -m serpent.main
# or
serpent

Server starts at http://localhost:8000. Interactive docs at /docs.

MCP server

python -m serpent.mcp_server
# or
serpent-mcp

The MCP server communicates over stdio. Use it with any MCP-compatible client (Claude Desktop, cline, continue.dev, etc.).

Docker

Build the image:

docker build -t serpent .

Run the HTTP API:

docker run --rm -p 8000:8000 --env-file .env serpent

Or with Docker Compose:

docker compose up --build

The container starts the HTTP API on http://localhost:8000.


HTTP API

POST /search

Aggregate search across all enabled providers.

curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "rust async runtime"}'

With explicit providers and params:

curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "rust async runtime",
    "providers": ["duckduckgo", "wikipedia"],
    "params": {"num_results": 5, "language": "en"}
  }'

Response:

{
  "engine": "serpent",
  "query": "rust async runtime",
  "results": [
    {
      "title": "Tokio - An asynchronous Rust runtime",
      "url": "https://tokio.rs",
      "snippet": "Tokio is an event-driven, non-blocking I/O platform...",
      "source": "tokio.rs",
      "rank": 1,
      "provider": "duckduckgo",
      "published_date": null,
      "extra": {}
    }
  ],
  "related_searches": ["tokio vs async-std", "rust futures"],
  "suggestions": [],
  "answer_box": null,
  "timing_ms": 843.2,
  "providers": [
    {"name": "duckduckgo", "success": true, "result_count": 10, "latency_ms": 840.1, "error": null},
    {"name": "wikipedia", "success": true, "result_count": 3, "latency_ms": 320.5, "error": null}
  ],
  "errors": []
}
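The README does not specify how per-provider results are merged into the final ranked list; a round-robin interleave is one plausible strategy, sketched here purely as an assumption (the function and its behavior are illustrative, not serpent's actual merge logic):

```python
from itertools import zip_longest

def round_robin_merge(per_provider: dict[str, list[dict]]) -> list[dict]:
    """Interleave provider result lists and assign a 1-based merged rank."""
    merged: list[dict] = []
    for tier in zip_longest(*per_provider.values()):
        merged.extend(r for r in tier if r is not None)
    for rank, result in enumerate(merged, start=1):
        result["rank"] = rank
    return merged

results = round_robin_merge({
    "duckduckgo": [{"url": "https://tokio.rs"}, {"url": "https://docs.rs"}],
    "wikipedia":  [{"url": "https://en.wikipedia.org/wiki/Tokio"}],
})
print([r["url"] for r in results])
# → ['https://tokio.rs', 'https://en.wikipedia.org/wiki/Tokio', 'https://docs.rs']
```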

POST /search/google

curl -X POST http://localhost:8000/search/google \
  -H "Content-Type: application/json" \
  -d '{"query": "site:github.com rust tokio"}'

GET /health

curl http://localhost:8000/health
# {"status": "ok"}

GET /providers

curl http://localhost:8000/providers
{
  "available": [
    {"name": "google_serpbase", "tags": ["google", "web"]},
    {"name": "duckduckgo", "tags": ["web", "privacy"]},
    {"name": "wikipedia", "tags": ["web", "academic", "knowledge"]},
    {"name": "github", "tags": ["code", "web"]},
    {"name": "arxiv", "tags": ["academic", "web"]}
  ],
  "count": 5
}

MCP usage

Configure your MCP client to run serpent-mcp (or python -m serpent.mcp_server).

Example Claude Desktop config (~/.claude/claude_desktop_config.json):

{
  "mcpServers": {
    "serpent": {
      "command": "serpent-mcp",
      "env": {
        "SERPBASE_API_KEY": "your_key",
        "SERPER_API_KEY": "your_key"
      }
    }
  }
}

Available MCP tools

search_web

General web search across all enabled providers.

{
  "query": "fastapi vs flask performance 2024",
  "num_results": 10
}

search_google

Google search via a configured third-party provider.

{
  "query": "site:docs.python.org asyncio",
  "provider": "google_serpbase"
}

search_academic

Search arXiv and Wikipedia.

{
  "query": "transformer architecture attention mechanism",
  "num_results": 8
}

search_github

Search GitHub repositories.

{
  "query": "python mcp server implementation",
  "num_results": 5
}

compare_engines

Run the same query across multiple providers and return results grouped by engine.

{
  "query": "vector database comparison",
  "providers": ["duckduckgo", "brave"],
  "num_results": 5
}
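Because compare_engines returns results grouped by engine, a client can gauge cross-engine agreement by intersecting the URL sets. The helper below is hypothetical, written against the grouped shape this tool implies:

```python
def url_overlap(grouped: dict[str, list[dict]]) -> set[str]:
    """URLs that appear in every engine's result list."""
    url_sets = [{r["url"] for r in results} for results in grouped.values()]
    return set.intersection(*url_sets) if url_sets else set()

overlap = url_overlap({
    "duckduckgo": [{"url": "https://qdrant.tech"}, {"url": "https://milvus.io"}],
    "brave":      [{"url": "https://qdrant.tech"}],
})
print(overlap)  # → {'https://qdrant.tech'}
```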

Result schema reference

Every result object has these fields:

| Field          | Type          | Description                                              |
|----------------|---------------|----------------------------------------------------------|
| title          | string        | Result title                                             |
| url            | string        | Result URL                                               |
| snippet        | string        | Text excerpt / description                               |
| source         | string        | Domain name                                              |
| rank           | int           | 1-based position in final merged list                    |
| provider       | string        | Provider that returned this result                       |
| published_date | string \| null | ISO date (YYYY-MM-DD), if available                     |
| extra          | object        | Provider-specific data (e.g. GitHub stars, arXiv authors) |
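The schema above maps directly onto a typed model. A sketch using TypedDict (field names and types come from the table; the class name SearchResult is ours):

```python
from typing import Any, Optional, TypedDict

class SearchResult(TypedDict):
    title: str
    url: str
    snippet: str
    source: str                    # domain name
    rank: int                      # 1-based position in the merged list
    provider: str
    published_date: Optional[str]  # ISO date (YYYY-MM-DD) or None
    extra: dict[str, Any]          # provider-specific data

result: SearchResult = {
    "title": "Tokio - An asynchronous Rust runtime",
    "url": "https://tokio.rs",
    "snippet": "Tokio is an event-driven, non-blocking I/O platform...",
    "source": "tokio.rs",
    "rank": 1,
    "provider": "duckduckgo",
    "published_date": None,
    "extra": {},
}
```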

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with auto-reload
uvicorn serpent.main:app --reload

Roadmap

  • [ ] Caching layer (in-memory / Redis) for repeated queries
  • [ ] Relevance re-ranking across providers
  • [ ] More providers: Bing (official API), Kagi, Tavily
  • [ ] Rate limiting per provider with backoff
  • [ ] Streaming responses (SSE) for long aggregations
  • [x] Docker image and Compose setup
  • [ ] Provider health monitoring endpoint
  • [ ] Result scoring and confidence signals

License

MIT
