Open Information MCP Toolkit
Self-hosted MCP toolkit providing web search, Wikipedia, and arXiv access without mandatory API keys, delivered as a Docker Compose bundle.
Self-hosted, MIT-licensed MCP toolkit for agents that need access to open information without depending on mandatory third-party API keys. The stack is delivered as one Docker Compose bundle, but its capabilities are intentionally split into separate MCP endpoints:
- one general endpoint for broad internet access
- one specialized endpoint per focused knowledge domain
Today this repository ships:
- a general web endpoint implemented in this repo and backed by SearXNG plus Mozilla Readability
- a specialized encyclopedia endpoint backed by wikipedia-mcp
- a specialized academic paper endpoint backed by arxiv-mcp-server
Quick Start
Prerequisites: Docker Engine or Docker Desktop, plus Docker Compose.
Run the full toolkit locally with the default settings:
docker compose up --build
The Compose stack now uses the default project name open_information_mcp, so
generated Docker container, network, and volume names use that prefix by
default.
If you want to override ports or timeouts, create .env first using
.env.example, then start the stack.
Connect your MCP client to:
- general-web-tools: http://localhost:8000/mcp
- wikipedia-research: http://localhost:8001/mcp
- arxiv-research: http://localhost:8002/mcp
Quick verification:
- general health: http://localhost:8000/healthz
- arXiv health: http://localhost:8002/healthz
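Right after startup the containers may need a few seconds before they answer. The sketch below polls the two health URLs listed above using only the Python standard library. It assumes that a ready service answers /healthz with an HTTP 2xx status, which this README does not state explicitly; the function names, timeouts, and polling interval are illustrative choices, not part of the toolkit.

```python
"""Poll the documented /healthz endpoints until they respond or a deadline passes."""
import http.client
import time
import urllib.error
import urllib.request

# Default host URLs from this README; adjust them if you override ports in .env.
HEALTH_URLS = {
    "general": "http://localhost:8000/healthz",
    "arxiv": "http://localhost:8002/healthz",
}

# The checks target the local machine, so any configured HTTP proxy is skipped.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status (assumed readiness signal)."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed reply, or a non-2xx HTTP error.
        return False


def wait_until_healthy(
    urls: dict[str, str],
    deadline_seconds: float = 60.0,
    interval_seconds: float = 2.0,
) -> dict[str, bool]:
    """Poll every URL until all are healthy or the deadline expires."""
    deadline = time.monotonic() + deadline_seconds
    status = {name: False for name in urls}
    while True:
        for name, url in urls.items():
            if not status[name]:
                status[name] = is_healthy(url)
        if all(status.values()) or time.monotonic() >= deadline:
            return status
        time.sleep(interval_seconds)


if __name__ == "__main__":
    for name, ok in wait_until_healthy(HEALTH_URLS, deadline_seconds=10.0).items():
        print(f"{name}: {'ready' if ok else 'not reachable'}")
```

The Wikipedia endpoint is left out because this README lists no health URL for it.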
Next steps:
- use the Researcher Agent Pattern for a copy/paste deep-research workflow that returns Markdown
- jump to Configuration for environment overrides
- jump to Contract Catalog for per-tool details
Concept
The toolkit is built around a simple separation of responsibilities:
- general tools expose reusable open-web primitives that almost any agent needs
- specialized tools expose deeper workflows for one domain and live behind their own endpoint
- the default stack relies on open-source components and works without mandatory external API keys
- the whole bundle is delivered as one Docker stack so clients can enable only the endpoints they need
This structure leaves room for future specialized endpoints such as library documentation research or YouTube transcription without changing the general endpoint contract.
Current Toolkit Layout
| Capability class | Default endpoint | Purpose | Current tools |
|---|---|---|---|
| General | http://localhost:8000/mcp | Broad web search, page reading, and search capability discovery | web_search, read_url, search_config |
| Specialized | http://localhost:8001/mcp | Encyclopedic and subject-area research through Wikipedia | search_wikipedia, get_article, get_summary, ... |
| Specialized | http://localhost:8002/mcp | Scientific paper search, metadata lookup, download, semantic search, and reading through arXiv | search_papers, get_abstract, download_paper, semantic_search, reindex, list_papers, read_paper |
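To confirm that a running stack advertises the tools named in the table, a client can ask each endpoint for its tool list and compare. The sketch below uses the FastMCP client's list_tools call; the helper names are hypothetical, and the Wikipedia row is abbreviated in the table ("..."), so only the three names it spells out are checked. FastMCP is imported lazily so the comparison helper works without it installed.

```python
import asyncio

# Tool names copied from the layout table. The Wikipedia entry is partial
# because the table itself elides the rest of that toolset.
EXPECTED_TOOLS = {
    "http://localhost:8000/mcp": {"web_search", "read_url", "search_config"},
    "http://localhost:8001/mcp": {"search_wikipedia", "get_article", "get_summary"},
    "http://localhost:8002/mcp": {
        "search_papers", "get_abstract", "download_paper", "semantic_search",
        "reindex", "list_papers", "read_paper",
    },
}


def missing_tools(expected: set[str], advertised: set[str]) -> set[str]:
    """Return the expected tool names that the endpoint did not advertise."""
    return expected - advertised


async def advertised_tools(url: str) -> set[str]:
    """Ask one MCP endpoint for the names of the tools it exposes."""
    from fastmcp import Client  # lazy import: only needed when talking to a server

    async with Client(url) as client:
        tools = await client.list_tools()
    return {tool.name for tool in tools}


async def report() -> dict[str, set[str]]:
    """Map each endpoint URL to the set of documented tools it is missing."""
    result = {}
    for url, expected in EXPECTED_TOOLS.items():
        result[url] = missing_tools(expected, await advertised_tools(url))
    return result

# With the stack running and fastmcp installed:
# print(asyncio.run(report()))
```

An empty set for every URL means each documented tool is present; extra tools on the Wikipedia endpoint are expected and not reported.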
Services In The Compose Stack
- valkey for SearXNG limiter state
- searxng as the metasearch backend for the general endpoint
- readability as a local HTML-to-text extraction service for the general endpoint
- mcp-web-search as the in-repo FastMCP-based Streamable HTTP server that exposes the general tools
- wikipedia-mcp as a separate Streamable HTTP server for the specialized Wikipedia toolset
- arxiv-mcp as a separate Streamable HTTP server for the specialized arXiv paper toolset, built from upstream source and published through a local FastMCP adapter
SearXNG and Readability stay on the internal Compose network by default. Only the MCP endpoints are exposed on the host.
Contract Catalog
The high-level concept lives in this README. Detailed contracts are grouped by
capability class under contracts/:
- contracts/README.md for the contract catalog and extension rules
- Researcher Agent Pattern for a concrete multi-endpoint deep-research workflow in Markdown
- contracts/general/web_search.md
- contracts/general/read_url.md
- contracts/general/search_config.md
- contracts/specialized/wikipedia.md
- contracts/specialized/arxiv.md
The general endpoint contracts are owned by this repository. Specialized endpoint contracts may wrap upstream MCP packages; in that case this repository documents the integration surface and points to the upstream implementation ownership clearly.
Prerequisites
- Docker Engine or Docker Desktop
- Docker Compose
- Python 3.11+ if you want to run tests locally
Configuration
Environment variables are optional because Compose provides sensible defaults.
If you want overrides, define them in .env.
The Compose file sets the default project name to open_information_mcp.
Internal service hostnames still stay valkey, searxng, readability,
mcp-web-search, wikipedia-mcp, and arxiv-mcp.
General endpoint (mcp-web-search) variables:
- MCP_PORT default 8000
- SEARXNG_SECRET default change-me
- SEARXNG_REQUEST_TIMEOUT_SECONDS default 15
- SEARXNG_CONNECT_TIMEOUT_SECONDS default 5
- SEARXNG_REQUEST_RETRIES default 1
- MAX_RESULTS default 10
- MAX_QUERY_LENGTH default 512
- MAX_PAGE_NUMBER default 10
- DEFAULT_SAFE_SEARCH default 1
- URL_READ_TIMEOUT_SECONDS default 20
- URL_READ_MAX_BYTES default 5242880 (5 MiB)
- URL_READ_MAX_CHARS default 12000
- READABILITY_SERVICE_URL default http://readability:3010
- READABILITY_FALLBACK_ON_FAILURE default true
- VERIFY_SSL default true; set to false only to skip certificate validation for outbound HTTPS requests made by the running mcp-web-search service
- USER_AGENT default searxng-mcp-search/0.1.0
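When debugging an override it can help to see which values would actually apply. The sketch below merges the contents of a .env file over a subset of the defaults listed above. The parser is a deliberately simplified assumption (plain KEY=VALUE lines, # comments, optional surrounding quotes) and does not reproduce Docker Compose's full .env syntax; the helper names are hypothetical.

```python
from typing import Mapping

# A subset of the documented defaults for the general endpoint.
GENERAL_DEFAULTS = {
    "MCP_PORT": "8000",
    "SEARXNG_REQUEST_TIMEOUT_SECONDS": "15",
    "SEARXNG_CONNECT_TIMEOUT_SECONDS": "5",
    "MAX_RESULTS": "10",
    "URL_READ_TIMEOUT_SECONDS": "20",
    "URL_READ_MAX_BYTES": "5242880",
    "URL_READ_MAX_CHARS": "12000",
    "VERIFY_SSL": "true",
}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip whitespace and any surrounding single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def effective_settings(
    overrides: Mapping[str, str],
    defaults: Mapping[str, str] = GENERAL_DEFAULTS,
) -> dict[str, str]:
    """Return the defaults with any matching overrides applied."""
    return {name: overrides.get(name, default) for name, default in defaults.items()}


example_env = "# local overrides\nMCP_PORT=9000\nURL_READ_TIMEOUT_SECONDS=30\n"
for name, value in effective_settings(parse_env_file(example_env)).items():
    print(f"{name}={value}")
```

Variables that are not in the defaults table are ignored by effective_settings, so a typo in a variable name shows up as an unchanged default.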
Python image build variable:
- PYTHON_INSTALL_VERIFY_SSL default true; applies to the mcp-web-search, wikipedia-mcp, and arxiv-mcp Docker builds. Set it to false only on machines that cannot validate certificates while pip downloads dependencies. The arxiv-mcp build uses the same toggle for the upstream HTTPS git clone.
Specialized Wikipedia endpoint (wikipedia-mcp) variables:
- WIKIPEDIA_MCP_PORT default 8001
- WIKIPEDIA_LANGUAGE default en
- WIKIPEDIA_ACCESS_TOKEN optional; reduces Wikipedia API rate limiting when set
To use a Wikipedia country/locale instead of a raw language code, override the
wikipedia-mcp service command in Compose (for example --country US) as
documented in wikipedia-mcp.
Specialized arXiv endpoint (arxiv-mcp) variables:
- ARXIV_MCP_PORT default 8002
- ARXIV_UPSTREAM_REF default main; git ref from blazickjp/arxiv-mcp-server that is baked into the local image at build time
- ARXIV_STORAGE_PATH default /app/papers; passed to the upstream server as --storage-path and backed by the named volume arxiv_papers
The upstream arxiv-mcp-server package currently runs over stdio. This
toolkit builds its own container from the upstream source repository and
publishes a Streamable HTTP endpoint at /mcp through the local adapter in
arxiv_mcp/app.py.
Run The Stack
docker compose up --build
Make sure Docker Desktop or the Docker Engine daemon is running before you start any Compose build command.
If you only want to validate image builds without starting the containers, run:
docker compose build
If a machine cannot validate certificates during Python dependency
installation, either set PYTHON_INSTALL_VERIFY_SSL=false in .env before
building or export it just for the build command:
Windows PowerShell:
$env:PYTHON_INSTALL_VERIFY_SSL = "false"
docker compose up --build
macOS or Linux shell:
PYTHON_INSTALL_VERIFY_SSL=false docker compose up --build
To validate the insecure build path on Windows PowerShell without editing
.env, this one-liner works as well:
cmd /c "set PYTHON_INSTALL_VERIFY_SSL=false&& docker compose --progress=plain build"
VERIFY_SSL=false is separate: it only affects outbound HTTPS verification in
the running mcp-web-search container.
Generated Docker resources now use the Compose project name
open_information_mcp. Service names inside the stack are unchanged, so
commands such as docker compose restart arxiv-mcp still work as before.
Default host endpoints:
- General MCP endpoint: http://localhost:8000/mcp
- General health endpoint: http://localhost:8000/healthz
- Specialized Wikipedia MCP endpoint: http://localhost:8001/mcp
- Specialized arXiv MCP endpoint: http://localhost:8002/mcp
- Specialized arXiv health endpoint: http://localhost:8002/healthz
Existing Installations: Volume Migration
If you already used this stack before the project rename, your old Docker
volumes may still use the previous default prefix, for example
web_search_mcp_searxng_cache and web_search_mcp_arxiv_papers. The current
Compose project creates open_information_mcp_searxng_cache and
open_information_mcp_arxiv_papers instead.
If you want to preserve old data, stop the stack and copy the volume contents once before your first long-running session under the new project name:
docker volume create open_information_mcp_searxng_cache
docker volume create open_information_mcp_arxiv_papers
docker run --rm -v web_search_mcp_searxng_cache:/from -v open_information_mcp_searxng_cache:/to alpine sh -c "cp -a /from/. /to/"
docker run --rm -v web_search_mcp_arxiv_papers:/from -v open_information_mcp_arxiv_papers:/to alpine sh -c "cp -a /from/. /to/"
If your old project prefix was not web_search_mcp, replace it in the source
volume names above. If you do not need the previous cache or downloaded papers,
you can skip this migration and let Docker create fresh volumes.
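If the old project prefix differs, the substitution described above can be generated rather than edited by hand. The sketch below only builds the command strings for review and does not execute anything; the function name is hypothetical, and the two volume names are the ones mentioned in this section.

```python
NEW_PROJECT = "open_information_mcp"
# Volume suffixes named in the migration instructions above.
VOLUMES = ("searxng_cache", "arxiv_papers")


def migration_commands(old_project: str, new_project: str = NEW_PROJECT) -> list[str]:
    """Build the docker commands that copy each old volume into its new counterpart."""
    commands = []
    for volume in VOLUMES:
        source = f"{old_project}_{volume}"
        target = f"{new_project}_{volume}"
        # Create the target volume, then copy the contents with a throwaway container.
        commands.append(f"docker volume create {target}")
        commands.append(
            f"docker run --rm -v {source}:/from -v {target}:/to "
            'alpine sh -c "cp -a /from/. /to/"'
        )
    return commands


# Print the commands for the previous default prefix so they can be pasted into a shell.
for command in migration_commands("web_search_mcp"):
    print(command)
```

Printing instead of running keeps the step reviewable; stop the stack before pasting the output into a shell.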
IDE / Cursor
Point each server at its Streamable HTTP URL:
{
  "mcpServers": {
    "general-web-tools": {
      "url": "http://localhost:8000/mcp"
    },
    "wikipedia-research": {
      "url": "http://localhost:8001/mcp"
    },
    "arxiv-research": {
      "url": "http://localhost:8002/mcp"
    }
  }
}
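If you override MCP_PORT, WIKIPEDIA_MCP_PORT, or ARXIV_MCP_PORT, the URLs in this client configuration must change with them. A small generator, sketched below with a hypothetical function name, keeps the two in sync; it reproduces the JSON above when called with the default ports.

```python
import json


def build_mcp_servers(
    host: str = "localhost",
    general_port: int = 8000,    # MCP_PORT
    wikipedia_port: int = 8001,  # WIKIPEDIA_MCP_PORT
    arxiv_port: int = 8002,      # ARXIV_MCP_PORT
) -> dict:
    """Return an mcpServers configuration pointing at the three endpoints."""
    return {
        "mcpServers": {
            "general-web-tools": {"url": f"http://{host}:{general_port}/mcp"},
            "wikipedia-research": {"url": f"http://{host}:{wikipedia_port}/mcp"},
            "arxiv-research": {"url": f"http://{host}:{arxiv_port}/mcp"},
        }
    }


# Example: the general endpoint was moved to port 9000 in .env.
print(json.dumps(build_mcp_servers(general_port=9000), indent=2))
```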
Smoke Test
Check general endpoint readiness:
curl http://localhost:8000/healthz
Call all three MCP endpoints from Python with FastMCP's client:
import asyncio

from fastmcp import Client


async def main() -> None:
    async with Client("http://127.0.0.1:8000/mcp") as general_client:
        search_response = await general_client.call_tool(
            "web_search",
            {
                "query": "JAX vmap tutorial",
                "limit": 3,
                "safe_search": 1,
                "categories": ["general"],
            },
        )
        print(search_response.data)

        config_response = await general_client.call_tool("search_config", {})
        print(config_response.data)

        page_response = await general_client.call_tool(
            "read_url",
            {
                "url": "https://jax.readthedocs.io/en/latest/quickstart.html",
                "max_chars": 4000,
            },
        )
        print(page_response.data)

    async with Client("http://127.0.0.1:8001/mcp") as wikipedia_client:
        wikipedia_response = await wikipedia_client.call_tool(
            "search_wikipedia",
            {
                "query": "JAX",
                "limit": 3,
            },
        )
        print(wikipedia_response.data)

    async with Client("http://127.0.0.1:8002/mcp") as arxiv_client:
        arxiv_search_response = await arxiv_client.call_tool(
            "search_papers",
            {
                "query": 'ti:"Attention Is All You Need"',
                "categories": ["cs.CL", "cs.LG"],
                "max_results": 1,
            },
        )
        print(arxiv_search_response.data)

        arxiv_abstract_response = await arxiv_client.call_tool(
            "get_abstract",
            {
                "paper_id": "1706.03762",
            },
        )
        print(arxiv_abstract_response.data)

        arxiv_download_response = await arxiv_client.call_tool(
            "download_paper",
            {
                "paper_id": "1706.03762",
            },
        )
        print(arxiv_download_response.data)

        arxiv_library_response = await arxiv_client.call_tool("list_papers", {})
        print(arxiv_library_response.data)

        arxiv_semantic_search_response = await arxiv_client.call_tool(
            "semantic_search",
            {
                "query": "attention mechanisms for sequence modeling",
                "max_results": 3,
            },
        )
        print(arxiv_semantic_search_response.data)

        arxiv_paper_response = await arxiv_client.call_tool(
            "read_paper",
            {
                "paper_id": "1706.03762",
            },
        )
        print(arxiv_paper_response.data)


asyncio.run(main())
To verify persistence after the first download:
- Run the Python smoke test above once.
- Restart only the specialized paper service with docker compose restart arxiv-mcp.
- Re-run list_papers against http://127.0.0.1:8002/mcp and confirm that paper 1706.03762 is still present.
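The last step can be scripted. The sketch below calls list_papers through the FastMCP client and looks for the paper ID anywhere in the response. The exact shape of the list_papers result is not documented here, so the check is a plain substring search over the serialized payload, which is an assumption rather than the adapter's contract; the helper names are hypothetical.

```python
import asyncio
import json

ARXIV_URL = "http://127.0.0.1:8002/mcp"
PAPER_ID = "1706.03762"


def mentions_paper(listing: object, paper_id: str) -> bool:
    """Return True if the paper ID appears anywhere in the serialized listing."""
    return paper_id in json.dumps(listing, default=str)


async def paper_survived_restart(url: str = ARXIV_URL, paper_id: str = PAPER_ID) -> bool:
    """Call list_papers and report whether the downloaded paper is still listed."""
    from fastmcp import Client  # lazy import: only needed when talking to a server

    async with Client(url) as client:
        response = await client.call_tool("list_papers", {})
    payload = response.data
    if payload is None:
        # Fall back to the raw content blocks if no structured data is returned.
        payload = [getattr(block, "text", "") for block in response.content]
    return mentions_paper(payload, paper_id)

# With the stack running and fastmcp installed:
# print(asyncio.run(paper_survived_restart()))
```

A result of True after the restart indicates that the arxiv_papers volume kept the download.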
Local Tests
Install the package and test dependencies, then run pytest. On Windows, use
the project virtualenv interpreter:
.\.venv\Scripts\python.exe -m pip install -e ".[dev]"
.\.venv\Scripts\python.exe -m pytest
On macOS or Linux:
python -m pip install -e ".[dev]"
pytest
If the local machine cannot validate certificates during pip install, use
trusted hosts for the install step. On Windows:
.\.venv\Scripts\python.exe -m pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org -e ".[dev]"
.\.venv\Scripts\python.exe -m pytest
On macOS or Linux:
python -m pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org -e ".[dev]"
pytest
Notes
- The general endpoint is implemented in mcp_server/ and currently published as the searxng-mcp-search Python package/script.
- SearXNG JSON output is enabled in searxng/core-config/settings.yml.
- The limiter is enabled and backed by Valkey.
- SearXNG itself does not publish a host port in the default Compose setup.
- arxiv-mcp wraps the upstream arxiv-mcp-server package with a local FastMCP adapter because the upstream server currently ships as a stdio MCP server.
- arxiv-mcp is built from the upstream blazickjp/arxiv-mcp-server source repository, using ARXIV_UPSTREAM_REF with default main.
- The arXiv HTTP adapter currently exposes: search_papers, get_abstract, download_paper, semantic_search, reindex, list_papers, and read_paper.
- semantic_search works over the local downloaded paper library and depends on the upstream pro extras being present in the built image.
- The adapter exposes http://localhost:8002/healthz for container health checks.
- Downloaded arXiv papers are stored in the named volume arxiv_papers.
- The current specialized catalog includes Wikipedia and arXiv, but contracts/specialized/ is meant to grow as new domain-specific endpoints are added.
License
This project is available under the MIT License.