Apeiron
Unlimited web access MCP server for AI agents, enabling search, fetch, and learning across multiple sources while bypassing anti-bot protections.
<p align="center"> <img src="images/header.jpg" width="100%" alt="Apeiron"> </p>
<h1 align="center">Apeiron</h1>
<p align="center"> <b>Local-first web search, fetch, and extraction tools for AI agents.</b> </p>
<p align="center"> <img src="https://img.shields.io/badge/license-MIT-000?style=flat-square" alt="license"> <img src="https://img.shields.io/badge/Python-3.10+-000?style=flat-square&logo=python" alt="python"> <img src="https://img.shields.io/badge/MCP-server-000?style=flat-square" alt="mcp"> </p>
Apeiron gives MCP-compatible agents and Python apps three practical tools:
- `apeiron_search`: search web-oriented sources such as arXiv, Wikipedia, GitHub, and optional local SearXNG.
- `apeiron_fetch`: fetch a URL and return LLM-ready content plus tier/verdict diagnostics.
- `apeiron_learn`: remember the best working fetch strategy for a domain.
It is designed for Claude Code, OpenCode, Cursor, Cline, Windsurf, and local agent workflows where you want a free, inspectable web-access layer before reaching for paid scraping APIs.
Status
Works today:
- CLI, Python API, and MCP server surfaces.
- Fast HTTP fetch with `curl_cffi` when the `fetch` extra is installed.
- arXiv, Wikipedia, and GitHub search.
- Jina Reader fallback.
- Local response cache and per-domain strategy cache.
- Structured JSON output for CLI and MCP fetch/learn calls.
- `apeiron doctor` diagnostics for optional dependencies and local services.
Experimental:
- Browser tiers: Patchright, CloakBrowser, Camoufox, FlareSolverr, browser-use.
- PDF/DOCX/PPTX/XLSX extraction through Markitdown.
- YouTube transcript extraction through yt-dlp subtitle metadata.
- Reddit search; it is not enabled by default because Reddit requires OAuth for reliable automated use.
- Git-based shared learning; local git commits are opt-in with `APEIRON_GIT_COMMIT=true`.
Install
The PyPI name `apeiron` belongs to a different package. Until this project is published as `apeiron-agent`, use the GitHub install path.
Recommended
pipx install "git+https://github.com/insomnia-me/apeiron.git"
From source
git clone https://github.com/insomnia-me/apeiron.git
cd apeiron
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[fetch,mcp,documents,media]"
One-command local install
curl -fsSL https://raw.githubusercontent.com/insomnia-me/apeiron/main/install.sh | bash
Set `APEIRON_INSTALL_PROFILE=all` before running the script if you also want browser automation dependencies.
Quickstart
apeiron doctor
apeiron fetch "https://example.com" --json
apeiron search "python web scraping" --sources wikipedia github arxiv --json
apeiron learn "https://example.com" --json
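The `--json` flag makes the CLI easy to drive from a script. The sketch below wraps the commands above in a small Python helper. `run_apeiron` is a hypothetical name, not part of Apeiron, and the sketch assumes the CLI writes a single JSON document to stdout and exits non-zero on failure; check the actual behavior before relying on it.

```python
import json
import subprocess
from typing import Any, Callable, List


def run_apeiron(args: List[str], runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run `apeiron <args> --json` and return the decoded JSON.

    Assumption: with --json the CLI prints one JSON document to stdout.
    `runner` is injectable so the helper can be exercised without the CLI installed.
    """
    cmd = ["apeiron", *args, "--json"]
    proc = runner(cmd, capture_output=True, text=True, check=False)
    if proc.returncode != 0:
        raise RuntimeError(f"apeiron exited with {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

For example, `run_apeiron(["fetch", "https://example.com"])` runs the same command as the second Quickstart line and returns whatever JSON it printed.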
MCP server
Example OpenCode config:
{
  "mcp": {
    "servers": {
      "apeiron": {
        "command": "python",
        "args": ["-m", "apeiron.api.mcp_server"],
        "cwd": "/path/to/apeiron"
      }
    }
  }
}
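If you would rather generate this config than edit the path by hand, the following sketch builds the same structure in Python. `build_mcp_config` is a hypothetical helper, not part of Apeiron. Defaulting the command to `sys.executable` instead of a bare `python` is an assumption: it pins the interpreter of the environment the script runs in, which is usually the one where Apeiron is installed.

```python
import json
import sys
from pathlib import Path


def build_mcp_config(repo_dir: str, python_cmd: str = sys.executable) -> dict:
    """Return a config dict shaped like the OpenCode example above.

    Hypothetical helper; `repo_dir` is the local Apeiron checkout.
    """
    return {
        "mcp": {
            "servers": {
                "apeiron": {
                    "command": python_cmd,
                    "args": ["-m", "apeiron.api.mcp_server"],
                    # Store an absolute path so the client can start the server from anywhere.
                    "cwd": str(Path(repo_dir).expanduser().resolve()),
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the config for the current directory; paste it into your client config.
    print(json.dumps(build_mcp_config("."), indent=2))
```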
MCP tools:
| Tool | What it returns |
|---|---|
| `apeiron_search("query")` | JSON array of search hits |
| `apeiron_fetch("url")` | JSON object with content, tier, verdict, content type, title, elapsed time, and error |
| `apeiron_learn("url")` | JSON object with learned tier/verdict diagnostics |
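A client that receives the `apeiron_fetch` result as JSON text may want to turn it into a typed object before using it. The sketch below shows one way to do that. The key names (`content`, `tier`, `verdict`, `content_type`, `title`, `elapsed`, `error`) are assumptions derived from the table above, not confirmed field names, and the sample payload is made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchResult:
    content: str
    tier: Optional[str]
    verdict: Optional[str]
    content_type: Optional[str]
    title: Optional[str]
    elapsed: Optional[float]
    error: Optional[str]


def parse_fetch_result(raw: str) -> FetchResult:
    """Parse fetch-result JSON text into a FetchResult.

    Key names are assumptions; missing keys become None (or "" for content).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return FetchResult(
        content=data.get("content") or "",
        tier=data.get("tier"),
        verdict=data.get("verdict"),
        content_type=data.get("content_type"),
        title=data.get("title"),
        elapsed=data.get("elapsed"),
        error=data.get("error"),
    )


# Illustrative payload; the values are invented, not real Apeiron output.
sample = '{"content": "Example Domain", "tier": "fast", "verdict": "ok", "error": null}'
result = parse_fetch_result(sample)
if result.error is None:
    print(result.tier, result.verdict, result.content[:80])
```

Checking `error` before reading `content` keeps a failed fetch from being passed to the model as if it were page text.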
Python API
from apeiron import fetch_sync, search_sync

result = fetch_sync("https://example.com", cache_ttl=0)
print(result.verdict.value, result.tier.value)
print(result.content[:500])

hits = search_sync("agent web access", max_results=5)
for hit in hits:
    print(hit.source.value, hit.title, hit.url)
Architecture
APEIRON
- search: arXiv, Wikipedia, GitHub, optional SearXNG
- fetch: fast HTTP -> browser tiers -> reader fallback
- extract: Trafilatura, Readability, Markitdown
- learn: strategies.json, challenge heuristics, opt-in git commits
- api: CLI, Python API, MCP server
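The way the fetch and learn layers fit together can be illustrated with a short sketch: try tiers in order, record which tier worked for a domain, and try that tier first next time. This is not Apeiron's implementation. `StrategyStore`, `fetch_with_fallback`, the tier names, and the JSON file layout are all hypothetical, and real tiers would be HTTP or browser clients rather than plain callables.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse

# A tier is a (name, callable) pair; the callable returns text or raises on failure.
Tier = Tuple[str, Callable[[str], str]]


class StrategyStore:
    """Tiny per-domain memory backed by a JSON file (hypothetical layout)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.data: Dict[str, str] = {}
        if path.exists():
            self.data = json.loads(path.read_text())

    def get(self, domain: str) -> Optional[str]:
        return self.data.get(domain)

    def remember(self, domain: str, tier_name: str) -> None:
        self.data[domain] = tier_name
        self.path.write_text(json.dumps(self.data, indent=2))


def fetch_with_fallback(url: str, tiers: List[Tier], store: StrategyStore) -> Tuple[str, str]:
    """Try the remembered tier first, then the rest in their given order.

    Returns (tier_name, content); raises RuntimeError if every tier fails.
    """
    domain = urlparse(url).netloc
    preferred = store.get(domain)
    # Stable sort: the remembered tier moves to the front, the others keep their order.
    ordered = sorted(tiers, key=lambda t: t[0] != preferred)
    errors = []
    for name, fetcher in ordered:
        try:
            content = fetcher(url)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        store.remember(domain, name)
        return name, content
    raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

Called with a list such as `[("fast", fast_fetch), ("reader", reader_fetch)]`, where both callables are placeholders you supply, the function returns the first tier that succeeds and persists that choice for the URL's domain.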
Optional infrastructure
SearXNG and FlareSolverr run through Docker Compose:
bash scripts/start-infra.sh
bash scripts/stop-infra.sh
Docker is optional. Apeiron can run CLI, Python API, MCP, fast fetch, and direct API search without local Docker services.
Safety boundary
Apeiron is for fetching public URLs and converting public content into agent-friendly text. It does not authorize credential bypass, private data access, or ignoring site policies. See SECURITY.md.
Roadmap
- Green deterministic CI on every pull request.
- More tests around fetch tier selection and extraction.
- Benchmark table with dated results and reproducible commands.
- Better browser-tier diagnostics.
- Explicit Reddit OAuth integration or removal from public source list.