servo-fetch
Servo-powered MCP server for JS-aware web fetching, content extraction, crawling, and software-rendered screenshots — Chromium-free single binary.
<div align="center"> <h1 align="center">servo-fetch</h1> <p align="center">A self-contained browser engine that fetches, renders, and extracts web content as Markdown, JSON, or screenshots — no Chromium, no API key, no setup.</p> <p> <a href="https://github.com/konippi/servo-fetch/actions"><img src="https://github.com/konippi/servo-fetch/workflows/CI/badge.svg" alt="CI"></a> <a href="https://crates.io/crates/servo-fetch"><img src="https://img.shields.io/crates/v/servo-fetch.svg" alt="crates.io"></a> <img src="https://img.shields.io/badge/Rust-1.86.0-blue?color=fc8d62&logo=rust" alt="MSRV"> <img src="https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg" alt="MIT OR Apache-2.0"> </p> <img src="assets/demo.gif" alt="servo-fetch demo" width="900"> </div>
servo-fetch embeds the Servo browser engine. It executes JavaScript, computes CSS layout, captures screenshots with a software renderer, and extracts clean content — available as a CLI, a Rust library, and a Python SDK.
# CLI
servo-fetch "https://example.com" # clean Markdown
servo-fetch "https://example.com" --format png -o page.png # PNG screenshot
// Rust
let md = servo_fetch::markdown("https://example.com")?;
# Python
page = servo_fetch.fetch("https://example.com")
print(page.markdown)
Why servo-fetch
- Zero dependencies — single binary, no Chromium, no API key
- Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
- Layout- and visibility-aware extraction — strips navbars, sidebars, footers by rendered position, plus cookie banners, modals, and CSS-hidden content (opacity:0, aria-hidden, sr-only)
- Schema-driven JSON — declarative CSS-selector schema pulls structured data
- Parallel batch fetch — multiple URLs fetched concurrently
- Site crawling — BFS link traversal with robots.txt, same-site scope, and rate limiting
- URL discovery — sitemap-based URL mapping without rendering (fast, lightweight)
- Screenshots without GPU — software renderer captures PNG/full-page screenshots anywhere
- Accessibility tree — AccessKit integration with roles, names, and bounding boxes
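To make the "BFS link traversal with same-site scope" bullet concrete, here is a minimal sketch of the idea in Python. It is not servo-fetch's implementation: the `links` map is a hypothetical in-memory stand-in for fetching and link extraction, "same-site" is approximated by comparing hostnames, and robots.txt handling and rate limiting are left out.

```python
from collections import deque
from urllib.parse import urlsplit


def bfs_crawl(start, links, limit):
    """Visit pages breadth-first, staying on the start URL's host.

    `links` is a hypothetical map of URL -> outgoing URLs that stands in
    for real fetching and link extraction.
    """
    host = urlsplit(start).hostname
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            # Skip already-queued URLs and anything on another host.
            if nxt not in seen and urlsplit(nxt).hostname == host:
                seen.add(nxt)
                queue.append(nxt)
    return order


site = {
    "https://docs.example.com/": [
        "https://docs.example.com/a",
        "https://other.example.org/x",
        "https://docs.example.com/b",
    ],
    "https://docs.example.com/a": [
        "https://docs.example.com/c",
        "https://docs.example.com/",
    ],
    "https://docs.example.com/b": [],
}
print(bfs_crawl("https://docs.example.com/", site, limit=3))
```

With `limit=3` the traversal stops after the start page and its two same-host neighbours; the off-site link is never queued.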
Performance and quality
Apple M3 Pro, versus Playwright (the typical AI-agent stack):
| Benchmark | servo-fetch | playwright:optimized |
|---|---|---|
| Time — static-small | ~231 ms | ~645 ms |
| Time — spa-heavy | ~331 ms | ~798 ms |
| Memory (peak RSS) | 51–64 MB | 300–328 MB |
Extraction quality: mean word-F1 0.819 vs Readability's 0.728 across
eight page-type fixtures, with without[] boilerplate removal at 95.0%
vs 78.6%. Direct-binary engine peers (chrome-headless-shell, Lightpanda,
curl) are opt-in.
Methodology, three-axis breakdown, per-fixture F1, and raw JSON: benchmarks/README.md and benchmarks/results/.
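For readers unfamiliar with the word-F1 metric quoted above, the following sketch shows one common way to compute it. The tokenization (lowercasing and whitespace splitting) and the multiset-overlap definition are assumptions for illustration; the benchmark's actual scoring is described in benchmarks/README.md and may differ.

```python
from collections import Counter


def word_f1(extracted, reference):
    """Word-level F1 between extracted text and a reference text.

    Assumes lowercase whitespace tokenization and multiset overlap;
    the project's benchmark may tokenize differently.
    """
    pred = Counter(extracted.lower().split())
    gold = Counter(reference.lower().split())
    overlap = sum((pred & gold).values())  # words shared by both texts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


print(round(word_f1("the quick brown fox", "the quick red fox jumps"), 3))
```

Here three of four extracted words match (precision 0.75) and three of five reference words are recovered (recall 0.6), giving an F1 of about 0.667.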
Install
| Interface | Install | Docs |
|---|---|---|
| CLI | curl -fsSL https://raw.githubusercontent.com/konippi/servo-fetch/main/install.sh \| sh | CLI docs |
| Rust | cargo add servo-fetch | Library docs |
| Python | pip install servo-fetch | Python docs |
<details> <summary><b>CLI install alternatives</b></summary>
cargo binstall servo-fetch-cli # prebuilt binary
cargo install servo-fetch-cli # build from source
Or download from GitHub Releases.
Linux — install runtime deps and use xvfb-run on headless servers:
sudo apt install -y libegl1 libfontconfig1 libfreetype6
xvfb-run --auto-servernum servo-fetch "https://example.com"
Windows — cargo binstall does not copy sidecar files (cargo-binstall#353), so the installed servo-fetch.exe fails at startup with a missing libEGL.dll. Download the .zip from Releases instead — it bundles libEGL.dll and libGLESv2.dll.
macOS — no extra setup needed.
</details>
Quick Start
CLI
servo-fetch "https://example.com" # Markdown (default)
servo-fetch "https://example.com" --format json # Structured JSON
servo-fetch "https://example.com" --format png -o page.png # PNG screenshot
servo-fetch "https://example.com" --js "document.title" # Run JavaScript
servo-fetch "https://example.com" --schema schema.json # Schema-driven JSON
servo-fetch URL1 URL2 URL3 # Parallel batch
servo-fetch "https://example.com" --output page.md # Save to a single file
servo-fetch URL1 URL2 --output-dir ./out/ # Save each URL to its own file
servo-fetch crawl "https://docs.example.com" --limit 20 # Crawl a site
servo-fetch crawl URL --output-dir ./pages/ # Save each crawled page to its own file
servo-fetch map "https://example.com" # Discover URLs via sitemap
servo-fetch mcp # MCP server (stdio)
servo-fetch serve # HTTP API server
Full CLI reference → servo-fetch-cli
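If you want to drive the CLI from a script instead of a shell, a thin wrapper along these lines may help. This is a sketch, not part of servo-fetch: the helper names `servo_fetch_argv` and `run_servo_fetch` are hypothetical, it uses only the `--format` and `--output-dir` flags shown above, and it assumes the `servo-fetch` binary is on `PATH` and writes its result to stdout when no output flag is given.

```python
import subprocess


def servo_fetch_argv(urls, fmt=None, output_dir=None):
    """Build an argument list using only flags shown in the Quick Start."""
    argv = ["servo-fetch", *urls]
    if fmt is not None:
        argv += ["--format", fmt]
    if output_dir is not None:
        argv += ["--output-dir", output_dir]
    return argv


def run_servo_fetch(urls, **kwargs):
    """Run the CLI and return its stdout (assumes the binary is on PATH)."""
    done = subprocess.run(
        servo_fetch_argv(urls, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return done.stdout


print(servo_fetch_argv(["https://example.com"], fmt="json"))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.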
Rust
cargo add servo-fetch
// URL → Markdown in one line
let md = servo_fetch::markdown("https://example.com")?;
// Fetch with options
use servo_fetch::{fetch, FetchOptions};
use std::time::Duration;
let page = fetch(FetchOptions::new("https://example.com").timeout(Duration::from_secs(60)))?;
println!("{}", page.html);
let md = page.markdown()?;
// Crawl a site
servo_fetch::crawl_each(
    servo_fetch::CrawlOptions::new("https://docs.example.com")
        .limit(100)
        .user_agent("MyBot/1.0"),
    |result| match &result.outcome {
        Ok(page) => println!("{}: {} chars", result.url, page.content.len()),
        Err(e) => eprintln!("{}: {e}", result.url),
    },
)?;
// Discover URLs via sitemap (no rendering)
let urls = servo_fetch::map(
    servo_fetch::MapOptions::new("https://example.com").limit(1000),
)?;
for u in &urls {
    println!("{}", u.url);
}
Full API reference → servo-fetch
Python
pip install servo-fetch
import servo_fetch
page = servo_fetch.fetch("https://example.com")
print(page.markdown)
# Schema extraction
from servo_fetch import Schema, Field
schema = Schema(
    base_selector=".product",
    fields=[
        Field(name="title", selector="h2", type="text"),
        Field(name="price", selector=".price", type="text"),
    ],
)
page = servo_fetch.fetch("https://shop.example.com", schema=schema)
print(page.extracted)
Full API reference → bindings/python
MCP Server
Built-in Model Context Protocol server with six tools: fetch,
batch_fetch, crawl, map, screenshot, and execute_js.
{
  "mcpServers": {
    "servo-fetch": {
      "command": "servo-fetch",
      "args": ["mcp"]
    }
  }
}
Streamable HTTP: servo-fetch mcp --port 8080
Full MCP tool reference → servo-fetch-cli README
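Many MCP clients already have a config file with other servers in it. The sketch below merges the entry shown above into an existing config without touching the other entries. The helper name `add_servo_fetch` and the `other-server` entry are hypothetical, and where the config file lives depends on your MCP client.

```python
import json


def add_servo_fetch(config):
    """Return a copy of an MCP client config with the servo-fetch entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["servo-fetch"] = {"command": "servo-fetch", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_servo_fetch(existing), indent=2))
```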
HTTP API
REST endpoints for containerized deployments and HTTP clients:
servo-fetch serve # 127.0.0.1:3000
servo-fetch serve --host 0.0.0.0 --port 80 # expose to network
curl -X POST http://127.0.0.1:3000/v1/fetch \
-H 'content-type: application/json' \
-d '{"url":"https://example.com"}'
Endpoints: GET /health, GET /version, POST /v1/fetch, POST /v1/batch_fetch, POST /v1/screenshot, POST /v1/execute_js, POST /v1/crawl, POST /v1/map.
Full HTTP API reference → servo-fetch-cli README
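The curl call above translates directly to Python's standard library. This sketch assumes a server started with `servo-fetch serve` on the default 127.0.0.1:3000; the helper names are hypothetical, and because the response schema is not described here, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request


def build_fetch_request(url, base="http://127.0.0.1:3000"):
    """Build the same POST /v1/fetch request as the curl example."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/v1/fetch",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def fetch_page(url, base="http://127.0.0.1:3000", timeout=60):
    """Send the request and return the raw response body as text."""
    request = build_fetch_request(url, base)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_fetch_request("https://example.com")
print(req.get_method(), req.full_url)
```

Calling `fetch_page("https://example.com")` requires the server to be running; `build_fetch_request` alone has no side effects.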
Docker
Multi-arch image on GitHub Container Registry (linux/amd64, linux/arm64):
docker run --rm -p 3000:3000 ghcr.io/konippi/servo-fetch:latest
curl -X POST http://127.0.0.1:3000/v1/fetch \
-H 'content-type: application/json' \
-d '{"url":"https://example.com"}'
Runs as non-root (UID 1001). Images are signed with cosign (keyless) and published with SLSA provenance and SBOM attestations.
Agent Skills
servo-fetch ships with an Agent Skills package for AI coding agents:
npx skills add https://github.com/konippi/servo-fetch/tree/main/skills/servo-fetch
Security
servo-fetch blocks all private and reserved IP ranges (RFC 6890), strips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against terminal escape injection (CVE-2021-42574). See SECURITY.md for details.
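As a rough illustration of the first, second, and last measures, the sketch below uses only the Python standard library. It is not servo-fetch's code: the function names are hypothetical, the address check relies on Python's `ipaddress` classification rather than an explicit RFC 6890 table, and the sanitizer's character set (C0/C1 control characters plus Unicode bidirectional controls) is an assumption. See SECURITY.md for the actual rules.

```python
import ipaddress
import unicodedata
from urllib.parse import urlsplit, urlunsplit

# Unicode bidirectional control characters (assumed set for this sketch).
_BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"


def is_blocked_address(text):
    """True for private, loopback, link-local, reserved, or multicast IPs."""
    ip = ipaddress.ip_address(text)
    return (not ip.is_global) or ip.is_multicast


def strip_credentials(url):
    """Drop any user:password@ part from a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal needs its brackets back
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def sanitize_for_terminal(text):
    """Remove control characters (except newline and tab) and bidi controls."""
    return "".join(
        ch for ch in text
        if ch in "\n\t"
        or (unicodedata.category(ch) != "Cc" and ch not in _BIDI_CONTROLS)
    )


print(is_blocked_address("169.254.169.254"))
print(strip_credentials("https://user:pw@example.com:8443/a?b=1"))
```

A real SSRF guard also has to check the address the hostname resolves to at connection time, not just literal IPs in the URL, which this sketch does not attempt.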
Limitations
- Sites behind login walls or CAPTCHAs are not supported.
Contributing
See CONTRIBUTING.md for development setup, commit conventions, and PR guidelines.
License
MIT OR Apache-2.0