servo-fetch

Servo-powered MCP server for JS-aware web fetching, content extraction, crawling, and software-rendered screenshots — Chromium-free single binary.

<div align="center"> <h1 align="center">servo-fetch</h1> <p align="center">A self-contained browser engine that fetches, renders, and extracts web content as Markdown, JSON, or screenshots — no Chromium, no API key, no setup.</p> <p> <a href="https://github.com/konippi/servo-fetch/actions"><img src="https://github.com/konippi/servo-fetch/workflows/CI/badge.svg" alt="CI"></a> <a href="https://crates.io/crates/servo-fetch"><img src="https://img.shields.io/crates/v/servo-fetch.svg" alt="crates.io"></a> <img src="https://img.shields.io/badge/Rust-1.86.0-blue?color=fc8d62&logo=rust" alt="MSRV"> <img src="https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg" alt="MIT OR Apache-2.0"> </p> <img src="assets/demo.gif" alt="servo-fetch demo" width="900"> </div>

servo-fetch embeds the Servo browser engine. It executes JavaScript, computes CSS layout, captures screenshots with a software renderer, and extracts clean content — available as a CLI, a Rust library, and a Python SDK.

# CLI
servo-fetch "https://example.com"                          # clean Markdown
servo-fetch "https://example.com" --format png -o page.png # PNG screenshot
// Rust
let md = servo_fetch::markdown("https://example.com")?;
# Python
page = servo_fetch.fetch("https://example.com")
print(page.markdown)

Why servo-fetch

  • Zero dependencies — single binary, no Chromium, no API key
  • Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
  • Layout- and visibility-aware extraction — strips navbars, sidebars, footers by rendered position, plus cookie banners, modals, and CSS-hidden content (opacity:0, aria-hidden, sr-only)
  • Schema-driven JSON — declarative CSS-selector schema pulls structured data
  • Parallel batch fetch — multiple URLs fetched concurrently
  • Site crawling — BFS link traversal with robots.txt, same-site scope, and rate limiting
  • URL discovery — sitemap-based URL mapping without rendering (fast, lightweight)
  • Screenshots without GPU — software renderer captures PNG/full-page screenshots anywhere
  • Accessibility tree — AccessKit integration with roles, names, and bounding boxes
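
The crawling bullet above names breadth-first traversal, robots.txt, and same-site scope. The sketch below illustrates that general technique in plain Python and is not servo-fetch's implementation. The function name `crawl_bfs`, the `get_links` callback, and the simplification of "same site" to "same hostname" are all assumptions made for illustration; rate limiting is left out.

```python
from collections import deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def crawl_bfs(start, get_links, robots_txt="", limit=20, user_agent="MyBot/1.0"):
    """Return URLs in breadth-first order, staying on the start URL's host.

    get_links is a hypothetical callback that returns the outgoing links of
    a page; a real crawler would fetch and render the page here.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    host = urlsplit(start).hostname

    visited, order = {start}, []
    queue = deque([start])
    while queue and len(order) < limit:
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        order.append(url)
        for link in get_links(url):
            # Assumption: "same site" is approximated as "same hostname".
            if link not in visited and urlsplit(link).hostname == host:
                visited.add(link)
                queue.append(link)
    return order
```

Because the queue is first-in, first-out, pages closer to the start URL are visited before deeper ones, so a small `limit` tends to cover a site's top-level pages first.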

Performance and quality

Measured on an Apple M3 Pro against Playwright (the typical AI-agent stack):

| Benchmark | servo-fetch | playwright:optimized |
|---|---|---|
| Time (static-small) | ~231 ms | ~645 ms |
| Time (spa-heavy) | ~331 ms | ~798 ms |
| Memory (peak RSS) | 51–64 MB | 300–328 MB |
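
As a quick reading aid, the figures above can be turned into ratios. The numbers are copied from the benchmark rows; the helper name `ratio` is only illustrative, and the memory bounds pair the extremes of the two reported ranges.

```python
def ratio(other, servo_fetch):
    """How many times larger the other tool's figure is."""
    return other / servo_fetch


# (servo-fetch ms, Playwright ms), taken from the benchmark rows above
timings_ms = {"static-small": (231, 645), "spa-heavy": (331, 798)}
for name, (servo, playwright) in timings_ms.items():
    print(f"{name}: Playwright takes about {ratio(playwright, servo):.1f}x as long")

# Peak RSS: lowest Playwright vs highest servo-fetch, then the reverse
low, high = ratio(300, 64), ratio(328, 51)
print(f"memory: Playwright uses about {low:.1f}x to {high:.1f}x the peak RSS")
```

This works out to roughly 2.8x and 2.4x on time, and between about 4.7x and 6.4x on peak memory.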

Extraction quality: mean word-F1 is 0.819 versus 0.728 for Readability across eight page-type fixtures, and boilerplate removal (the without[] checks) is 95.0% versus 78.6%. Direct-binary engine peers (chrome-headless-shell, Lightpanda, curl) are opt-in.
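
For readers unfamiliar with the metric, word-F1 is commonly computed as the harmonic mean of word-level precision and recall between the extracted text and a reference text. The sketch below assumes that common definition with lowercased, whitespace-split tokens; the exact tokenization and scoring used by the benchmark are documented in benchmarks/README.md and may differ.

```python
from collections import Counter


def word_f1(predicted, reference):
    """Bag-of-words F1 between extracted text and a reference text."""
    pred = predicted.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty texts match perfectly; one empty text scores zero.
        return float(pred == ref)
    # Multiset intersection counts each shared word at most as often
    # as it appears in both texts.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Leftover boilerplate lowers precision and dropped content lowers recall, so the score penalizes both over-extraction and under-extraction.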

Methodology, three-axis breakdown, per-fixture F1, and raw JSON: benchmarks/README.md + benchmarks/results/.

Install

Install command and docs for each interface:

  • CLI: curl -fsSL https://raw.githubusercontent.com/konippi/servo-fetch/main/install.sh | sh (CLI docs)
  • Rust: cargo add servo-fetch (Library docs)
  • Python: pip install servo-fetch (Python docs)

<details> <summary><b>CLI install alternatives</b></summary>

cargo binstall servo-fetch-cli   # prebuilt binary
cargo install servo-fetch-cli    # build from source

Or download from GitHub Releases.

Linux: install the runtime dependencies and use xvfb-run on headless servers:

sudo apt install -y libegl1 libfontconfig1 libfreetype6
xvfb-run --auto-servernum servo-fetch "https://example.com"

Windows: cargo binstall does not copy sidecar files (cargo-binstall#353), so the installed servo-fetch.exe fails at startup with a missing libEGL.dll. Download the .zip from Releases instead; it bundles libEGL.dll and libGLESv2.dll.

macOS: no extra setup needed.

</details>

Quick Start

CLI

servo-fetch "https://example.com"                          # Markdown (default)
servo-fetch "https://example.com" --format json            # Structured JSON
servo-fetch "https://example.com" --format png -o page.png # PNG screenshot
servo-fetch "https://example.com" --js "document.title"    # Run JavaScript
servo-fetch "https://example.com" --schema schema.json     # Schema-driven JSON
servo-fetch URL1 URL2 URL3                                 # Parallel batch
servo-fetch "https://example.com" --output page.md         # Save to a single file
servo-fetch URL1 URL2 --output-dir ./out/                  # Save each URL to its own file
servo-fetch crawl "https://docs.example.com" --limit 20    # Crawl a site
servo-fetch crawl URL --output-dir ./pages/                # Save each crawled page to its own file
servo-fetch map "https://example.com"                      # Discover URLs via sitemap
servo-fetch mcp                                            # MCP server (stdio)
servo-fetch serve                                          # HTTP API server

Full CLI reference → servo-fetch-cli
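
The commands above can also be driven from a script. This is a minimal sketch that uses only the flags shown in this section (`--format` and `--output`). The helper names `build_command` and `fetch_markdown` are hypothetical, and the sketch assumes the CLI prints Markdown to stdout when no output file is given.

```python
import subprocess


def build_command(url, fmt=None, output=None):
    """Assemble a servo-fetch argument list from the flags shown above."""
    cmd = ["servo-fetch", url]
    if fmt:
        cmd += ["--format", fmt]  # e.g. "json" or "png"
    if output:
        cmd += ["--output", output]
    return cmd


def fetch_markdown(url, timeout=60):
    """Run servo-fetch and return its stdout (assumed to be Markdown)."""
    result = subprocess.run(
        build_command(url),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.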

Rust

cargo add servo-fetch
// URL → Markdown in one line
let md = servo_fetch::markdown("https://example.com")?;

// Fetch with options
use servo_fetch::{fetch, FetchOptions};
use std::time::Duration;

let page = fetch(FetchOptions::new("https://example.com").timeout(Duration::from_secs(60)))?;
println!("{}", page.html);
let md = page.markdown()?;

// Crawl a site
servo_fetch::crawl_each(
    servo_fetch::CrawlOptions::new("https://docs.example.com")
        .limit(100)
        .user_agent("MyBot/1.0"),
    |result| match &result.outcome {
        Ok(page) => println!("{}: {} chars", result.url, page.content.len()),
        Err(e) => eprintln!("{}: {e}", result.url),
    },
)?;

// Discover URLs via sitemap (no rendering)
let urls = servo_fetch::map(
    servo_fetch::MapOptions::new("https://example.com").limit(1000),
)?;
for u in &urls {
    println!("{}", u.url);
}

Full API reference → servo-fetch

Python

pip install servo-fetch
import servo_fetch

page = servo_fetch.fetch("https://example.com")
print(page.markdown)

# Schema extraction
from servo_fetch import Schema, Field
schema = Schema(
    base_selector=".product",
    fields=[
        Field(name="title", selector="h2", type="text"),
        Field(name="price", selector=".price", type="text"),
    ],
)
page = servo_fetch.fetch("https://shop.example.com", schema=schema)
print(page.extracted)

Full API reference → bindings/python

MCP Server

Built-in Model Context Protocol server with six tools: fetch, batch_fetch, crawl, map, screenshot, and execute_js.

{
  "mcpServers": {
    "servo-fetch": {
      "command": "servo-fetch",
      "args": ["mcp"]
    }
  }
}

Streamable HTTP: servo-fetch mcp --port 8080
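
MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for the fetch tool. The argument name `url` is an assumption (the tool's actual input schema is in the MCP tool reference), and a real client performs the MCP `initialize` handshake before calling tools.

```python
import json


def tools_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Assumption: the fetch tool accepts a "url" argument.
message = tools_call(1, "fetch", {"url": "https://example.com"})

# The stdio transport sends one JSON message per line.
line = json.dumps(message)
print(line)
```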

Full MCP tool reference → servo-fetch-cli README

HTTP API

REST endpoints for containerized deployments and HTTP clients:

servo-fetch serve                            # 127.0.0.1:3000
servo-fetch serve --host 0.0.0.0 --port 80   # expose to network

curl -X POST http://127.0.0.1:3000/v1/fetch \
  -H 'content-type: application/json' \
  -d '{"url":"https://example.com"}'

Endpoints: GET /health, GET /version, POST /v1/fetch, POST /v1/batch_fetch, POST /v1/screenshot, POST /v1/execute_js, POST /v1/crawl, POST /v1/map.
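
The same request as the curl example, sketched with the Python standard library. The endpoint and request body come from this section. The helper names are hypothetical, and the response is assumed to be a JSON document whose fields are described in the HTTP API reference.

```python
import json
import urllib.request


def build_fetch_request(url, base="http://127.0.0.1:3000"):
    """Build the POST /v1/fetch request without sending it."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/v1/fetch",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def fetch(url, base="http://127.0.0.1:3000", timeout=60):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_fetch_request(url, base)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.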

Full HTTP API reference → servo-fetch-cli README

Docker

Multi-arch image on GitHub Container Registry (linux/amd64, linux/arm64):

docker run --rm -p 3000:3000 ghcr.io/konippi/servo-fetch:latest
curl -X POST http://127.0.0.1:3000/v1/fetch \
  -H 'content-type: application/json' \
  -d '{"url":"https://example.com"}'

Runs as non-root (UID 1001). Images are signed with cosign (keyless) and published with SLSA provenance and SBOM attestations.
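
When the container is started from a script, it can help to wait for the server before sending requests. The sketch below polls the GET /health endpoint listed in the HTTP API section. It assumes a 200 status means the server is ready; the function name and the injectable `opener` parameter are illustrative.

```python
import time
import urllib.request


def wait_for_health(base="http://127.0.0.1:3000", attempts=30, delay=0.5,
                    opener=urllib.request.urlopen):
    """Poll GET /health until it answers 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            with opener(f"{base}/health", timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: not ready yet.
            pass
        time.sleep(delay)
    return False
```

With the defaults this waits up to roughly 15 seconds of sleep time, plus any time spent in connection attempts, before giving up.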

Agent Skills

servo-fetch ships with an Agent Skills package for AI coding agents:

npx skills add https://github.com/konippi/servo-fetch/tree/main/skills/servo-fetch

Security

servo-fetch blocks all private and reserved IP ranges (RFC 6890), strips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against terminal escape injection (CVE-2021-42574). See SECURITY.md for details.
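
To make two of these protections concrete, here is a small illustration of a private-address check and credential stripping using the Python standard library. It is a sketch of the general techniques, not servo-fetch's code; the function names are hypothetical, and a real SSRF guard also has to check the addresses a hostname resolves to, which this omits.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def is_blocked_address(host):
    """True for a literal IP in a private, loopback, or other reserved range."""
    ip = ipaddress.ip_address(host)  # raises ValueError for non-IP hostnames
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
        or not ip.is_global
    )


def strip_credentials(url):
    """Drop any user:password@ part from a URL, keeping host and port."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literals need their brackets restored
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, `is_blocked_address("169.254.169.254")` is true (the link-local range used by cloud metadata services), while a public address such as 8.8.8.8 passes.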

Limitations

  • Sites behind login walls or CAPTCHAs are not supported.

Contributing

See CONTRIBUTING.md for development setup, commit conventions, and PR guidelines.

License

MIT OR Apache-2.0
