Robot Resources Scraper
Web scraper and token compressor that converts HTML to clean markdown with 70-80% fewer tokens. Single-page compression and multi-page BFS crawling with auto-fallback fetch modes.
@robot-resources/scraper-mcp
MCP server for Scraper — context compression for AI agents.
What is Robot Resources?
Human Resources, but for your AI agents.
Robot Resources gives AI agents two superpowers:
- Router — Routes each LLM call to the cheapest capable model. 60-90% cost savings across OpenAI, Anthropic, and Google.
- Scraper — Compresses web pages to clean markdown. 70-80% fewer tokens per page.
Both run locally. Your API keys never leave your machine. Free, unlimited, no tiers.
Install the full suite
npx robot-resources
One command sets up everything. Learn more at robotresources.ai
About this MCP server
This package gives AI agents two tools to compress web content into token-efficient markdown via the Model Context Protocol: single-page compression and multi-page BFS crawling.
Installation
npx @robot-resources/scraper-mcp
Or install globally:
npm install -g @robot-resources/scraper-mcp
Claude Desktop Configuration
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "scraper": {
      "command": "npx",
      "args": ["-y", "@robot-resources/scraper-mcp"]
    }
  }
}
Tools
scraper_compress_url
Compress a single web page into markdown with 70-80% fewer tokens.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | yes | none | URL to compress |
| `mode` | string | no | `'auto'` | `'fast'`, `'stealth'`, `'render'`, or `'auto'` |
| `timeout` | number | no | `10000` | Fetch timeout in milliseconds |
| `maxRetries` | number | no | `3` | Max retry attempts (0-10) |
Example prompt: "Compress https://docs.example.com/getting-started"
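An MCP client turns a prompt like this into a `tools/call` request carrying the parameters from the table. The following is a minimal sketch of that request, assuming the standard MCP JSON-RPC shape. `CompressUrlArgs`, `ToolCallRequest`, and `buildCompressRequest` are hypothetical names used for illustration only; they are not exported by this package.

```typescript
// Fetch modes listed in the parameter table.
type FetchMode = "fast" | "stealth" | "render" | "auto";

// Arguments for scraper_compress_url; only `url` is required.
interface CompressUrlArgs {
  url: string;
  mode?: FetchMode;
  timeout?: number; // milliseconds
  maxRetries?: number; // documented range: 0-10
}

// Shape of an MCP `tools/call` JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: CompressUrlArgs };
}

// Hypothetical helper: checks the documented retry range, then wraps the arguments.
function buildCompressRequest(id: number, args: CompressUrlArgs): ToolCallRequest {
  const retries = args.maxRetries;
  if (retries !== undefined && (retries < 0 || retries > 10)) {
    throw new RangeError("maxRetries must be between 0 and 10");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "scraper_compress_url", arguments: args },
  };
}

const sampleRequest = buildCompressRequest(1, {
  url: "https://docs.example.com/getting-started",
  mode: "auto",
});
console.log(JSON.stringify(sampleRequest, null, 2));
```

Omitted optional fields are left out of the request so that the server can apply its own defaults.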
scraper_crawl_url
Crawl multiple pages from a starting URL using BFS link discovery.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | yes | none | Starting URL to crawl |
| `maxPages` | number | no | `10` | Max pages to crawl (1-100) |
| `maxDepth` | number | no | `2` | Max link depth (0-5) |
| `mode` | string | no | `'auto'` | `'fast'`, `'stealth'`, `'render'`, or `'auto'` |
| `include` | string[] | no | none | URL patterns to include (glob) |
| `exclude` | string[] | no | none | URL patterns to exclude (glob) |
| `timeout` | number | no | `10000` | Per-page timeout in milliseconds |
Example prompt: "Crawl the docs at https://docs.example.com with max 20 pages"
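To illustrate how `maxPages` and `maxDepth` bound a BFS crawl, here is a minimal sketch over an in-memory link graph. It is not the package's implementation: `bfsCrawl`, `CrawlLimits`, and `siteLinks` are hypothetical names, the graph stands in for real page fetches, and the sketch assumes that depth 0 means the starting page only. The `include` and `exclude` filters are left out.

```typescript
// Hypothetical limits mirroring the maxPages and maxDepth parameters.
interface CrawlLimits {
  maxPages: number; // stop after visiting this many pages
  maxDepth: number; // do not follow links beyond this distance from the start
}

// Breadth-first traversal: pages closer to the start are visited first.
function bfsCrawl(start: string, links: Map<string, string[]>, limits: CrawlLimits): string[] {
  const visited: string[] = [];
  const seen = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];

  while (queue.length > 0 && visited.length < limits.maxPages) {
    const current = queue.shift();
    if (current === undefined) break;
    visited.push(current.url);

    // Pages at the depth limit are visited, but their links are not followed.
    if (current.depth >= limits.maxDepth) continue;

    for (const next of links.get(current.url) ?? []) {
      if (!seen.has(next)) {
        seen.add(next); // mark on enqueue so each URL is queued once
        queue.push({ url: next, depth: current.depth + 1 });
      }
    }
  }
  return visited;
}

// Illustrative site structure: page path -> outgoing links.
const siteLinks = new Map<string, string[]>([
  ["/", ["/guide", "/api"]],
  ["/guide", ["/guide/install"]],
  ["/api", ["/guide/install", "/api/types"]],
  ["/guide/install", ["/guide/install/docker"]],
]);

console.log(bfsCrawl("/", siteLinks, { maxPages: 10, maxDepth: 2 }));
// visits "/", "/guide", "/api", "/guide/install", "/api/types"
```

In this example `/guide/install/docker` sits at depth 3, so it is skipped with the default `maxDepth` of 2.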
Fetch Modes
| Mode | How | Use when |
|---|---|---|
| `'fast'` | Plain HTTP | Default sites, APIs, docs |
| `'stealth'` | TLS fingerprint impersonation | Anti-bot protected sites |
| `'render'` | Headless browser (Playwright) | JS-rendered SPAs |
| `'auto'` | Fast → stealth fallback on 403/challenge | Unknown sites (default) |
Stealth requires impit and render requires playwright as peer dependencies of @robot-resources/scraper.
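The `'auto'` row can be read as "try the fast fetch first, and retry with stealth only if the response looks blocked". A rough sketch of that decision follows, under stated assumptions: `Fetcher`, `looksBlocked`, and `fetchAuto` are hypothetical names, the fetchers are injected stubs rather than the library's real functions, and the body check is a crude placeholder for whatever challenge detection the library actually performs.

```typescript
interface FetchResult {
  status: number;
  body: string;
}

// Hypothetical fetcher signature; the real library's API may differ.
type Fetcher = (url: string) => Promise<FetchResult>;

// Assumed heuristic: HTTP 403 or a challenge marker in the body counts as blocked.
function looksBlocked(res: FetchResult): boolean {
  return res.status === 403 || /captcha|cf-challenge/i.test(res.body);
}

// Try the fast fetcher first; fall back to stealth only when blocked.
async function fetchAuto(
  url: string,
  fast: Fetcher,
  stealth: Fetcher,
): Promise<{ mode: "fast" | "stealth"; result: FetchResult }> {
  const first = await fast(url);
  if (!looksBlocked(first)) {
    return { mode: "fast", result: first };
  }
  return { mode: "stealth", result: await stealth(url) };
}

// Stub fetchers for demonstration only.
const blockedFast: Fetcher = async () => ({ status: 403, body: "" });
const okStealth: Fetcher = async () => ({ status: 200, body: "<h1>Docs</h1>" });

fetchAuto("https://docs.example.com", blockedFast, okStealth)
  .then((outcome) => console.log(outcome.mode)); // logs "stealth"
```

Injecting the fetchers keeps the fallback logic testable without network access; the cheaper plain HTTP request is always attempted first.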
Requirements
- Node.js 18+
Related
- @robot-resources/scraper - Core compression library
- @robot-resources/router-mcp - MCP server for LLM cost optimization
- Robot Resources - Human Resources, but for your AI agents
License
MIT