scpr
A web scraping CLI tool that converts web pages to markdown and also runs as an MCP server for agent integration.
scpr is a simple, straightforward web scraping CLI tool that scrapes pages as markdown content. It is designed to be used both by humans and by coding agents (either as an MCP server or as a skill).
scpr is written in Go and is built on colly for web scraping and html-to-markdown for converting HTML pages to markdown.
Installation
Install with Go (v1.24+ required):
go install github.com/AstraBert/scpr@latest
Install with NPM:
npm install -g @cle-does-things/scpr
Extra instructions for Windows installation
If you are on Windows, scpr might not be available right after global installation with npm. In that case, you might need to take extra steps:
- Find where the node executable is stored on your machine:
  Get-Command node
  This will print the directory where node.exe is stored: scpr will be installed at .\bin\scpr.exe in that folder.
  Note: If you are using nvm for Windows, node.exe will be at C:\Users\nvm4w\nodejs
- Add {NODE_FOLDER}\bin (in the case of nvm: C:\Users\nvm4w\nodejs\bin) to the PATH environment variable. Follow this guide for instructions on how to set PATH env variables.
- Restart your computer
- Test scpr --help from your terminal. Your antivirus might challenge the execution at first; since the executable does not contain any harmful code, the antivirus should eventually allow it.
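After updating PATH, one quick way to confirm that the scpr executable is discoverable is to look it up programmatically. The sketch below relies on Python's standard shutil.which, which honors PATHEXT on Windows and therefore also matches scpr.exe. The helper name find_executable is illustrative and not part of scpr.

```python
import shutil
from typing import Optional


def find_executable(name: str = "scpr") -> Optional[str]:
    """Return the full path of `name` if it can be found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("scpr was not found on PATH; re-check the steps above")
    else:
        print(f"scpr found at: {path}")
```

If the lookup returns None in a fresh terminal, the PATH change has most likely not been picked up yet.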
Usage
As a CLI tool
Basic usage (scrape a single page):
scpr --url https://example.com --output ./scraped
This will scrape the page and save it as a markdown file in the ./scraped folder.
Recursive scraping
To scrape a page and all linked pages within the same domain:
scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 3
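To make the effect of a depth limit and an allowed-domain filter concrete, here is a minimal sketch of a breadth-first traversal over an in-memory link graph. It is not scpr's actual traversal code: the function name crawl_order, the toy graph, and the convention that the start page counts as depth 1 are all assumptions made for illustration.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(start, links, allowed, max_depth):
    """Return pages in the order a depth-limited, domain-filtered crawl visits them.

    `links` maps a page URL to the URLs it links to (a stand-in for real fetching).
    Assumption: the start page is at depth 1.
    """
    seen = {start}
    order = []
    queue = deque([(start, 1)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for nxt in links.get(url, []):
            # Follow only unseen links whose host is in the allowed set
            if urlparse(nxt).hostname in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


if __name__ == "__main__":
    toy_links = {
        "https://example.com/": ["https://example.com/a", "https://other.org/x"],
        "https://example.com/a": ["https://example.com/b"],
        "https://example.com/b": ["https://example.com/c"],
    }
    for page in crawl_order("https://example.com/", toy_links, {"example.com"}, 3):
        print(page)
```

In this toy graph the link to other.org is skipped by the domain filter, and the page at /c is never reached because /b already sits at the depth limit.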
Parallel scraping
Speed up recursive scraping with multiple threads:
scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 2 --parallel 5
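The idea behind a concurrency setting such as --parallel can be sketched as a bounded worker pool: at most N pages are processed at the same time. The example below is a generic illustration in Python, not scpr's Go implementation; scrape_all and the fetch callback are hypothetical names, and fetch stands in for whatever downloads and converts a page.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, parallel=1):
    """Apply `fetch` to every URL using at most `parallel` worker threads.

    Results are returned as a dict keyed by URL; pool.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))


if __name__ == "__main__":
    # A fake fetcher so the sketch runs without network access
    def fake_fetch(url):
        return f"# Content of {url}"

    pages = scrape_all(
        ["https://example.com/a", "https://example.com/b"], fake_fetch, parallel=5
    )
    for url, markdown in pages.items():
        print(url, "->", markdown)
```

Raising the worker count helps most when time is dominated by waiting on network responses; it also increases load on the target site.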
Additional options
- --log: Set logging level (info, debug, warn, error)
- --max: Maximum depth of pages to follow (default: 1)
- --parallel: Number of concurrent threads (default: 1)
- --allowed: Allowed domains for recursive scraping (can be specified multiple times)
For more details, run:
scpr --help
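When scpr is driven from a script rather than typed by hand, it can help to assemble the argument list in one place. The sketch below uses only the flags listed above; the helper names build_scpr_command and run_scpr are hypothetical, and passing the log level as a separate argument after --log is an assumption to verify against scpr --help.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_scpr_command(
    url: str,
    output: str,
    recursive: bool = False,
    allowed: Sequence[str] = (),
    max_depth: Optional[int] = None,
    parallel: Optional[int] = None,
    log: Optional[str] = None,
) -> List[str]:
    """Build an argument list for scpr from the documented options."""
    cmd = ["scpr", "--url", url, "--output", output]
    if recursive:
        cmd.append("--recursive")
    for domain in allowed:
        cmd += ["--allowed", domain]  # --allowed may be repeated, once per domain
    if max_depth is not None:
        cmd += ["--max", str(max_depth)]
    if parallel is not None:
        cmd += ["--parallel", str(parallel)]
    if log is not None:
        cmd += ["--log", log]  # assumption: the level is passed as a separate value
    return cmd


def run_scpr(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run scpr and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_scpr_command(
        "https://example.com", "./scraped",
        recursive=True, allowed=["example.com"], max_depth=2, parallel=5,
    )
    print(shlex.join(cmd))  # inspect the command before running it with run_scpr(cmd)
```

Passing a list to subprocess.run avoids shell quoting problems with URLs that contain characters such as & or ?.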
As a stdio MCP server
Start the MCP server with:
scpr mcp
And configure it in agents using:
{
  "mcpServers": {
    "web-scraping": {
      "type": "stdio",
      "command": "scpr",
      "args": ["mcp"],
      "env": {}
    }
  }
}
The JSON snippet above follows the format used by Claude Code; adapt it to your agent before using it.
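If your agent already has an MCP configuration with other servers, the scpr entry needs to be merged in rather than pasted over the existing file. Here is a minimal sketch of that merge, assuming the configuration uses the same top-level mcpServers key as the JSON snippet above; the function name add_scpr_server is hypothetical, and where the configuration file lives depends on your agent.

```python
import json
from copy import deepcopy

# Server entry taken from the JSON snippet above
SCPR_SERVER = {"type": "stdio", "command": "scpr", "args": ["mcp"], "env": {}}


def add_scpr_server(config: dict, name: str = "web-scraping") -> dict:
    """Return a copy of `config` with the scpr server registered under `name`."""
    merged = deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault("mcpServers", {})[name] = deepcopy(SCPR_SERVER)
    return merged


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated server
    existing = {
        "mcpServers": {
            "other-server": {"type": "stdio", "command": "other", "args": []}
        }
    }
    print(json.dumps(add_scpr_server(existing), indent=2))
```

Existing servers are preserved, and an entry already registered under the same name is overwritten.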
Contributing
Contributions are welcome! Please read the Contributing Guide to get started.
License
This project is licensed under the MIT License.