scpr

A web scraping CLI tool that converts web pages to markdown and also runs as an MCP server for agent integration.


scpr is a simple, straightforward web scraping CLI tool that scrapes pages into markdown. It is designed to be used both by humans and by coding agents (either as an MCP server or as a skill).

scpr is written in Go and built on colly for web scraping and html-to-markdown for converting HTML pages to markdown.

Installation

Install with Go (v1.24+ required):

go install github.com/AstraBert/scpr@latest

Install with NPM (globally, so that scpr is available from your terminal):

npm install -g @cle-does-things/scpr

Extra instructions for Windows installation

If you are on Windows, scpr might not be available right after global installation with npm. In that case, you might need to take extra steps:

  1. Find where the node executable is stored on your machine:

Get-Command node

This prints the location of node.exe; scpr will be installed at .\bin\scpr.exe in the same folder.

Note: if you are using nvm for Windows, node.exe will be at C:\Users\nvm4w\nodejs

  2. Add {NODE_FOLDER}\bin (in the case of nvm: C:\Users\nvm4w\nodejs\bin) to the PATH environment variable. Follow this guide for instructions on how to set PATH env variables.
  3. Restart your computer.
  4. Run scpr --help from your terminal. Your antivirus might challenge the execution at first; since the executable does not contain any harmful code, the antivirus should eventually allow it.

Usage

As a CLI tool

Basic usage (scrape a single page):

scpr --url https://example.com --output ./scraped

This will scrape the page and save it as a markdown file in the ./scraped folder.

Recursive scraping

To scrape a page and all linked pages within the same domain:

scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 3
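To make the roles of --allowed and --max concrete, here is a minimal Go sketch of the kind of check a recursive crawler performs before following a link. It is an illustration only, not scpr's actual code: the shouldFollow helper is hypothetical, and it assumes exact host matching against the allow-list and a depth counter that is rejected once it exceeds the maximum. scpr's real matching rules may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shouldFollow reports whether a crawler should visit link, given an
// allow-list of domains and a depth limit.
// Hypothetical helper for illustration; not taken from scpr.
func shouldFollow(link string, allowed []string, depth, maxDepth int) bool {
	// Assumption: links deeper than maxDepth are never followed.
	if depth > maxDepth {
		return false
	}
	u, err := url.Parse(link)
	if err != nil || u.Hostname() == "" {
		// Unparseable or relative links are skipped in this sketch.
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range allowed {
		// Assumption: exact host match, so subdomains are not included.
		if host == strings.ToLower(d) {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"example.com"}
	links := []string{
		"https://example.com/docs",
		"https://other.org/page",
	}
	for _, l := range links {
		fmt.Printf("%s follow=%v\n", l, shouldFollow(l, allowed, 1, 3))
	}
}
```

Under these assumptions, a link on example.com at depth 1 is followed, while a link to another host, or any link beyond the depth limit, is skipped.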

Parallel scraping

Speed up recursive scraping with multiple threads:

scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 2 --parallel 5

Additional options

  • --log - Set logging level (info, debug, warn, error)
  • --max - Maximum depth of pages to follow (default: 1)
  • --parallel - Number of concurrent threads (default: 1)
  • --allowed - Allowed domains for recursive scraping (can be specified multiple times)

For more details, run:

scpr --help

As a stdio MCP server

Start the MCP server with:

scpr mcp

Then configure it in your agent with:

{
  "mcpServers": {
    "web-scraping": {
      "type": "stdio",
      "command": "scpr",
      "args": [
        "mcp"
      ],
      "env": {}
    }
  }
}

The JSON snippet above is in the format used by Claude Code; adapt it to your agent before using it.

Contributing

Contributions are welcome! Please read the Contributing Guide to get started.

License

This project is licensed under the MIT License.
