scribe-me
MCP server to scrape web pages to clean Markdown via headless Chromium, with support for single or batch URLs.
Scrape web pages to clean Markdown via headless Chromium. Available as a standalone CLI tool and as a Claude Code MCP tool + skill.
Setup
npm install
npx playwright install chromium
npm link
CLI Usage
Single URL
scribe-me -p <project> -u <url> [-c <class>]
Batch (file of URLs)
scribe-me -p <project> -f <file> [-c <class>]
The file is plain text with one URL per line. Lines starting with # are ignored. The scraper runs 3 pages concurrently against a single shared Chromium instance.
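The batch behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `parseUrlList` and `mapWithConcurrency` are hypothetical names, and skipping blank lines and trimming whitespace are assumptions beyond the documented rule that lines starting with # are ignored.

```javascript
// Hypothetical helper: turn the contents of a URL file into a list of URLs.
// Lines starting with "#" are ignored (documented); blank lines are also
// skipped and whitespace is trimmed (assumptions).
function parseUrlList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"));
}

// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight at once. Results keep the input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function runLane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}
```

With a limit of 3, the worker would be whatever function scrapes one URL using the shared browser. In this sketch a single rejected worker rejects the whole batch; a real implementation may prefer to record per-URL errors instead.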
| Flag | Description |
|---|---|
| -p, --project | Project name; output is organized under scribe-me/<project>/ |
| -u, --url | Single URL to scrape |
| -f, --file | Path to a file with one URL per line (use instead of -u) |
| -c, --class | CSS class of the content container to extract |
Output files are written to scribe-me/<project>/ with timestamped filenames:
scribe-me/freewheel/2026-02-27 03:53:23-Getting-Started-with-the-Buzz-API.md
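One way to produce a path of this shape is sketched below. The function names are hypothetical, and both the sanitization rule (runs of non-alphanumeric characters become a single hyphen) and the use of local time are assumptions inferred from the example filename, not taken from sanitize.js or file-writer.js.

```javascript
// Hypothetical sanitizer: collapse every run of non-alphanumeric characters
// into one hyphen and strip hyphens from both ends.
function sanitizeTitle(title) {
  return title
    .trim()
    .replace(/[^A-Za-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Format a Date as "YYYY-MM-DD HH:MM:SS" in local time (assumed; the real
// tool may use UTC).
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  const day = [date.getFullYear(), pad(date.getMonth() + 1), pad(date.getDate())].join("-");
  const time = [pad(date.getHours()), pad(date.getMinutes()), pad(date.getSeconds())].join(":");
  return `${day} ${time}`;
}

// Build the relative output path for one scraped page.
function buildOutputPath(project, title, date) {
  return `scribe-me/${project}/${formatTimestamp(date)}-${sanitizeTitle(title)}.md`;
}
```

Note that the colons in the timestamp are not valid in filenames on Windows, so a path in this format is only portable across Unix-like systems.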
Container Detection
If -c is provided, the scraper targets that class. Otherwise it walks a fallback list of common content selectors (main, article, [role="main"], #content, etc.) and picks the first with meaningful text content, falling back to <body>.
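The fallback logic can be pictured with a small sketch like the one below. It is an illustration under stated assumptions, not the contents of container-finder.js: `findContainer`, `getText`, and the 200-character threshold for "meaningful" text are all hypothetical, and the selector list includes only the entries named above. `getText` stands in for whatever reads a selector's text from the page, for example a wrapper around a Playwright locator's `innerText()` that returns null when nothing matches.

```javascript
// Selectors named in the README; the real list continues ("etc.").
const FALLBACK_SELECTORS = ["main", "article", '[role="main"]', "#content"];

// Hypothetical threshold for what counts as "meaningful" text.
const MIN_TEXT_LENGTH = 200;

// Resolve the selector to extract from.
// getText(selector) must return a Promise of the element's text, or null
// if the selector matches nothing.
async function findContainer(getText, className) {
  // An explicit class always wins over the heuristic.
  if (className) return `.${className}`;

  // Otherwise take the first candidate with enough text in it.
  for (const selector of FALLBACK_SELECTORS) {
    const text = await getText(selector);
    if (text && text.trim().length >= MIN_TEXT_LENGTH) return selector;
  }

  // Nothing qualified: fall back to the whole document body.
  return "body";
}
```

Passing the text lookup in as a function keeps the selection rule testable without launching a browser.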
MCP Server
The same scraping logic is exposed as MCP tools for use inside Claude Code:
- scrape-to-markdown: scrape a single URL
- scrape-batch-to-markdown: scrape an array of URLs (3 concurrent, shared browser)
Add to ~/.claude/settings.json:
{
  "mcpServers": {
    "scribe-me": {
      "command": "node",
      "args": ["/absolute/path/to/src/mcp-server.js"]
    }
  }
}
Claude Code Skill
The /scrape skill chains the MCP tool with an AI cleanup pass. It scrapes the page, then has Claude clean up the resulting markdown — removing UI artifacts ("Suggest Edits" links, "Did this page help you?" prompts, empty heading anchors, ToC blocks), fixing malformed headings, and improving code block formatting.
/scrape joshlehman https://www.joshlehman.com -c content
If any arguments are missing, the skill prompts for them interactively.
Skill Installation
Copy the skill prompt to your Claude Code commands directory:
cp commands/scrape.md ~/.claude/commands/scrape.md
Architecture
bin/
  scribe-me.js          # CLI entry point (Commander)
src/
  scraper.js            # Core: Playwright + Turndown
  container-finder.js   # Heuristic content container detection
  file-writer.js        # Directory creation + timestamped file writing
  sanitize.js           # Filename sanitization
  mcp-server.js         # MCP server wrapping scraper
commands/
  scrape.md             # Claude Code /scrape skill prompt