
Doc Scraper MCP Server
Converts web-based documentation into markdown format using jina.ai's conversion service, allowing users to scrape documentation from any URL and save it as markdown files.
askjohngeorge
A Model Context Protocol (MCP) server that provides documentation scraping functionality. This server converts web-based documentation into markdown format using jina.ai's conversion service.
Features
- Scrapes documentation from any web URL
- Converts HTML documentation to markdown format
- Saves the converted documentation to a specified output path
- Integrates with the Model Context Protocol (MCP)
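The flow described above (fetch a page through jina.ai, receive markdown, write it to disk) can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the server's actual code. It assumes that jina.ai's Reader endpoint `https://r.jina.ai/` returns markdown when the target URL is appended to it. It uses `urllib` instead of the server's `aiohttp` to stay dependency-free, and `scrape_to_markdown` and `jina_reader_url` are hypothetical names.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

# Assumption: jina.ai's Reader endpoint converts a page to markdown when the
# target URL is appended to this prefix.
JINA_READER_PREFIX = "https://r.jina.ai/"


def jina_reader_url(url: str) -> str:
    """Build the Reader URL for a documentation page (hypothetical helper)."""
    return JINA_READER_PREFIX + url


def fetch_text(url: str) -> str:
    """Fetch a URL with the standard library and decode the body as UTF-8."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def scrape_to_markdown(url: str, output_path: str,
                       fetch: Callable[[str], str] = fetch_text) -> Path:
    """Convert `url` to markdown via the Reader endpoint and save it (hypothetical)."""
    markdown = fetch(jina_reader_url(url))
    path = Path(output_path)
    # Create any missing parent directories before writing the file.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path
```

For example, `scrape_to_markdown("https://example.com/docs", "docs/example.md")` would write the converted page to `docs/example.md` and create the `docs/` directory if it does not exist. The `fetch` parameter lets you swap in a different HTTP client or a stub for testing.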
Installation
Installing via Smithery
To install Doc Scraper for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @askjohngeorge/mcp-doc-scraper --client claude
Installing manually
- Clone the repository:
git clone https://github.com/askjohngeorge/mcp-doc-scraper.git
cd mcp-doc-scraper
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install the dependencies:
pip install -e .
Usage
The server can be run using Python:
python -m mcp_doc_scraper
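After a manual install, Claude Desktop still needs to know how to launch the server. Claude Desktop reads MCP servers from the `mcpServers` object in its `claude_desktop_config.json` file, where each entry gives a `command` and its `args`. The sketch below prints such an entry. The key `doc-scraper` is a hypothetical name. The sketch also assumes that you run it with the virtual environment's interpreter, so that `sys.executable` points at the Python where the package was installed.

```python
import json
import sys

# Hypothetical server key; Claude Desktop uses it as the server's identifier.
SERVER_NAME = "doc-scraper"


def claude_desktop_entry(python: str = sys.executable) -> dict:
    """Build an mcpServers entry that launches the server with `python -m`."""
    return {
        "mcpServers": {
            SERVER_NAME: {
                # Defaults to the interpreter running this script, so run it
                # from the activated venv.
                "command": python,
                "args": ["-m", "mcp_doc_scraper"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(claude_desktop_entry(), indent=2))
```

Merge the printed `doc-scraper` entry into any existing `mcpServers` object rather than overwriting the file. Then restart Claude Desktop so it picks up the change.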
Tool Description
The server provides a single tool:
- Name: scrape_docs
- Description: Scrape documentation from a URL and save as markdown
- Input Parameters:
  - url: The URL of the documentation to scrape
  - output_path: The path where the markdown file should be saved
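Under the hood, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. Claude Desktop and other clients build this message for you. The sketch below only shows its shape for `scrape_docs`; the request id, URL, and output path are placeholders, and `scrape_docs_request` is a hypothetical helper.

```python
import json


def scrape_docs_request(request_id: int, url: str, output_path: str) -> str:
    """Serialize an MCP tools/call request for the scrape_docs tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scrape_docs",
            # Argument names match the tool's documented input parameters.
            "arguments": {"url": url, "output_path": output_path},
        },
    }
    return json.dumps(request)


print(scrape_docs_request(1, "https://example.com/docs", "docs/example.md"))
```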
Project Structure
doc_scraper/
├── __init__.py
├── __main__.py
└── server.py
Dependencies
- aiohttp
- mcp
- pydantic
Development
To set up the development environment:
- Install development dependencies:
pip install -r requirements.txt
- The server uses the Model Context Protocol. Make sure to familiarize yourself with the MCP documentation.
License
MIT License