read-website-fast

Fast, token-efficient web content extraction tool that converts websites to clean Markdown for AI agents, featuring smart caching, content extraction with Mozilla Readability, and polite crawling capabilities.

@just-every/mcp-read-website-fast

Fast, token-efficient web content extraction for AI agents - converts websites to clean Markdown.

Overview

Existing MCP web crawlers are slow and consume large quantities of tokens. This stalls development and yields incomplete results, because the LLM has to parse entire web pages.

This MCP package fetches web pages locally, strips noise, and converts content to clean Markdown while preserving links. It is designed for Claude Code, IDEs, and LLM pipelines, with a minimal token footprint, and it crawls sites locally with minimal dependencies.

Features

Fast startup using official MCP SDK with lazy loading for optimal performance
Content extraction using Mozilla Readability (same as Firefox Reader View)
HTML to Markdown conversion with Turndown + GFM support
Smart caching with SHA-256 hashed URLs
Polite crawling with robots.txt support and rate limiting
Concurrent fetching with configurable depth crawling
Stream-first design for low memory usage
Link preservation for knowledge graphs
Optional chunking for downstream processing
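
The caching feature above keys entries by a SHA-256 hash of the URL. A minimal sketch of how such a key could be derived with Node's built-in crypto module is shown here. The function name cacheKeyForUrl and the normalization step (dropping the fragment) are assumptions for illustration, not the package's actual implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable, filesystem-safe cache key from a URL.
function cacheKeyForUrl(rawUrl: string): string {
  const url = new URL(rawUrl); // throws on malformed input
  url.hash = ""; // assumption: the fragment does not change the fetched document
  // Hex-encoded SHA-256 digest: always 64 characters, safe to use as a file name.
  return createHash("sha256").update(url.toString()).digest("hex");
}

console.log(cacheKeyForUrl("https://example.com/article#intro"));
```

Hashing gives every URL a fixed-length key regardless of how long the URL is or which characters it contains.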

Installation

Claude Code

claude mcp add read-website-fast -s user -- npx -y @just-every/mcp-read-website-fast

VS Code

code --add-mcp '{"name":"read-website-fast","command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}'

Cursor

cursor://anysphere.cursor-deeplink/mcp/install?name=read-website-fast&config=eyJyZWFkLXdlYnNpdGUtZmFzdCI6eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqdXN0LWV2ZXJ5L21jcC1yZWFkLXdlYnNpdGUtZmFzdCJdfX0=

JetBrains IDEs

Settings → Tools → AI Assistant → Model Context Protocol (MCP) → Add

Choose “As JSON” and paste:

{"command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}

Or, in the chat window, type /add and fill in the same JSON. Either path adds the server in a single step.

Raw JSON (works in any MCP client)

{
  "mcpServers": {
    "read-website-fast": {
      "command": "npx",
      "args": ["-y", "@just-every/mcp-read-website-fast"]
    }
  }
}

Drop this into your client’s mcp.json (e.g. .vscode/mcp.json, ~/.cursor/mcp.json, or .mcp.json for Claude).

Available Tools

read_website_fast - Fetches a webpage and converts it to clean Markdown
- Parameters:
  - url (required): The HTTP/HTTPS URL to fetch
  - depth (optional): Crawl depth (0 = single page)
  - respectRobots (optional): Whether to respect robots.txt
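
An MCP client invokes this tool by sending a JSON-RPC tools/call request. The sketch below builds such a request from the three parameters listed above. The names ReadWebsiteArgs and buildReadWebsiteRequest are illustrative, and the argument values are examples rather than documented defaults.

```typescript
// Arguments accepted by read_website_fast, as listed above.
interface ReadWebsiteArgs {
  url: string;
  depth?: number;
  respectRobots?: boolean;
}

// Hypothetical helper: build a JSON-RPC 2.0 "tools/call" request for this tool.
function buildReadWebsiteRequest(id: number, args: ReadWebsiteArgs) {
  const { protocol } = new URL(args.url); // throws on malformed URLs
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${protocol}`);
  }
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "read_website_fast", arguments: args },
  };
}

const request = buildReadWebsiteRequest(1, {
  url: "https://example.com/article",
  depth: 0, // single page
});
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library constructs this envelope for you; the sketch only shows where the tool name and its parameters end up.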

Available Resources

read-website-fast://status - Get cache statistics
read-website-fast://clear-cache - Clear the cache directory

Development Usage

Install

npm install
npm run build

Single page fetch

npm run dev fetch https://example.com/article

Crawl with depth

npm run dev fetch https://example.com --depth 2 --concurrency 5
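
The --depth and --concurrency flags suggest a breadth-first crawl that processes each level in bounded batches. The following is a sketch under that assumption, not the package's actual crawler. fetchLinks is a hypothetical stand-in for fetching a page and extracting its links, so the example runs without network access.

```typescript
type FetchLinks = (url: string) => Promise<string[]>;

// Breadth-first crawl: fetch pages level by level up to maxDepth,
// requesting at most `concurrency` pages at a time.
async function crawl(
  start: string,
  maxDepth: number,
  concurrency: number,
  fetchLinks: FetchLinks,
): Promise<string[]> {
  const step = Math.max(1, Math.floor(concurrency)); // guard against 0 or negative values
  const seen = new Set<string>([start]); // every URL discovered so far
  const fetched: string[] = []; // URLs actually fetched, in order
  let frontier = [start];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (let i = 0; i < frontier.length; i += step) {
      const batch = frontier.slice(i, i + step);
      const linkLists = await Promise.all(batch.map((url) => fetchLinks(url)));
      fetched.push(...batch);
      for (const links of linkLists) {
        for (const link of links) {
          if (!seen.has(link)) {
            seen.add(link);
            next.push(link); // queued for the next depth level
          }
        }
      }
    }
    frontier = next;
  }
  return fetched;
}
```

With a depth of 0 only the start page is fetched, which matches the "0 = single page" convention; links found at the final level are recorded but never requested.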

Output formats

# Markdown only (default)
npm run dev fetch https://example.com

# JSON output with metadata
npm run dev fetch https://example.com --output json

# Both URL and markdown
npm run dev fetch https://example.com --output both

CLI Options

-d, --depth <number> - Crawl depth (0 = single page, default: 0)
-c, --concurrency <number> - Max concurrent requests (default: 3)
--no-robots - Ignore robots.txt
--all-origins - Allow cross-origin crawling
-u, --user-agent <string> - Custom user agent
--cache-dir <path> - Cache directory (default: .cache)
-t, --timeout <ms> - Request timeout in milliseconds (default: 30000)
-o, --output <format> - Output format: json, markdown, or both (default: markdown)
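
When wrapping the CLI from a script, the options above can be captured in a typed object. This is a sketch: CrawlOptions, DEFAULTS, and toCliArgs are illustrative names, and the robots and allOrigins defaults are inferred from the flag names rather than stated in the list.

```typescript
type OutputFormat = "json" | "markdown" | "both";

interface CrawlOptions {
  depth: number;
  concurrency: number;
  robots: boolean; // false corresponds to --no-robots
  allOrigins: boolean;
  userAgent?: string;
  cacheDir: string;
  timeout: number; // milliseconds
  output: OutputFormat;
}

// Defaults as documented above (robots/allOrigins are inferred, see lead-in).
const DEFAULTS: CrawlOptions = {
  depth: 0,
  concurrency: 3,
  robots: true,
  allOrigins: false,
  cacheDir: ".cache",
  timeout: 30000,
  output: "markdown",
};

// Hypothetical helper: emit only the flags that differ from the defaults.
function toCliArgs(url: string, overrides: Partial<CrawlOptions> = {}): string[] {
  const o: CrawlOptions = { ...DEFAULTS, ...overrides };
  const args = ["fetch", url];
  if (o.depth !== DEFAULTS.depth) args.push("--depth", String(o.depth));
  if (o.concurrency !== DEFAULTS.concurrency) args.push("--concurrency", String(o.concurrency));
  if (!o.robots) args.push("--no-robots");
  if (o.allOrigins) args.push("--all-origins");
  if (o.userAgent) args.push("--user-agent", o.userAgent);
  if (o.cacheDir !== DEFAULTS.cacheDir) args.push("--cache-dir", o.cacheDir);
  if (o.timeout !== DEFAULTS.timeout) args.push("--timeout", String(o.timeout));
  if (o.output !== DEFAULTS.output) args.push("--output", o.output);
  return args;
}

console.log(toCliArgs("https://example.com", { depth: 2, concurrency: 5 }).join(" "));
```

The resulting array can be passed to a process-spawning call, which avoids shell-quoting problems with URLs and user agent strings.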

Clear cache

npm run dev clear-cache

Architecture

mcp/
├── src/
│   ├── crawler/        # URL fetching, queue management, robots.txt
│   ├── parser/         # DOM parsing, Readability, Turndown conversion
│   ├── cache/          # Disk-based caching with SHA-256 keys
│   ├── utils/          # Logger, chunker utilities
│   └── index.ts        # CLI entry point

Development

# Run in development mode
npm run dev fetch https://example.com

# Build for production
npm run build

# Run tests
npm test

# Type checking
npm run typecheck

# Linting
npm run lint

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

Troubleshooting

Cache Issues

npm run dev clear-cache

Timeout Errors

Increase timeout with -t flag
Check network connectivity
Verify URL is accessible

Content Not Extracted

Some sites block automated access
Try custom user agent with -u flag
Check if site requires JavaScript (not supported)

License

MIT
