website-reader-mcp

Enables AI chat backends to fetch public webpages and return cleaned, readable text, with tools for simple fetching and article extraction.

Website Reader MCP

A small, production-ready Model Context Protocol (MCP) server built with Python and FastAPI. It exposes Website Reader tools over Streamable HTTP so that an AI chat backend can fetch public webpages and receive cleaned, readable text.

What it does

  • Runs as a FastAPI app locally with uvicorn over HTTPS
  • Deploys to Vercel as a Python serverless app (HTTPS provided by Vercel)
  • Exposes MCP at /mcp (Streamable HTTP transport)
  • Protects the MCP endpoint with a static API key
  • Provides the fetch_url tool to fetch a public page and return structured metadata plus cleaned text
  • Provides the extract_article tool to extract higher-quality article content with rich metadata

Tools: fetch_url vs extract_article

| Tool | Best for | Output |
|---|---|---|
| fetch_url | Raw/simple fetch when you also need HTTP status, final URL, and content type | Cleaned page text plus basic title/description from HTML |
| extract_article | Summaries, blog posts, news, docs, and long-form pages | Article-focused text extracted with trafilatura, plus author, date, site name, language, and related metadata |

Use extract_article when you want the main readable article body. Use fetch_url when you need fetch diagnostics or a simpler HTML-to-text pass.

Local setup

Requirements: Python 3.11+ and OpenSSL (for local dev certs)

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

Edit .env and set a real value for MCP_API_KEY.

Environment variables

The defaults in .env.example are:

MCP_API_KEY=change-me
APP_ENV=local
REQUEST_TIMEOUT_SECONDS=12
MAX_RESPONSE_CHARS=12000
MAX_HTML_BYTES=2000000
ALLOWED_SCHEMES=https,http

HOST=0.0.0.0
PORT=8001
DEV_HTTPS=true
SSL_CERTFILE=certs/localhost.pem
SSL_KEYFILE=certs/localhost-key.pem

The real .env file is gitignored and should not be committed.
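As a rough illustration of how these variables could be read at startup, here is a minimal sketch using only the standard library. The Settings class and load_settings function are hypothetical names for this example; the actual app/config.py may be structured differently (for instance with a settings library).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; the real app/config.py may differ."""
    mcp_api_key: str
    app_env: str
    request_timeout_seconds: float
    max_response_chars: int
    max_html_bytes: int
    allowed_schemes: tuple[str, ...]


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ).

    Fallback values mirror the .env.example defaults shown above.
    """
    env = os.environ if env is None else env
    schemes = env.get("ALLOWED_SCHEMES", "https,http")
    return Settings(
        mcp_api_key=env.get("MCP_API_KEY", ""),
        app_env=env.get("APP_ENV", "local"),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "12")),
        max_response_chars=int(env.get("MAX_RESPONSE_CHARS", "12000")),
        max_html_bytes=int(env.get("MAX_HTML_BYTES", "2000000")),
        # Normalize "https, HTTP" style values to lowercase scheme names.
        allowed_schemes=tuple(
            s.strip().lower() for s in schemes.split(",") if s.strip()
        ),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.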

Create local HTTPS certs

Local development uses self-signed TLS certs. Generate them once:

chmod +x scripts/dev.sh scripts/generate_dev_certs.sh
./scripts/generate_dev_certs.sh

This creates:

certs/localhost.pem
certs/localhost-key.pem

These files are gitignored and are for local dev only.

You do not need to run this manually if you use ./scripts/dev.sh, which auto-generates missing certs on first start.

Optional: trusted local certs with mkcert

If you prefer browser- and client-trusted local certs instead of self-signed ones:

brew install mkcert
mkcert -install
mkdir -p certs
mkcert -cert-file certs/localhost.pem -key-file certs/localhost-key.pem localhost 127.0.0.1

Then use ./scripts/dev.sh as usual.

Run locally

./scripts/dev.sh

This starts uvicorn with reload on:

https://localhost:8001

Useful overrides:

# HTTP instead of HTTPS
DEV_HTTPS=false ./scripts/dev.sh

# Bind only to localhost
HOST=127.0.0.1 ./scripts/dev.sh

Health check

With self-signed certs, pass -k so that curl skips certificate verification:

curl -k https://localhost:8001/health

Example response:

{
  "status": "ok",
  "service": "website-reader-mcp"
}

MCP endpoint

The MCP Streamable HTTP endpoint is:

https://localhost:8001/mcp

Authentication is required. Use either header:

Authorization: Bearer <MCP_API_KEY>

or:

X-API-Key: <MCP_API_KEY>

Quick MCP test with curl

Initialize a session (stateless mode):

curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "capabilities": {},
      "clientInfo": {"name": "curl-test", "version": "0.1"}
    }
  }'

List tools:

curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/list",
    "params": {}
  }'

Call fetch_url:

curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "fetch_url",
      "arguments": {"url": "https://example.com"}
    }
  }'

Replace change-me with your configured MCP_API_KEY.
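For a backend written in Python, the same fetch_url call could be assembled with the standard library, as in the sketch below. build_mcp_request and send_insecure_dev_only are hypothetical helper names; the URL, headers, and payload mirror the curl command above. Disabling certificate verification is the rough equivalent of curl -k and is only appropriate against the local self-signed dev server.

```python
import json
import ssl
import urllib.request


def build_mcp_request(url, api_key, request_id, method, params):
    """Build a JSON-RPC 2.0 POST request for the MCP endpoint."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


def send_insecure_dev_only(request, timeout=12):
    """Send to the local dev server, skipping certificate checks like curl -k."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=ctx) as resp:
        return resp.headers.get("Content-Type", ""), resp.read().decode("utf-8")


# Same payload as the curl example; nothing is sent until send_* is called.
req = build_mcp_request(
    "https://localhost:8001/mcp/",
    "change-me",
    3,
    "tools/call",
    {"name": "fetch_url", "arguments": {"url": "https://example.com"}},
)
```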

Call extract_article

curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
      "name": "extract_article",
      "arguments": {
        "url": "https://example.com/blog/my-article",
        "max_chars": 12000,
        "include_metadata": true
      }
    }
  }'

Example structured output:

{
  "url": "https://example.com/blog/my-article",
  "title": "My Article",
  "author": null,
  "date": null,
  "description": "Short article description",
  "site_name": "Example",
  "language": "en",
  "text": "Clean readable article text...",
  "text_length": 8452,
  "truncated": false,
  "extraction_method": "trafilatura"
}
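One plausible way the text, text_length, and truncated fields relate is sketched below. cap_text is a hypothetical helper, and it assumes that text_length counts the characters actually returned; the real tool may instead report the length before truncation.

```python
def cap_text(text: str, max_chars: int) -> dict:
    """Cut text to at most max_chars characters and report what happened."""
    truncated = len(text) > max_chars
    returned = text[:max_chars] if truncated else text
    return {
        "text": returned,
        # Assumption: length of the returned text, not of the full article.
        "text_length": len(returned),
        "truncated": truncated,
    }
```

With this shape, a caller can check truncated and retry with a larger max_chars if the full article is needed.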

If extraction fails, the tool returns a structured error instead of crashing:

{
  "url": "https://example.com/article",
  "error": "Could not extract readable article content from this page.",
  "text": null,
  "extraction_method": "trafilatura"
}
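On the client side, handling both outcomes might look like the sketch below. Because the requests advertise Accept: application/json, text/event-stream, the reply may arrive either as a plain JSON body or as a server-sent event stream, so the sketch covers both. parse_mcp_response and article_text_or_error are hypothetical names, and how the structured payload is nested inside the JSON-RPC result depends on the server, so the second helper takes the payload dict directly.

```python
import json


def parse_mcp_response(content_type: str, body: str) -> list[dict]:
    """Return the JSON-RPC messages carried by a JSON or SSE response body."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/event-stream":
        return [json.loads(body)]

    messages, data_lines = [], []
    # A trailing "" guarantees the final event is flushed.
    for line in body.splitlines() + [""]:
        if line.startswith("data:"):
            # SSE allows one optional space after the colon.
            data_lines.append(line[5:].removeprefix(" "))
        elif line == "" and data_lines:
            messages.append(json.loads("\n".join(data_lines)))
            data_lines = []
    return messages


def article_text_or_error(payload: dict) -> tuple[bool, str]:
    """Return (ok, text) on success or (False, error message) on failure."""
    if payload.get("error") or payload.get("text") is None:
        return False, payload.get("error") or "No text returned."
    return True, payload["text"]
```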

You can also connect with the MCP Inspector using Streamable HTTP transport, the HTTPS URL above, and the same API key. You may need to accept the self-signed certificate in your client.

Tests

pytest

Vercel deployment

  1. Push this repository to GitHub.
  2. Import the project in Vercel.
  3. Set environment variables in the Vercel dashboard (at minimum MCP_API_KEY).
  4. Deploy.

The included vercel.json routes all requests to app/main.py, which exports the ASGI app object required by @vercel/python. Vercel terminates HTTPS for you in production; the local cert files are not used there.

After deployment, your MCP endpoint will be:

https://<your-project>.vercel.app/mcp

Use the same API key headers as in local development.

Security notes and limitations

  • The MCP endpoint is protected by a single static API key. Rotate the key if it is exposed.
  • Local HTTPS uses self-signed certificates. Do not reuse them outside local development.
  • SSRF protection blocks localhost, common internal hostnames, and private/link-local/multicast IP literals before fetching.
  • Hostnames are not yet resolved and checked against private IP ranges, so a public-looking hostname that resolves to an internal address is not blocked (see TODO in app/services/fetcher.py).
  • Only http and https URLs are allowed.
  • Downloads are capped at MAX_HTML_BYTES, and returned text is capped at MAX_RESPONSE_CHARS (or the per-call max_chars).
  • No JavaScript rendering: pages that require a browser will not be fully readable.
  • No crawling, caching, or rate limiting yet.
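
The pre-fetch URL checks described in the list above can be approximated with the standard library as follows. This is a sketch under stated assumptions, not the code in app/services/fetcher.py: is_url_allowed is a hypothetical name, and the blocked hostname and suffix lists are illustrative rather than the project's actual lists.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative values only; the project's real blocklist is not shown here.
BLOCKED_HOSTNAMES = {"localhost", "localhost.localdomain"}
BLOCKED_SUFFIXES = (".localhost", ".local", ".internal")


def is_url_allowed(url: str, allowed_schemes=("https", "http")) -> bool:
    """Reject disallowed schemes, internal hostnames, and non-public IP literals."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    if parts.scheme.lower() not in allowed_schemes:
        return False
    host = (parts.hostname or "").rstrip(".").lower()
    if not host:
        return False
    if host in BLOCKED_HOSTNAMES or host.endswith(BLOCKED_SUFFIXES):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, not an IP literal. DNS is not consulted in this check.
        return True
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```

As the list notes, a check like this only inspects the URL text; it cannot catch a hostname that resolves to a private address.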

Project structure

app/
  main.py              FastAPI app, health routes, MCP mount
  config.py            Environment settings
  auth.py              API key middleware
  schemas.py           Response models
  tools/
    website_reader.py  MCP tool registration
  services/
    fetcher.py         HTTP fetch + URL validation
    extractor.py       HTML to readable text (BeautifulSoup, used by fetch_url)
    article_extractor.py  Article extraction with trafilatura
scripts/
  generate_dev_certs.sh  Create local self-signed TLS certs
  dev.sh                 Run uvicorn with HTTPS locally
tests/
  test_fetcher.py
  test_extractor.py
  test_extract_article.py

Next steps

Possible follow-ups:

  • add fetch_markdown
  • add domain allowlist or blocklist
  • add caching
  • add rate limiting
  • add logging and request IDs
  • add an MCP client inside the existing AI Chat backend
  • add tools for search_web and read_url
  • validate DNS-resolved IPs before fetching (stronger SSRF protection)
