markfetch

Fetches any URL and returns clean markdown, using a real Chrome fingerprint to bypass bot detection. Integrates as an MCP server with tools like fetch_markdown.

Reader View for AI agents and your shell. Fetch any URL, get back clean markdown — with a real Chrome's request fingerprint, not curl's.

The built-in fetch tools that ship with AI coding agents often return raw HTML, broken markdown, or a 403 from Cloudflare. markfetch sends HTTP/2 with a coherent Chrome header set so bot-detection systems see a real browser, then runs the response through a Reader View style pipeline (Mozilla's Readability, the library behind Firefox Reader View → turndown). The output is markdown comparable to what a human would get from "Save as Markdown", even on sites that would block a naive curl.

One command, two surfaces:

  • CLI — pass a URL. Print to stdout or -o to a file.
npm i -g markfetch

markfetch https://en.wikipedia.org/wiki/Markdown

  • MCP stdio server — bare invocation. Drop into Claude Desktop / Claude Code / Cursor / Goose / any stdio-MCP client.
{
  "mcpServers": {
    "markfetch": {
      "command": "npx",
      "args": ["-y", "markfetch"]
    }
  }
}

That snippet is the whole MCP setup — or jump to CLI usage to drive the same command from a shell.

MCP install commands

Claude Code

claude mcp add --scope user markfetch -- npx -y markfetch

Codex

codex mcp add markfetch -- npx -y markfetch

Gemini CLI

gemini mcp add -s user markfetch npx -y markfetch

Why markfetch?

Comparison criteria: real-browser fingerprint, Reader-View extraction, structured errors, zero config. Options compared: built-in agent fetch tools; generic Playwright / Puppeteer; mcp-server-fetch (Python; noted as "basic"); Cloudflare /markdown (noted as "paid"); markfetch (✓, with 8 structured error codes).

  • Real-browser HTTP/2 + Chrome fingerprint. ALPN-negotiated h2, User-Agent, Sec-CH-UA-*, Sec-Fetch-*, Accept-*. A Chrome UA with no client hints is a stronger automation signal than curl — markfetch sends the full coherent set, derived from the UA at startup so an override stays internally consistent.

  • Reader-View-quality extraction. linkedom → @mozilla/readability → turndown, with GFM tables, strikethrough, and task lists. Code fences preserve language-X hints. Sphinx-style bare <pre> blocks render as code, not escaped prose. Intraword underscores stay unescaped, so no more list\_tools.

  • One tool, one shape (MCP). fetch_markdown(url, savePath?) returns markdown in content[0].text. No structuredContent, no frontmatter, no metadata fields. Several major MCP clients (Claude Code CLI, VS Code/Copilot) forward only structuredContent to the model and drop content[] when both are present — markfetch deliberately stays on the channel your LLM can actually read.

  • savePath / -o escape valve. Pass an absolute path (MCP savePath) or -o <path> (CLI) and the markdown lands on disk instead of the response channel. Use it when your client's inline tool-result cap would truncate large responses, or to redirect output from a shell pipeline. The file is only ever the markdown of the URL — fetch errors return a [code] string and never touch the disk.

  • Whole document or honest failure. No pagination, no truncation. If the document doesn't fit in MARKFETCH_MAX_BYTES, you get too_large — never a half-truth.

  • Stdio-clean. Stdout is reserved for MCP frames. Stderr is fatal-only. No log spam, no ANSI escapes — keeping stderr parseable for shell consumers.

  • Pure Node, no subprocesses. No Playwright, no headless Chromium, no Python hop. It runs as a single Node process whether you invoke it as an MCP server or from the shell.
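
To make the "derived from the UA" idea in the first bullet concrete, here is a minimal sketch of how client hints could be computed from a Chrome User-Agent string. The function name deriveClientHints, the ClientHints type, and the exact brand list are assumptions for illustration, not markfetch's actual code; the GREASE brand in particular differs between Chrome releases.

```typescript
// Hypothetical helper: names and the brand list are illustrative assumptions,
// not taken from markfetch's source.
interface ClientHints {
  "sec-ch-ua": string;
  "sec-ch-ua-mobile": string;
  "sec-ch-ua-platform": string;
}

function deriveClientHints(userAgent: string): ClientHints {
  // Chrome UAs carry a "Chrome/<major>.<minor>.<build>.<patch>" token.
  const match = /Chrome\/(\d+)\./.exec(userAgent);
  if (match === null) {
    // Mirrors the documented "non-Chrome strings fail fast" behavior.
    throw new Error("User-Agent is not a Chrome UA; cannot derive client hints");
  }
  const major = match[1];

  // Map the UA's platform token to the value sent in Sec-CH-UA-Platform.
  let platform = "Linux";
  if (userAgent.includes("Windows NT")) platform = "Windows";
  else if (userAgent.includes("Mac OS X")) platform = "macOS";

  return {
    // The "Not?A_Brand" entry is a placeholder GREASE brand (assumption).
    "sec-ch-ua": `"Chromium";v="${major}", "Google Chrome";v="${major}", "Not?A_Brand";v="99"`,
    "sec-ch-ua-mobile": "?0", // desktop assumed
    "sec-ch-ua-platform": `"${platform}"`,
  };
}
```

Because every hint is computed from the same string, overriding the UA cannot leave the version in Sec-CH-UA out of step with the version in User-Agent.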

CLI usage

markfetch doubles as a shell tool: when invoked with at least one argument it parses argv as a CLI instead of starting the MCP server. Bare invocation (zero args) keeps the existing MCP-server behavior — every MCP client config in the wild keeps working unchanged.

# Print clean markdown to stdout
npx -y markfetch https://example.com/article

# Save to a file (absolute or relative path)
npx -y markfetch https://example.com/article -o article.md

# Pipe into another tool
npx -y markfetch https://example.com/article | pandoc -o article.pdf

For repeat use, install once:

npm i -g markfetch         # then anywhere: markfetch <url>
# or, as a project devDependency
npm i -D markfetch         # then in package.json scripts: "markfetch <url>"

Flags:

| Flag | Purpose |
| --- | --- |
| -o, --output <path> | Save markdown to a file (absolute or relative path). Default is stdout. |
| -V, --version | Print version and exit. |
| -h, --help | Print usage and exit. |
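
The argv-based switch between MCP mode and CLI mode can be sketched as a small pure function. This is an illustration under assumptions: the Invocation type, the parseInvocation name, and the error wording are hypothetical, and the real dispatcher may handle edge cases differently.

```typescript
// Hypothetical sketch of an argv-discriminated dispatcher; not markfetch's source.
type Invocation =
  | { mode: "mcp" }
  | { mode: "help" }
  | { mode: "version" }
  | { mode: "fetch"; url: string; output?: string };

// `args` is argv without the node binary and script path, i.e. process.argv.slice(2).
function parseInvocation(args: string[]): Invocation {
  // Bare invocation keeps the MCP stdio server behavior.
  if (args.length === 0) return { mode: "mcp" };

  const queue = [...args];
  let url: string | undefined;
  let output: string | undefined;

  while (queue.length > 0) {
    const arg = queue.shift();
    if (arg === undefined) break;
    if (arg === "-h" || arg === "--help") return { mode: "help" };
    if (arg === "-V" || arg === "--version") return { mode: "version" };
    if (arg === "-o" || arg === "--output") {
      const value = queue.shift(); // the flag consumes the next argument
      if (value === undefined) throw new Error(`${arg} requires a path`);
      output = value;
    } else if (url === undefined) {
      url = arg;
    } else {
      throw new Error(`unexpected argument: ${arg}`);
    }
  }

  if (url === undefined) throw new Error("missing URL");
  return output === undefined ? { mode: "fetch", url } : { mode: "fetch", url, output };
}
```

Keeping the decision in one function means the same entry point can serve both surfaces without a separate binary.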

Errors go to stderr with the same [code] message shape the MCP tool returns (see the table below), and the process exits with a non-zero status. The same env vars (MARKFETCH_TIMEOUT_MS, MARKFETCH_MAX_BYTES, MARKFETCH_USER_AGENT) apply in both modes. MARKFETCH_ALLOWED_WRITE_ROOTS is MCP-only — see Write sandbox.

Errors carry one of eight deterministic codes:

| Code | Meaning |
| --- | --- |
| network_error | DNS / TCP / TLS failure, or an unexpected internal error from the fetcher. |
| http_error | Upstream returned a non-2xx status. |
| timeout | Per-request budget MARKFETCH_TIMEOUT_MS exceeded. |
| unsupported_content_type | Response was not text/html or application/xhtml+xml. |
| extraction_failed | Readability returned no article content (typical for pure client-rendered SPAs). |
| too_large | Response body or extracted markdown exceeded MARKFETCH_MAX_BYTES. |
| save_failed | savePath was given but writeFile failed (parent directory missing, permission denied, etc.). |
| save_forbidden | savePath resolves outside the allowed write roots (see Write sandbox). MCP-only; the CLI has no sandbox. |
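
A caller that wants to branch on these codes could parse the [code] message shape along the following lines. This is a sketch, assuming the error text begins with one of the eight codes in square brackets followed by a message; the type and function names are hypothetical, and the sample message in the usage note is invented.

```typescript
// The eight codes from the table above.
const ERROR_CODES = [
  "network_error",
  "http_error",
  "timeout",
  "unsupported_content_type",
  "extraction_failed",
  "too_large",
  "save_failed",
  "save_forbidden",
] as const;

type ErrorCode = (typeof ERROR_CODES)[number];

type FetchOutcome =
  | { ok: true; markdown: string }
  | { ok: false; code: ErrorCode; message: string };

// Assumption: an error is a single text payload shaped like "[code] message".
function classifyOutput(text: string): FetchOutcome {
  const match = /^\[([a-z_]+)\]\s*([\s\S]*)$/.exec(text);
  if (match !== null) {
    for (const code of ERROR_CODES) {
      if (code === match[1]) {
        return { ok: false, code, message: match[2] ?? "" };
      }
    }
  }
  // Anything else, including markdown that merely starts with a bracket, is a document.
  return { ok: true, markdown: text };
}
```

In MCP mode the input would be content[0].text from the tool result; in CLI mode it would be the stderr text of a failed run. For example, classifyOutput("[http_error] upstream returned 403") yields code "http_error".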

What it is not

  • Not a crawler. No recursion, no robots.txt parsing, no rate-limit orchestration. One URL in, one document out.
  • Not authenticated. Anonymous fetch only — no cookie jar, no auth headers, no session reuse. Pages behind login walls return whatever the public response is, usually surfaced as http_error.
  • Not a JS renderer. Pure client-rendered SPAs with no static content return extraction_failed. SPAs with server-rendered or SEO-prerendered HTML will extract whatever static content they ship.

Configuration

| Variable | Default | Purpose |
| --- | --- | --- |
| MARKFETCH_TIMEOUT_MS | 30000 | Per-request timeout in ms. |
| MARKFETCH_MAX_BYTES | 5000000 | Cap on response body and extracted markdown. |
| MARKFETCH_USER_AGENT | Pinned Chrome 130 string | Override the UA. Must be a Chrome UA: Sec-CH-UA-* client hints are derived from it at startup, and non-Chrome strings fail fast. |
| MARKFETCH_ALLOWED_WRITE_ROOTS | os.tmpdir() + process.cwd() | MCP-only. Path-delimiter-separated list of absolute paths permitted as MCP savePath write roots. Replaces the defaults entirely; see Write sandbox. |

Pass overrides via the env block of your MCP client config:

{
  "mcpServers": {
    "markfetch": {
      "command": "npx",
      "args": ["-y", "markfetch"],
      "env": {
        "MARKFETCH_TIMEOUT_MS": "60000"
      }
    }
  }
}
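
As an illustration of how the numeric variables and their defaults might be resolved at startup, here is a small sketch. The Limits type and both function names are hypothetical, and rejecting invalid values with an exception is an assumption; the source does not say how markfetch treats a malformed number.

```typescript
interface Limits {
  timeoutMs: number;
  maxBytes: number;
}

// Same shape as process.env, passed in so the function stays easy to test.
type Env = Record<string, string | undefined>;

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  // Assumption: invalid values are rejected rather than silently replaced.
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadLimits(env: Env): Limits {
  return {
    // Defaults taken from the configuration table.
    timeoutMs: readPositiveInt(env, "MARKFETCH_TIMEOUT_MS", 30000),
    maxBytes: readPositiveInt(env, "MARKFETCH_MAX_BYTES", 5000000),
  };
}
```

In a Node process this would be called as loadLimits(process.env); values arrive as strings, which is why the JSON config above quotes "60000".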

Write sandbox

MCP savePath writes are confined to a set of allowed root directories. By default the allowed set is os.tmpdir() and process.cwd() (each resolved via fs.realpath once at startup). A savePath outside that set returns save_forbidden and no file is created.

Override the default set with MARKFETCH_ALLOWED_WRITE_ROOTS — a list of absolute paths separated by the platform's path delimiter (: on POSIX, ; on Windows). When set, the override replaces the defaults entirely — it does not merge. To keep os.tmpdir() or process.cwd() accessible, list them yourself; the example below shows /tmp for that reason. A malformed value (non-absolute entry, or a directory that doesn't exist) fails fast on stderr at startup.

{
  "mcpServers": {
    "markfetch": {
      "command": "npx",
      "args": ["-y", "markfetch"],
      "env": {
        "MARKFETCH_ALLOWED_WRITE_ROOTS": "/Users/me/markfetch-out:/tmp"
      }
    }
  }
}

On Windows, use backslashes and ; as the delimiter:

{
  "mcpServers": {
    "markfetch": {
      "command": "npx",
      "args": ["-y", "markfetch"],
      "env": {
        "MARKFETCH_ALLOWED_WRITE_ROOTS": "C:\\Users\\me\\markfetch-out;C:\\Users\\me\\AppData\\Local\\Temp"
      }
    }
  }
}
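
The two examples above differ only in delimiter and path style. A minimal sketch of splitting and validating such a value follows; the function names are hypothetical, the absoluteness checks are simplified stand-ins for Node's path.isAbsolute, and the existence check that the text mentions is left out because it needs the filesystem.

```typescript
// Simplified absoluteness test (assumption: drive-letter or UNC paths on Windows).
function isAbsolutePath(p: string, windows: boolean): boolean {
  if (windows) return /^[A-Za-z]:[\\/]/.test(p) || p.startsWith("\\\\");
  return p.startsWith("/");
}

// Hypothetical parser for a MARKFETCH_ALLOWED_WRITE_ROOTS value.
function parseWriteRoots(raw: string, windows: boolean): string[] {
  const delimiter = windows ? ";" : ":"; // what path.delimiter reports in Node
  // Assumption: empty entries (e.g. a trailing delimiter) are ignored.
  const roots = raw.split(delimiter).filter((entry) => entry.length > 0);
  if (roots.length === 0) {
    throw new Error("MARKFETCH_ALLOWED_WRITE_ROOTS lists no directories");
  }
  for (const root of roots) {
    if (!isAbsolutePath(root, windows)) {
      throw new Error(`not an absolute path: ${root}`);
    }
  }
  return roots;
}
```

Splitting on ":" would break Windows drive letters, which is why the delimiter has to follow the platform rather than being fixed.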

Notes:

  • The sandbox is MCP-only by design. The CLI is unrestricted — a human at the shell is the security boundary, and the markfetch CLI doesn't run any sandbox check at all. The asymmetry exists because the MCP tool is driven by a language model, which may be steered by content from a page it just fetched.
  • Symlinks pointing outside are blocked. Each candidate savePath is resolved via fs.realpath to its real destination before the containment check, so a symlink planted inside the sandbox cannot be used to escape.
  • Containment is case-insensitive on Windows (C:\Users\Bob and c:\users\bob are the same path).
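
The containment check described in these notes can be sketched as a prefix comparison. This is an illustration only: it assumes both paths have already been resolved with fs.realpath (absolute, normalized, symlink-free), and the function names are hypothetical. How markfetch resolves a target file that does not exist yet is not stated in the source.

```typescript
// Assumes `root` and `target` are already realpath-resolved absolute paths.
function isInsideRoot(root: string, target: string, windows: boolean): boolean {
  const sep = windows ? "\\" : "/";
  // Windows paths compare case-insensitively.
  const r = windows ? root.toLowerCase() : root;
  const t = windows ? target.toLowerCase() : target;
  // Appending the separator stops "/tmpfoo" from matching the root "/tmp".
  const prefix = r.endsWith(sep) ? r : r + sep;
  return t.startsWith(prefix);
}

function isAllowedSavePath(roots: string[], target: string, windows: boolean): boolean {
  return roots.some((root) => isInsideRoot(root, target, windows));
}
```

A caller would map a false result to save_forbidden before attempting any write.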

Develop

Requires Node.js ≥ 24. Tested on Linux, macOS, and Windows in CI.

When iterating on CLI changes, tsx src/index.ts <url> and tsx src/index.ts --help route through the same argv-discriminated dispatcher as the built dist/index.js — no rebuild needed between edits.

To point an MCP client at a local source build, swap npx for node + an absolute path to dist/index.js:

{
  "mcpServers": {
    "markfetch": {
      "command": "node",
      "args": ["/absolute/path/to/markfetch/dist/index.js"]
    }
  }
}

Responsible use

markfetch is a per-call fetch tool, not a crawler. Use it on URLs whose targets you have permission to fetch, and respect the terms of service of any site you query. The maintainer assumes no liability for misuse — see LICENSE.

License

MIT
