FetchV2 MCP Server
Model Context Protocol (MCP) server for web content fetching and extraction. Enables fetching webpages, extracting clean content using Trafilatura, discovering links, and batch fetching up to 10 URLs.
This MCP server provides tools to fetch webpages, extract clean content using Trafilatura, and discover links for batch processing.
Features
- Fetch Webpages: Extract clean markdown content from any URL
- Batch Fetching: Fetch up to 10 URLs in a single request
- Link Discovery: Find and filter links on any webpage
- llms.txt Support: Parse and fetch LLM-friendly documentation indexes
- Smart Extraction: Trafilatura removes boilerplate (navbars, ads, footers)
- Robots.txt Compliance: Respects robots.txt with graceful timeout handling
- Pagination Support: Handle large pages with the start_index parameter
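To make the robots.txt feature concrete, here is a minimal sketch of that kind of check using Python's standard urllib.robotparser. It is illustrative only and is not taken from the server's source; the rules, the user agent name, and the URLs are made up for the example, and timeout handling is not shown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from a string (no network access).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "example-bot", "https://example.com/docs/intro"))   # True
print(is_allowed(ROBOTS_TXT, "example-bot", "https://example.com/private/key"))  # False
```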
Prerequisites
- Install uv from Astral
- Install Python 3.10 or newer using uv python install 3.10
Installation
| Cursor | VS Code |
|---|---|
| Install MCP Server | Install on VS Code |
Or configure manually in your MCP client:
{
  "mcpServers": {
    "fetchv2": {
      "command": "uvx",
      "args": ["fetchv2-mcp-server@latest"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
Config file locations:
- Claude Desktop (macOS): ~/Library/Application Support/Claude/claude_desktop_config.json
- Claude Desktop (Windows): %APPDATA%\Claude\claude_desktop_config.json
- Windsurf: ~/.codeium/windsurf/mcp_config.json
- Kiro: .kiro/settings/mcp.json in your project
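If you would rather script the manual configuration than edit the file by hand, something along these lines could work. This is a sketch, not an official installer: the helper names add_fetchv2 and update_config_file are hypothetical, and it assumes the client stores its servers under a top-level mcpServers key as in the JSON snippet above.

```python
import json
from pathlib import Path

# Server entry copied from the manual configuration snippet.
FETCHV2_ENTRY = {
    "command": "uvx",
    "args": ["fetchv2-mcp-server@latest"],
    "disabled": False,
    "autoApprove": [],
}


def add_fetchv2(config: dict) -> dict:
    """Return a copy of config with the fetchv2 server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["fetchv2"] = FETCHV2_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read path if it exists, add the fetchv2 entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_fetchv2(config), indent=2) + "\n", encoding="utf-8")
```

For example, update_config_file(Path(".kiro/settings/mcp.json")) would register the server for Kiro in the current project while leaving any other configured servers in place.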
Install from PyPI
# Using uv
uv add fetchv2-mcp-server
# Using pip
pip install fetchv2-mcp-server
Basic Usage
Example prompts to try:
- "Fetch the documentation from <URL>"
- "Find all links on <docs URL> that contain 'tutorial'"
- "Read these three pages and summarize the differences: [url1, url2, url3]"
Available Tools
fetch
Fetches a webpage and extracts its main content as clean markdown.
fetch(url: str, max_length: int = 5000, start_index: int = 0) -> str
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The webpage URL to fetch |
| max_length | int | 5000 | Maximum characters to return |
| start_index | int | 0 | Character offset for pagination |
| get_raw_html | bool | false | Skip extraction, return raw HTML |
| include_metadata | bool | true | Include title, author, date |
| include_tables | bool | true | Preserve tables in markdown |
| include_links | bool | false | Preserve hyperlinks |
| bypass_robots_txt | bool | false | Skip robots.txt check |
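The max_length and start_index parameters suggest a simple paging loop. The sketch below assumes they behave as a plain character window over the extracted content, which the table implies but does not spell out; real responses may also carry metadata or truncation notices that would need to be accounted for. Both read_all and fake_fetch are hypothetical names, and fake_fetch slices a local string rather than calling the tool.

```python
from typing import Callable, Iterator


def read_all(fetch_page: Callable[[int, int], str], max_length: int = 5000) -> Iterator[str]:
    """Yield successive chunks until one comes back shorter than max_length."""
    start_index = 0
    while True:
        chunk = fetch_page(start_index, max_length)
        if chunk:
            yield chunk
        if len(chunk) < max_length:
            break  # a short (or empty) chunk means the end was reached
        start_index += len(chunk)


# Stand-in for the fetch tool: slices a local string instead of a webpage.
DOCUMENT = "x" * 12000


def fake_fetch(start_index: int, max_length: int) -> str:
    return DOCUMENT[start_index:start_index + max_length]


chunks = list(read_all(fake_fetch, max_length=5000))
print([len(c) for c in chunks])  # [5000, 5000, 2000]
```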
fetch_batch
Fetches multiple webpages in a single request.
fetch_batch(urls: list[str], max_length_per_url: int = 2000) -> str
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | list[str] | required | List of URLs (max 10) |
| max_length_per_url | int | 2000 | Character limit per URL |
| get_raw_html | bool | false | Skip extraction for all URLs |
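Because urls accepts at most 10 entries, a longer list has to be split before calling fetch_batch. A minimal sketch of that splitting step follows; the helper name batches is hypothetical, and the example URLs are placeholders.

```python
from typing import Iterator

MAX_BATCH_SIZE = 10  # limit stated in the parameter table


def batches(urls: list[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Split urls into consecutive lists of at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(urls), size):
        yield urls[i:i + size]


urls = [f"https://docs.example.com/page/{n}" for n in range(23)]
print([len(b) for b in batches(urls)])  # [10, 10, 3]
```

Each yielded list can then be passed as the urls argument of a separate fetch_batch call.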
discover_links
Discovers all links on a webpage with optional filtering.
discover_links(url: str, filter_pattern: str = "") -> str
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The webpage URL to scan |
| filter_pattern | str | "" | Regex to filter links (e.g., /docs/) |
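To preview what a given filter_pattern would keep, the filtering can be approximated locally. This sketch assumes the pattern is matched anywhere in the URL (re.search semantics) and that an empty pattern keeps every link; the server's exact matching rules are not documented here, and filter_links is a hypothetical helper.

```python
import re


def filter_links(links: list[str], filter_pattern: str = "") -> list[str]:
    """Keep links matching filter_pattern anywhere; an empty pattern keeps all."""
    if not filter_pattern:
        return list(links)
    pattern = re.compile(filter_pattern)
    return [link for link in links if pattern.search(link)]


links = [
    "https://docs.example.com/docs/intro",
    "https://docs.example.com/blog/news",
    "https://docs.example.com/docs/tutorial-setup",
]
print(filter_links(links, "/docs/"))
# ['https://docs.example.com/docs/intro', 'https://docs.example.com/docs/tutorial-setup']
```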
fetch_llms_txt
Fetch and parse an llms.txt file to discover LLM-friendly documentation.
fetch_llms_txt(url: str, include_content: bool = False) -> str
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | URL to an llms.txt file |
| include_content | bool | false | Also fetch content of all linked pages |
| max_length_per_url | int | 2000 | When include_content=True, max chars per page |
⚠️ Important: By default, only the llms.txt index is fetched; the linked markdown files are NOT downloaded to context. Set include_content=True to explicitly fetch all linked pages.
Example:
# DEFAULT: Only fetches the index (lightweight, ~1KB)
fetch_llms_txt(url="https://docs.example.com/llms.txt")
# Returns: title + list of links with descriptions
# EXPLICIT: Fetches index + all linked .md files (can be large)
fetch_llms_txt(url="https://docs.example.com/llms.txt", include_content=True)
# Returns: structure + content of all linked pages
Note: Relative URLs (e.g., /docs/guide.md) are automatically resolved to absolute URLs.
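Relative-link resolution of this kind can be reproduced with the standard library's urljoin. The sketch below shows the general technique, not necessarily the server's own code; resolve_links is a hypothetical helper and the URLs are placeholders.

```python
from urllib.parse import urljoin


def resolve_links(llms_txt_url: str, links: list[str]) -> list[str]:
    """Resolve each link against the URL of the llms.txt file it came from."""
    return [urljoin(llms_txt_url, link) for link in links]


resolved = resolve_links(
    "https://docs.example.com/llms.txt",
    ["/docs/guide.md", "api/reference.md", "https://other.example.org/page.md"],
)
for url in resolved:
    print(url)
# https://docs.example.com/docs/guide.md
# https://docs.example.com/api/reference.md
# https://other.example.org/page.md
```

Absolute links pass through unchanged, so mixed lists of relative and absolute URLs are handled uniformly.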
Workflow Example
Step 1: Discover relevant documentation pages
discover_links(url="https://docs.example.com/", filter_pattern="/guide/")
Step 2: Batch fetch the pages you need
fetch_batch(urls=["https://docs.example.com/guide/intro", "https://docs.example.com/guide/setup"])
Prompts
- fetch_manual - User-initiated fetch that bypasses robots.txt
- research_topic - Research a topic by fetching multiple relevant URLs
Development
# Clone and install
git clone https://github.com/praveenc/fetchv2-mcp-server.git
cd fetchv2-mcp-server
uv sync --dev
source .venv/bin/activate
# Run tests
uv run pytest
# Run with MCP Inspector
mcp dev src/fetchv2_mcp_server/server.py
# Linting and type checking
uv run ruff check .
uv run pyright
License
MIT - see LICENSE for details.
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
Support
For issues and questions, use the GitHub issue tracker.