smart-webfetch-mcp

Context-aware web fetching for LLMs, providing 7 tools to check page size, fetch with truncation, extract code/sections/links/tables, and paginate large documents.

Smart WebFetch MCP Server

Context-aware web fetching for LLMs. Prevents context window flooding by checking page size before fetching and providing surgical extraction tools.

The Problem

Standard web fetch tools dump entire pages into the context window, often:

  • Exceeding token limits
  • Wasting context on navigation, footers, ads
  • Flooding the model with irrelevant content

The Solution

Smart WebFetch provides 7 tools for intelligent web fetching:

| Tool | Purpose |
| --- | --- |
| web_preflight | Check page size before fetching |
| web_smart_fetch | Fetch with automatic truncation |
| web_fetch_code | Extract only code blocks |
| web_fetch_section | Fetch specific heading/section |
| web_fetch_chunked | Paginated fetching for large docs |
| web_fetch_links | Extract all links from a page |
| web_fetch_tables | Extract tables as markdown |

Installation

# Install from PyPI
pip install smart-webfetch-mcp

# Or with uvx (recommended for MCP)
uvx smart-webfetch-mcp

Configuration

Claude Code

claude mcp add --transport stdio smart-webfetch -- uvx smart-webfetch-mcp

OpenCode

Add to your opencode.json:

{
  "mcp": {
    "smart-webfetch": {
      "type": "local",
      "command": ["uvx", "smart-webfetch-mcp"],
      "enabled": true
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "smart-webfetch": {
      "command": "uvx",
      "args": ["smart-webfetch-mcp"]
    }
  }
}

Usage Examples

Check before fetching

Use web_preflight to check https://docs.python.org/3/library/asyncio.html

Response:

{
  "url": "https://docs.python.org/3/library/asyncio.html",
  "estimated_tokens": 45000,
  "safe_for_context": false,
  "recommendation": "Very large page (~45,000 tokens). Use web_fetch_section or web_fetch_chunked."
}
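
A client or agent can use this response to decide which tool to call next. The sketch below is one way to do that, assuming only the fields shown in the response above. The helper name `choose_tool`, its `wants_section` flag, and the routing rules are hypothetical and not part of the server.

```python
def choose_tool(preflight: dict, wants_section: bool = False) -> str:
    """Pick a follow-up tool name from a web_preflight response.

    Hypothetical helper: these routing rules are illustrative only.
    """
    if preflight.get("safe_for_context"):
        # Small page: a plain (possibly truncating) fetch is enough.
        return "web_smart_fetch"
    if wants_section:
        # The caller already knows which heading it needs.
        return "web_fetch_section"
    # Otherwise read the document page by page.
    return "web_fetch_chunked"


response = {
    "url": "https://docs.python.org/3/library/asyncio.html",
    "estimated_tokens": 45000,
    "safe_for_context": False,
}
print(choose_tool(response))  # web_fetch_chunked
```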

Fetch with automatic truncation

Use web_smart_fetch on https://example.com/docs with max_tokens=4000

Extract only code examples

Use web_fetch_code on https://docs.python.org/3/library/asyncio-task.html

Get specific section

Use web_fetch_section on https://docs.python.org/3/library/asyncio.html 
with heading="Running an asyncio Program"

Paginated reading

Use web_fetch_chunked on https://large-docs.com/api with chunk=0, chunk_size=4000

Then continue with chunk=1, chunk=2, etc.

Tool Reference

web_preflight

Check page metadata before fetching.

Parameters:

  • url (required): URL to check

Returns:

  • estimated_tokens: Approximate token count
  • content_type: MIME type
  • is_html: Whether content is HTML
  • title: Page title (if HTML)
  • safe_for_context: Boolean (true if estimated_tokens is below 8000)
  • recommendation: Human-readable advice
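
The README does not say how the token estimate is computed. A common rough heuristic is about four characters per token for English text. The sketch below assumes that heuristic to show how `estimated_tokens` and `safe_for_context` could relate. `estimate_tokens`, `preflight_summary`, and `CHARS_PER_TOKEN` are hypothetical names, and only the 8000-token threshold comes from the description above.

```python
CHARS_PER_TOKEN = 4       # assumption: rough heuristic, not the server's confirmed method
SAFE_TOKEN_LIMIT = 8000   # threshold stated for safe_for_context


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def preflight_summary(text: str) -> dict:
    """Build the two size-related fields of a preflight-style result."""
    tokens = estimate_tokens(text)
    return {
        "estimated_tokens": tokens,
        "safe_for_context": tokens < SAFE_TOKEN_LIMIT,
    }
```

A real tokenizer would give different counts, so an estimate like this is best treated as an order-of-magnitude guide.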

web_smart_fetch

Fetch with automatic truncation for large pages.

Parameters:

  • url (required): URL to fetch
  • max_tokens (optional, default 8000): Maximum tokens to return
  • strategy (optional, default "auto"): "auto" cuts at a natural break point; "truncate" cuts hard at the limit

Returns: Markdown content with metadata header
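
The difference between the two strategies can be illustrated with a small sketch. It assumes a character budget derived from `max_tokens` at roughly four characters per token, and it treats paragraph breaks, line breaks, and sentence ends as "natural break points". Both are assumptions. `truncate_text` is a hypothetical helper, not the server's implementation.

```python
def truncate_text(text: str, max_tokens: int = 8000, strategy: str = "auto",
                  chars_per_token: int = 4) -> str:
    """Cut text to roughly max_tokens using the named strategy."""
    budget = max_tokens * chars_per_token  # assumption: ~4 characters per token
    if len(text) <= budget:
        return text
    cut = text[:budget]
    if strategy == "truncate":
        return cut  # hard cut, possibly mid-word
    # "auto": back up to the last paragraph break, line break, or sentence end,
    # but only if that keeps at least half of the budget.
    for sep in ("\n\n", "\n", ". "):
        pos = cut.rfind(sep)
        if pos >= budget // 2:
            return cut[:pos + len(sep)].rstrip()
    return cut  # no suitable break found: fall back to a hard cut


text = "First paragraph.\n\nSecond paragraph that runs long."
print(repr(truncate_text(text, max_tokens=6)))                       # 'First paragraph.'
print(repr(truncate_text(text, max_tokens=6, strategy="truncate")))  # 'First paragraph.\n\nSecond'
```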

web_fetch_code

Extract only code blocks from a page.

Parameters:

  • url (required): URL to extract code from

Returns: Code blocks with language annotations and context
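
To make the idea concrete, here is a minimal sketch of code-block extraction built on the standard library's `html.parser`. It assumes code lives in `<pre>` elements and that the language is marked with a `class="language-..."` attribute, which is a common convention but not confirmed for this server. `CodeBlockParser` and `extract_code_blocks` are hypothetical names.

```python
from html.parser import HTMLParser


class CodeBlockParser(HTMLParser):
    """Collect the text of every <pre> element, plus a language hint if present."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[dict] = []
        self._depth = 0            # > 0 while inside a <pre>
        self._buf: list[str] = []
        self._lang = ""

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1
        if self._depth and not self._lang:
            # Assumption: language is tagged as class="language-xxx".
            for cls in (dict(attrs).get("class") or "").split():
                if cls.startswith("language-"):
                    self._lang = cls[len("language-"):]

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                code = "".join(self._buf).strip("\n")
                self.blocks.append({"language": self._lang, "code": code})
                self._buf, self._lang = [], ""


def extract_code_blocks(html: str) -> list[dict]:
    """Return every <pre> block in the HTML as {"language", "code"}."""
    parser = CodeBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

This sketch omits the surrounding context mentioned in the return description. Capturing it would require tracking the nearest preceding heading or paragraph as well.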

web_fetch_section

Fetch content under a specific heading.

Parameters:

  • url (required): URL to fetch from
  • heading (required): Heading text to find (case-insensitive)

Returns: Section content or list of available sections if not found
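
The lookup and its fallback can be sketched as follows. The sketch assumes the page has already been converted to markdown with `#`-style headings, and it matches the heading text exactly apart from case. Whether the server matches exactly or by substring is not stated. `extract_section` is a hypothetical helper.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def extract_section(markdown: str, heading: str) -> str:
    """Return the section under `heading`, or a list of headings if it is missing."""
    lines = markdown.splitlines()
    headings = []  # (line_index, level, title)
    for i, line in enumerate(lines):
        m = HEADING_RE.match(line)
        if m:
            headings.append((i, len(m.group(1)), m.group(2)))

    wanted = heading.strip().lower()
    for pos, (start, level, title) in enumerate(headings):
        if title.lower() == wanted:
            # The section runs until the next heading of the same or a higher level.
            end = len(lines)
            for next_start, next_level, _ in headings[pos + 1:]:
                if next_level <= level:
                    end = next_start
                    break
            return "\n".join(lines[start:end]).strip()

    available = "\n".join(f"- {title}" for _, _, title in headings)
    return f"Section not found. Available sections:\n{available}"
```

Note that this simple scan would also treat a `#` comment line inside a code block as a heading. A fuller version would skip fenced regions.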

web_fetch_chunked

Fetch large documents in chunks.

Parameters:

  • url (required): URL to fetch
  • chunk (optional, default 0): Chunk index (0-based)
  • chunk_size (optional, default 4000): Tokens per chunk

Returns: Chunk content with navigation metadata
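
As a rough illustration of chunked reading, the sketch below slices text into fixed-size chunks and reports whether more remain. The metadata field names (`total_chunks`, `has_more`) and the four-characters-per-token conversion are assumptions. The README only says that navigation metadata is returned, not what it contains.

```python
import math


def get_chunk(text: str, chunk: int = 0, chunk_size: int = 4000,
              chars_per_token: int = 4) -> dict:
    """Return one 0-based chunk of text with hypothetical navigation metadata."""
    span = chunk_size * chars_per_token  # characters per chunk
    total = max(1, math.ceil(len(text) / span))
    if not 0 <= chunk < total:
        raise IndexError(f"chunk {chunk} out of range (0..{total - 1})")
    start = chunk * span
    return {
        "chunk": chunk,
        "total_chunks": total,
        "has_more": chunk + 1 < total,
        "content": text[start:start + span],
    }


# Read a document chunk by chunk until nothing is left.
doc = "x" * 25
index = 0
while True:
    part = get_chunk(doc, index, chunk_size=10, chars_per_token=1)
    print(part["chunk"], len(part["content"]), part["has_more"])
    if not part["has_more"]:
        break
    index += 1
```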

web_fetch_links

Extract all links from a page.

Parameters:

  • url (required): URL to extract links from
  • filter_pattern (optional): Regex to filter link URLs
  • external_only (optional, default false): Only return external links

Returns: Markdown list of links with text and URL
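
The two filters can be sketched with the standard library alone. This version assumes that "external" means a different host than the page itself and that `filter_pattern` is applied to the resolved absolute URL with `re.search`. Neither detail is specified in the README. `LinkParser` and `extract_links` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html: str, base_url: str, filter_pattern: str = None,
                  external_only: bool = False) -> str:
    """Return a markdown list of links, optionally filtered."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    base_host = urlparse(base_url).netloc
    pattern = re.compile(filter_pattern) if filter_pattern else None
    items = []
    for text, href in parser.links:
        url = urljoin(base_url, href)  # resolve relative links
        if external_only and urlparse(url).netloc == base_host:
            continue  # assumption: same host means internal
        if pattern and not pattern.search(url):
            continue
        items.append(f"- [{text or url}]({url})")
    return "\n".join(items)
```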

web_fetch_tables

Extract tables from a page as markdown.

Parameters:

  • url (required): URL to extract tables from
  • table_index (optional): Index of a specific table (0-based); returns all tables if omitted

Returns: Markdown formatted tables
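
A minimal sketch of HTML-to-markdown table conversion is shown below. It assumes flat (non-nested) tables and treats the first row of each table as the header, which may differ from how the server handles headerless or nested tables. `TableParser`, `table_to_markdown`, and `extract_tables` are hypothetical names.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect tables as lists of rows, each row a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._row = None
        self._cell = None

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self._row.append(text.replace("|", "\\|"))     # escape pipes
        self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._flush_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._flush_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()


def table_to_markdown(rows: list) -> str:
    """Render rows as a markdown table; the first row becomes the header."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    padded = [r + [""] * (width - len(r)) for r in rows]
    lines = ["| " + " | ".join(padded[0]) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in padded[1:]]
    return "\n".join(lines)


def extract_tables(html: str, table_index: int = None) -> str:
    """Return all tables, or only the one at table_index (0-based)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    tables = parser.tables if table_index is None else [parser.tables[table_index]]
    return "\n\n".join(table_to_markdown(t) for t in tables)
```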

Development

# Clone and install dev dependencies
git clone https://github.com/mathisto/smart-webfetch-mcp
cd smart-webfetch-mcp
pip install -e ".[dev]"

# Run tests
pytest

# Format code
ruff format .
ruff check --fix .

License

MIT
