WET - Web Extended Toolkit MCP Server

mcp-name: io.github.n24q02m/wet-mcp

Web search, content extraction, and library docs for AI agents -- 5-strategy scraping, runs without API keys.

| Phase | Status | Scope |
|---|---|---|
| Phase 1 | Shipped | web-core ScrapingAgent migration, smart chunks output, search polish, media slim |
| Phase 2 | Shipped | Context7-level docs search: library index (Tier 1 + Tier 2), version-aware queries with token cap, project lock (Cabinets) |
| Phase 3 | Shipped | extract.agent multi-step research with cited synthesis, extract.interact click/fill/submit via patchright (optional session persistence), docs_004_chunk_summaries migration, media.analyze removed (v2.0.0) |

Current release: v3.x. media(action="analyze") was removed in the v2.0.0 BREAKING release. Use imagine-mcp's understand action for vision/audio/video analysis. See docs/migration.md for the upgrade recipe.

Features

  • Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with query expansion, TTL cache (1 h general / 5 min time-sensitive), standardized citation format, and 200-token snippet cap. Optional cloud search backends (Tavily, Brave, Exa) as a fallback chain via SEARCH_BACKENDS
  • Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
  • Content Extract -- 5-strategy escalation chain via n24q02m-web-core ScrapingAgent (basic_http -> tls_spoof -> headless Crawl4AI), markitdown bridge for low-tier HTML/MD fallback, smart chunks structured output (clean text + markdown + JSON-LD + code blocks + metadata), batch processing (up to 50 URLs), deep crawling, site mapping
  • Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
  • Media -- List + download images / videos / audio files. analyze was removed in v2.0.0 -- use imagine-mcp.understand for vision/audio inference
  • Anti-bot -- Stealth strategies bypass Cloudflare, Medium, LinkedIn, Twitter
  • Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere, xAI, Anthropic) selected per task via the EMBEDDING_MODELS / RERANK_MODELS / LLM_MODELS model chains for higher-quality vectors and LLM features
  • Sync -- Cross-machine sync of indexed docs via Google Drive (OAuth Device Code, no browser redirect)
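
The two-tier TTL cache mentioned under Web Search can be pictured with a minimal sketch. This is not wet-mcp's implementation: the `SearchCache` class, its method names, and the injected clock are hypothetical; only the two TTL values (1 h general, 5 min time-sensitive) come from the feature list above.

```python
import time
from typing import Any, Callable, Optional

GENERAL_TTL_S = 60 * 60         # 1 h for general queries (from the feature list)
TIME_SENSITIVE_TTL_S = 5 * 60   # 5 min for time-sensitive queries


class SearchCache:
    """Tiny in-memory TTL cache keyed by query string (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def put(self, query: str, results: Any, time_sensitive: bool = False) -> None:
        """Store results with the TTL that matches the query class."""
        ttl = TIME_SENSITIVE_TTL_S if time_sensitive else GENERAL_TTL_S
        self._entries[query] = (self._clock() + ttl, results)

    def get(self, query: str) -> Optional[Any]:
        """Return cached results, or None if absent or expired."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, results = entry
        if self._clock() >= expires_at:
            del self._entries[query]  # drop the stale entry on read
            return None
        return results
```

How a query gets classified as time-sensitive is left to the caller here; the README does not say how wet-mcp decides.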

Quick install

# Method 1 (default): plugin install via Claude Code
/plugin marketplace add n24q02m/claude-plugins
/plugin install wet-mcp@n24q02m-plugins

# Method 2 (CLI): direct uvx invocation
claude mcp add wet -- uvx wet-mcp

# Method 3 (recommended for HTTP / multi-device / OAuth)
docker run -d --name wet-mcp-http -p 8084:8080 \
  -v wet-data:/data -e MCP_TRANSPORT=http \
  -e PUBLIC_URL=https://wet.example.com \
  n24q02m/wet-mcp:latest

Full setup matrices live on the canonical docs site at mcp.n24q02m.com/servers/wet-mcp/setup/; the paste-to-agent snippets are in claude-plugins/plugins/wet-mcp/setup-with-agent.md (per Spec F, single source of truth).

Configuration

wet runs zero-config out of the box: web search uses an embedded local SearXNG, and embedding/reranking fall back to the bundled local Qwen3 ONNX models when no cloud keys are set. For higher-quality results, point each task at a cloud model chain. All settings are plain environment variables (no app prefix) -- in the HTTP self-host mode they are entered through the browser setup form instead.

Model chains (CSV provider/model,provider/model; order = fallback). Leave a chain empty to use the local ONNX models (embedding/rerank) or to disable LLM features (LLM):

| Env var | Task | Empty default |
|---|---|---|
| EMBEDDING_MODELS | Embeddings for docs search | Local Qwen3-Embedding ONNX |
| RERANK_MODELS | Result reranking | Local Qwen3-Reranker ONNX |
| LLM_MODELS | extract(action="agent") synthesis | LLM features disabled |
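
To make the chain format concrete, here is a rough sketch of how a `provider/model,provider/model` value could be split into an ordered fallback list. The `ModelRef` type and `parse_model_chain` function are hypothetical names used for illustration, not wet-mcp's API; treating a bare model name as OpenAI follows the `openai/ (or bare)` convention this README states for provider keys.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str
    model: str


def parse_model_chain(value: str) -> list[ModelRef]:
    """Split a CSV chain into (provider, model) pairs, preserving fallback order."""
    chain: list[ModelRef] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty values and stray commas
        provider, sep, model = item.partition("/")
        if not sep:
            # Bare model name: assumed to mean OpenAI, per "openai/ (or bare)".
            provider, model = "openai", item
        chain.append(ModelRef(provider, model))
    return chain
```

An empty result corresponds to the "Empty default" column: local ONNX models for embedding and reranking, LLM features disabled for `LLM_MODELS`.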

Provider keys -- the provider is inferred from each model's prefix; supply the matching key (litellm <PROVIDER>_API_KEY convention):

| Model prefix | Key env var | Get it at |
|---|---|---|
| jina_ai/ | JINA_AI_API_KEY | jina.ai/api-key |
| gemini/ | GEMINI_API_KEY | aistudio.google.com/apikey |
| openai/ (or bare) | OPENAI_API_KEY | platform.openai.com |
| cohere/ | COHERE_API_KEY | dashboard.cohere.com |
| xai/ | XAI_API_KEY | console.x.ai |
| anthropic/ | ANTHROPIC_API_KEY | console.anthropic.com |

Any other litellm provider works via env passthrough -- see litellm provider docs for its key name.
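
A quick pre-flight check can catch a chain that names a provider whose key is not set. The sketch below is a hypothetical helper, not part of wet-mcp: the function names are made up, and the prefix-to-key mapping simply copies the six providers listed above. Prefixes outside that list are skipped rather than guessed, since their key names are defined by litellm.

```python
from collections.abc import Mapping

# Copied from the provider list above.
PREFIX_TO_KEY = {
    "jina_ai": "JINA_AI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "cohere": "COHERE_API_KEY",
    "xai": "XAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}
CHAIN_VARS = ("EMBEDDING_MODELS", "RERANK_MODELS", "LLM_MODELS")


def required_keys(env: Mapping[str, str]) -> list[str]:
    """List the key env vars implied by the model prefixes in all three chains."""
    keys: list[str] = []
    for var in CHAIN_VARS:
        for item in env.get(var, "").split(","):
            item = item.strip()
            if not item:
                continue
            # A bare model name counts as OpenAI ("openai/ (or bare)").
            prefix = item.split("/", 1)[0] if "/" in item else "openai"
            key = PREFIX_TO_KEY.get(prefix)  # unknown providers are skipped
            if key and key not in keys:
                keys.append(key)
    return keys


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return required key env vars that are unset or empty."""
    return [key for key in required_keys(env) if not env.get(key)]
```

Calling `missing_keys(os.environ)` before launching the server would report the gaps; with all chains empty it returns an empty list, matching the zero-config default.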

Search backends -- SEARCH_BACKENDS (CSV, runtime fallback chain) over searxng (default, local) plus optional cloud providers tavily / brave / exa. Point at an external SearXNG with SEARXNG_URL. Cloud providers need TAVILY_API_KEY / BRAVE_API_KEY / EXA_API_KEY.
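
The runtime fallback behaviour of `SEARCH_BACKENDS` can be sketched as a loop over the configured names. Everything here is an illustrative assumption rather than wet-mcp's code: the `search_with_fallback` function, the `registry` of callables, and the choice to fall through on empty results as well as on exceptions. Only the CSV ordering and the `searxng` default come from the paragraph above.

```python
from typing import Callable

SearchFn = Callable[[str], list[dict]]


def search_with_fallback(
    query: str,
    backends_csv: str,
    registry: dict[str, SearchFn],
) -> tuple[str, list[dict]]:
    """Try each configured backend in order; return the first non-empty result."""
    names = [n.strip() for n in backends_csv.split(",") if n.strip()] or ["searxng"]
    errors: list[str] = []
    for name in names:
        backend = registry.get(name)
        if backend is None:
            errors.append(f"{name}: unknown backend")
            continue
        try:
            results = backend(query)
        except Exception as exc:  # any backend failure moves on to the next one
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        errors.append(f"{name}: no results")
    raise RuntimeError("all search backends failed: " + "; ".join(errors))
```

Returning the backend name alongside the results makes it easy to log which provider actually answered.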

Docs sync -- SYNC_ENABLED (default true), GOOGLE_DRIVE_CLIENT_ID (required for sync), SYNC_FOLDER (default wet-mcp), SYNC_INTERVAL (default 300s). Sync uses Google Drive over the OAuth Device Code flow (no browser redirect).

HTTP self-host -- MCP_TRANSPORT=http, PUBLIC_URL=<your-domain>. The setup form is gated by MCP_RELAY_PASSWORD; multi-user deployments also require CREDENTIAL_SECRET (per-user vault key) and MCP_DCR_SERVER_SECRET.

Example stdio config (cloud chains):

{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["wet-mcp"],
      "env": {
        "EMBEDDING_MODELS": "jina_ai/jina-embeddings-v5-text-small",
        "RERANK_MODELS": "jina_ai/jina-reranker-v3",
        "LLM_MODELS": "gemini/gemini-3-flash-preview",
        "JINA_AI_API_KEY": "jina_xxx",
        "GEMINI_API_KEY": "AIza_xxx"
      }
    }
  }
}

Status

Stable architecture with two transports: stdio (default, local) and HTTP (self-host, OAuth-gated). No daemon-bridge layer and no auto-spawn from stdio. The media.analyze action was removed in the v2.0.0 BREAKING release -- see docs/migration.md for the upgrade recipe. Current release line: v3.x.

Documentation

Full docs at mcp.n24q02m.com/servers/wet-mcp/setup/:

  • Setup -- install methods for Claude Code, Codex, Gemini CLI, Cursor, Windsurf, mcp.json
  • Modes overview -- stdio / local-relay / remote-relay / remote-oauth
  • Multi-user setup -- per-JWT-sub credential model

In-repo references (Spec F single source of truth: setup docs live in claude-plugins/plugins/wet-mcp/):

  • docs/ARCHITECTURE.md -- web-core ScrapingAgent integration, strategy chain, storage layout, LLM provider dispatch
  • docs/BENCHMARKS.md -- v1.x baseline coverage / latency placeholders + tier-1 fixture metrics

Install with AI agent -- paste this to your AI coding agent:

Install MCP server wet-mcp following the steps at https://raw.githubusercontent.com/n24q02m/claude-plugins/main/plugins/wet-mcp/setup-with-agent.md

Tools

6 MCP tools: 3 domain tools (search, extract, media) plus config, help, and config__open_relay. The legacy setup tool was merged into the config tool's action dispatch.

| Tool | Description |
|---|---|
| search | Web (SearXNG metasearch), news, images, academic research (Scholar / arXiv / PubMed / CrossRef / Semantic Scholar / BASE), library docs (HyDE + FTS5), find similar pages. Includes docs_resolve (library name -> ranked id), docs_query (version-aware + topic + 5000-token cap), docs_lock_project (Cabinets project pin via pyproject / package.json / go.mod / Cargo.toml manifest detection). |
| extract | URL -> smart chunks dict (clean_text + markdown + structured_data + code_blocks + metadata) via web-core 5-strategy chain. Batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion (PDF/DOCX/XLSX/PPTX/EPUB), structured extraction (JSON Schema). |
| media | list (discover URLs from gallery pages), download (SSRF-safe). analyze was removed in v2.0.0 -- use imagine-mcp.understand instead. |
| config | status, set, cache_clear, docs_reindex, warmup, setup_sync, setup_status, setup_skip, setup_reset, setup_complete |
| help | Per-tool documentation: search, extract, media, config |
| config__open_relay | Re-trigger the zero-config relay setup flow (prints a fresh relay URL for the browser form). Registered via mcp-core's register_open_relay_tool so an LLM can restart setup without a manual restart. |

Media boundary: For vision / audio understanding (image captioning, OCR, audio transcription, video summarization), use imagine-mcp. media.analyze was removed in wet v2.0.0 -- use imagine-mcp.understand instead.
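
On the wire, each tool is invoked through a standard MCP `tools/call` JSON-RPC request, with the `action` argument selecting the operation. The sketch below only builds that envelope; the `tool_call` helper is hypothetical, and any argument name other than `action` (here `library`) is a placeholder assumption. The real parameter names are documented by the help tool.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids per session


def tool_call(name: str, action: str, **arguments: object) -> dict:
    """Build an MCP tools/call request that dispatches on `action`."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": {"action": action, **arguments}},
    }


# `library` is a hypothetical argument name used for illustration only.
request = tool_call("search", "docs_resolve", library="fastapi")
print(json.dumps(request, indent=2))
```

In practice an MCP client library sends this for you; the sketch is meant to show where the tool name and action end up in the request.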

Comparison

How wet-mcp stacks up against direct competitors in each pillar:

| Capability | wet-mcp | Brave Search | Tavily | Firecrawl | Context7 |
|---|---|---|---|---|---|
| Web search | Yes (SearXNG aggregation) | Yes | Yes | No | No |
| Extract URL | Yes (5-strategy chain) | No | Yes (basic) | Yes | No |
| Media list / download | Yes | No | No | No | No |
| Library docs search | Yes (Tier 1 curated + Tier 2 on-demand, version-aware, Cabinets) | No | No | No | Yes |
| Academic research | Yes (6 providers) | No | No | No | No |
| Self-hostable | Yes | No | No | No | Yes |
| Free tier | Yes (open source) | Limited | Limited | Limited | Yes |

Security

  • SSRF prevention -- URL validation on crawl targets
  • Graceful fallbacks -- Cloud → Local embedding, multi-tier crawling
  • Error sanitization -- No credentials in error messages
  • File conversion sandboxing -- Optional CONVERT_ALLOWED_DIRS restriction
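
As a rough illustration of what URL validation for SSRF prevention usually involves, the sketch below rejects non-HTTP schemes and any host that resolves to a non-public address. It is a generic standard-library example, not wet-mcp's or web-core's actual check; the `is_public_url` name and the injectable `resolve` parameter are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True only for http(s) URLs whose host resolves to public addresses."""
    try:
        parts = urlsplit(url)
        host, port = parts.hostname, parts.port  # .port raises ValueError if invalid
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        infos = resolve(host, port or 443, type=socket.SOCK_STREAM)
    except (OSError, UnicodeError):  # DNS failure or un-encodable hostname
        return False
    # Every resolved address must be globally routable; loopback, private,
    # and link-local ranges (including cloud metadata IPs) are rejected.
    addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    return bool(addresses) and all(addr.is_global for addr in addresses)
```

A check like this only helps if the subsequent request connects to the same address that was validated; otherwise a DNS answer that changes between check and fetch can bypass it.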

Build from Source

git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp

Deploy to Cloudflare

Run your own single-user wet instance serverless on Cloudflare (Containers + D1 + Vectorize + KV).

Prerequisites: a Cloudflare account on the Workers Paid plan and the wrangler CLI.

  1. git clone https://github.com/n24q02m/wet-mcp && cd wet-mcp
  2. wrangler login
  3. Provision resources and apply the D1 schema:
    wrangler d1 create wet-docs
    wrangler d1 execute wet-docs --file migrations/0001_init_wet.sql --remote
    wrangler vectorize create wet-docs-vectors --dimensions 768 --metric cosine
    wrangler kv namespace create wet-kv
    
    Paste the returned IDs into wrangler.jsonc.
  4. Push the container image to your Cloudflare managed registry (CF Containers cannot pull from external registries directly), then set <YOUR_ACCOUNT_ID> in wrangler.jsonc:
    docker pull ghcr.io/n24q02m/wet-mcp:beta
    docker tag ghcr.io/n24q02m/wet-mcp:beta wet-mcp:beta
    wrangler containers push wet-mcp:beta   # prints registry.cloudflare.com/<ACCOUNT_ID>/wet-mcp:beta
    
  5. Set secrets (use SEARXNG_URL with basic-auth userinfo, e.g. https://user:pass@searxng.example.com, or TAVILY_API_KEY if you set SEARCH_BACKEND=tavily):
    wrangler secret put CREDENTIAL_SECRET
    wrangler secret put JINA_AI_API_KEY
    wrangler secret put GOOGLE_VERTEX_EXPRESS_API_KEY
    wrangler secret put XAI_API_KEY
    wrangler secret put MCP_RELAY_PASSWORD
    wrangler secret put MCP_DCR_SERVER_SECRET
    wrangler secret put SEARXNG_URL
    
  6. wrangler deploy and complete setup in the browser relay form at your Worker domain.

Storage maps to Cloudflare via MCP_STORAGE_BACKEND=cf-kv (credentials/tokens, encrypted), DOCS_DB_BACKEND=cf-d1 (docs + BM25 full-text), and Vectorize (embeddings). Web search uses a SearXNG instance (SEARCH_BACKEND=searxng, SEARXNG_URL) or Tavily (SEARCH_BACKEND=tavily); embed/rerank are forced cloud via EMBEDDING_MODELS/RERANK_MODELS.
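
For readers unfamiliar with basic-auth userinfo in a URL such as `https://user:pass@searxng.example.com`, the sketch below shows how a client could split it into a credential-free URL and an `Authorization` header. This is a generic standard-library example; it is an assumption that wet-mcp consumes `SEARXNG_URL` in a similar way, and `split_basic_auth` is a hypothetical name.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_basic_auth(url: str) -> tuple[str, dict[str, str]]:
    """Separate user:pass@ userinfo from a URL into a Basic auth header."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, {}  # no credentials embedded; nothing to do
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    # Userinfo may be percent-encoded in the URL; decode before base64-encoding.
    credentials = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return clean_url, {"Authorization": f"Basic {token}"}
```

Because the credentials travel inside the URL, storing `SEARXNG_URL` with `wrangler secret put` (as in step 5) rather than as a plain variable keeps them out of `wrangler.jsonc`.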

Trust Model

This plugin implements TC-Local (machine-bound, single trust principal). See mcp-core trust model for full classification.

| Mode | Storage | Encryption | Who can read your data? |
|---|---|---|---|
| stdio (default) | ~/.wet-mcp/config.json | AES-GCM, machine-bound key | Only your OS user (file perm 0600) |
| HTTP self-host | Same as stdio | Same | Only you (admin = user) |

License

MIT -- See LICENSE.
