webmcp

MCP server for web search and content extraction using DuckDuckGo or SearXNG, with Playwright-based fetching and LLM-powered data extraction.

webmcp is an MCP server for web search and content extraction. LLM agents can use it to:

search the web with DuckDuckGo (default) or SearXNG (optional)
fetch and clean page content from one or more URLs
send cleaned content to a local LLM for structured extraction

Features

search_web(query, limit=10) returns web results (title, URL, description)
extract(urls, prompt=None, schema=None, use_browser=True) extracts data from pages
browser-based fetching with Playwright for JavaScript-heavy sites
lightweight HTTP fetching mode for faster/simple pages
persistent tool-call logging to tool_calls.log.json
configurable search provider: DDG by default, optional SearXNG
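As a rough illustration of how a client would invoke these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` requests that MCP uses. The tool names and parameters come from the list above; the helper name `tool_call`, the example query, and the example URL are assumptions made for illustration, and the transport used to send the request is left out.

```python
import json
from itertools import count

# Hypothetical helper, not part of webmcp: JSON-RPC requests need unique ids.
_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# search_web(query, limit=10)
search_request = tool_call("search_web", {"query": "llama.cpp MCP proxy", "limit": 5})

# extract(urls, prompt=None, schema=None, use_browser=True)
extract_request = tool_call(
    "extract",
    {
        "urls": ["https://example.com"],  # placeholder URL
        "prompt": "Summarize the page in two sentences.",
        "use_browser": False,  # static page, so skip Playwright
    },
)

print(json.dumps(search_request, indent=2))
```

In practice an MCP client library builds these messages for you; the sketch only shows what the arguments look like on the wire.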

Critical Requirement

For the main researcher llama.cpp server, include --webui-mcp-proxy in launch parameters. Without this flag, this workflow will not function correctly.

Prompting And Tested Setup

For best results, use research_prompt.txt as your system prompt. The prompt is a core part of the intended workflow: result quality depends on it about as much as on the server itself.

Tested setup:

Main researcher LLM: Qwen3.5:27b-Q3_K_M.gguf via llama.cpp on an RTX 4090, context length 200,000, about 40 tok/s.
Extract tool LLM: Qwen3.5:9b-Q4_K_M.gguf via llama.cpp on a GTX 1080 Ti, context length 32,768, about 40 tok/s.
This workflow has been tested with the llama.cpp WebUI specifically, and has not been validated with other MCP clients yet.

Requirements

Python 3.10+
A local OpenAI-compatible LLM endpoint (for example, llama.cpp, LM Studio, vLLM, or Ollama)

Configuration

The app reads LLM settings from environment variables and supports a local .env file.

Copy .env.example to .env
Set values:

LLM_URL=http://localhost:1234
LLM_MODEL=your-model-name
SEARCH_PROVIDER=ddg
# Optional when SEARCH_PROVIDER=searxng
SEARXNG_URL=http://localhost:8080

LLM_URL and LLM_MODEL are required at startup. SEARCH_PROVIDER defaults to ddg. Set it to searxng to replace DDG, and provide SEARXNG_URL.
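The rules above can be sketched as a small validation function. This is a minimal sketch of the described behavior, not the code in app.py: the names `Settings` and `load_settings` are hypothetical, .env loading is not handled here, and the exact error types are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_url: str
    llm_model: str
    search_provider: str = "ddg"
    searxng_url: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate settings the way the README describes them (illustrative only)."""
    # LLM_URL and LLM_MODEL are required at startup.
    missing = [key for key in ("LLM_URL", "LLM_MODEL") if not env.get(key)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")

    # SEARCH_PROVIDER defaults to ddg.
    provider = (env.get("SEARCH_PROVIDER") or "ddg").strip().lower()
    if provider not in ("ddg", "searxng"):
        raise ValueError(f"unknown SEARCH_PROVIDER: {provider!r}")

    # SEARXNG_URL is only needed when the provider is searxng.
    searxng_url = env.get("SEARXNG_URL") or None
    if provider == "searxng" and not searxng_url:
        raise ValueError("SEARXNG_URL is required when SEARCH_PROVIDER=searxng")

    return Settings(env["LLM_URL"], env["LLM_MODEL"], provider, searxng_url)


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"LLM_URL": "http://localhost:1234", "LLM_MODEL": "your-model-name"})
print(settings.search_provider)
```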

Search Providers

search_web supports two providers:

ddg (default): uses DuckDuckGo via ddgs
searxng: uses your SearXNG instance

SearXNG notes:

Set SEARCH_PROVIDER=searxng
Set SEARXNG_URL to your instance base URL (for example, http://192.168.0.55:8888)
webmcp calls <SEARXNG_URL>/search with format=json
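To check that your instance accepts the kind of request described above, it can help to see the URL shape. The sketch below builds it with the standard library; the function name `searxng_search_url` is hypothetical, `q` is SearXNG's standard query parameter, and any additional parameters webmcp may send are not shown.

```python
from urllib.parse import urlencode


def searxng_search_url(base_url: str, query: str) -> str:
    """Build <SEARXNG_URL>/search?q=...&format=json (illustrative helper)."""
    params = urlencode({"q": query, "format": "json"})
    # Strip a trailing slash so the path does not become //search.
    return f"{base_url.rstrip('/')}/search?{params}"


print(searxng_search_url("http://192.168.0.55:8888", "mcp server"))
# -> http://192.168.0.55:8888/search?q=mcp+server&format=json
```

Note that SearXNG only answers JSON requests if the json format is enabled under search.formats in its settings.yml; otherwise the endpoint typically returns 403.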

Install

Install dependencies from the pinned requirements file:

pip install -r requirements.txt
python -m playwright install chromium

Run

python app.py

Server starts on:

http://0.0.0.0:8642

MCP Usage Notes

extract(..., use_browser=True) is best for dynamic pages that require JS rendering.
extract(..., use_browser=False) is faster for static pages.
If extraction quality is poor, the LLM should provide a more specific prompt and/or a stricter schema.
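To make the last note concrete, here is a sketch contrasting a vague extract call with a more specific one. It assumes the schema argument accepts a JSON Schema style dictionary, which this README does not confirm; the URL, field names, and prompt wording are invented for illustration.

```python
import json

# A vague request: the extraction LLM has to guess what "info" means.
vague_call = {
    "urls": ["https://example.com/product/123"],  # placeholder URL
    "prompt": "Get the product info",
}

# A more specific request: explicit fields, types, and missing-value behavior.
# ASSUMPTION: schema is a JSON Schema style dict; check the tool description.
specific_call = {
    "urls": ["https://example.com/product/123"],
    "prompt": (
        "Return the product name, the listed price as a number, and the "
        "ISO 4217 currency code. Use null for any field not on the page."
    ),
    "schema": {
        "type": "object",
        "properties": {
            "name": {"type": ["string", "null"]},
            "price": {"type": ["number", "null"]},
            "currency": {"type": ["string", "null"]},
        },
        "required": ["name", "price", "currency"],
        "additionalProperties": False,
    },
    "use_browser": False,  # static page, so the faster HTTP mode is enough
}

print(json.dumps(specific_call, indent=2))
```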

TODO

Revisit JS page rendering and extraction strategy. Right now, roughly 25-30% of pages return little or no usable content even when fetched successfully.
Improve anti-bot handling for page fetches. Many targets still return 4xx errors, so investigate stronger browser mimicry (Playwright/Chromium behavior, headers, fingerprinting, and potentially user-agent/profile rotation).

License

MIT. See LICENSE.
