MCP Web Research Agent

Enables automated web research and intelligence gathering through recursive web crawling, multi-engine search integration, and persistent SQLite storage with support for keyword filtering and multiple export formats.

A powerful MCP (Model Context Protocol) tool for automated web research, scraping, and intelligence gathering.

This web research automation tool exposes a scraper as an MCP-compatible agent for use in AI workflows. It suits competitive intelligence, market research, and automated data collection.

🚀 Features

🔍 Intelligent Scraping: Recursive web crawling with configurable depth
🔎 Search Integration: Multi-engine search with result processing
💾 Database Storage: Persistent SQLite storage with advanced querying
📊 Multiple Export Formats: JSON, Markdown, and CSV exports
🤖 MCP Integration: Seamless integration with AI assistants
⚡ Async Ready: Built for concurrent operations
🔧 Configurable: Adjustable settings for any use case

🛠️ Installation

Prerequisites

Python 3.8+
MCP-compatible client (Claude Desktop, etc.)

Quick Install

# Clone the repository
git clone https://github.com/yourusername/mcp-web-research-agent.git
cd mcp-web-research-agent

# Install dependencies
pip install -e .

MCP Client Configuration

Add to your MCP client configuration:

{
  "mcpServers": {
    "web-research-agent": {
      "command": "python",
      "args": ["/path/to/mcp-web-research-agent/server.py"]
    }
  }
}
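The path in `args` has to point at the `server.py` inside your clone. As a convenience, the short sketch below prints the same configuration with an absolute path filled in. The helper name `build_client_config` is hypothetical and not part of the project.

```python
import json
from pathlib import Path


def build_client_config(repo_dir: str) -> dict:
    """Return the mcpServers entry with an absolute path to server.py."""
    server_path = Path(repo_dir).expanduser().resolve() / "server.py"
    return {
        "mcpServers": {
            "web-research-agent": {
                "command": "python",
                "args": [str(server_path)],
            }
        }
    }


if __name__ == "__main__":
    # Run from inside the cloned repository, then paste the output
    # into your MCP client configuration.
    print(json.dumps(build_client_config("."), indent=2))
```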

📖 Usage

Available Tools

`scrape_url`

Scrape a single URL for specific keywords

result = await scrape_url(
    url="https://example.com",
    keywords=["python", "automation", "scraping"],
    extract_links=False,
    max_depth=1
)
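This README does not describe how keyword matches are detected. As a rough illustration only, the sketch below counts case-insensitive occurrences of each keyword in a page's text and keeps a short snippet around the first hit, mirroring the `matches` and `context` columns of the database schema. The function name `find_keyword_matches` and the `context_chars` parameter are assumptions, not the agent's actual API.

```python
import re


def find_keyword_matches(text, keywords, context_chars=40):
    """Map each keyword found in text to its match count and a context snippet."""
    results = {}
    for keyword in keywords:
        # re.escape treats the keyword as literal text, not a regex pattern.
        hits = list(re.finditer(re.escape(keyword), text, flags=re.IGNORECASE))
        if not hits:
            continue
        first = hits[0]
        start = max(0, first.start() - context_chars)
        end = min(len(text), first.end() + context_chars)
        results[keyword] = {
            "matches": len(hits),
            "context": text[start:end].strip(),
        }
    return results


page_text = "Python makes automation easy. Many teams use python for scraping."
found = find_keyword_matches(page_text, ["python", "automation", "rust"])
```

Keywords with no hits are left out of the result, so `found` here has entries for `python` and `automation` but not `rust`.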

`search_and_scrape`

Search the web and automatically scrape results

result = await search_and_scrape(
    query="web scraping best practices",
    keywords=["python", "beautifulsoup", "requests"],
    search_engine_url="https://searx.gophernuttz.us/search/",
    max_results=10
)
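The value of `search_engine_url` looks like a SearX-style endpoint. Assuming the endpoint accepts the query in a `q` parameter (an assumption about the search engine, not something this README confirms), the request URL could be assembled as follows. The helper name `build_search_url` is hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def build_search_url(search_engine_url, query):
    """Build a GET URL for a search endpoint that takes the query as `q`."""
    # Append with "&" if the base URL already carries a query string.
    separator = "&" if "?" in search_engine_url else "?"
    return f"{search_engine_url}{separator}{urlencode({'q': query})}"


url = build_search_url(
    "https://searx.gophernuttz.us/search/",
    "web scraping best practices",
)
print(url)
```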

`get_scraping_results`

Query the database for previous scraping results

result = await get_scraping_results(
    keyword_filter="python",
    limit=50
)

`export_results`

Export results to various formats

result = await export_results(
    format="markdown",
    keyword_filter="python",
    output_path="/path/to/output.md"
)
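How `export_results` serializes its rows is not documented here. As a minimal sketch, the three supported formats could be produced with the standard library alone. The function name `render_results` and the column set are assumptions chosen for illustration.

```python
import csv
import io
import json

FIELDS = ["url", "title", "keyword", "matches"]  # assumed columns


def render_results(rows, fmt):
    """Render a list of result dicts as JSON, Markdown, or CSV text."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    if fmt == "markdown":
        lines = ["| URL | Title | Keyword | Matches |", "| --- | --- | --- | --- |"]
        for row in rows:
            lines.append("| " + " | ".join(str(row[f]) for f in FIELDS) + " |")
        return "\n".join(lines)
    raise ValueError(f"unsupported format: {fmt}")


rows = [{"url": "https://example.com", "title": "Example", "keyword": "python", "matches": 3}]
markdown_report = render_results(rows, "markdown")
```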

`get_scraping_stats`

Get current statistics and status

result = await get_scraping_stats()

🗃️ Database Schema

The agent uses SQLite with the following structure:

-- URLs table
CREATE TABLE urls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    content TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Keywords table  
CREATE TABLE keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT UNIQUE NOT NULL
);

-- URL-Keyword relationships
CREATE TABLE url_keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url_id INTEGER,
    keyword_id INTEGER,
    matches INTEGER DEFAULT 1,
    context TEXT,
    FOREIGN KEY (url_id) REFERENCES urls (id),
    FOREIGN KEY (keyword_id) REFERENCES keywords (id),
    UNIQUE(url_id, keyword_id)
);
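To show how the three tables fit together, the sketch below creates the schema in an in-memory SQLite database, inserts one made-up row, and runs a join filtered by keyword. The function `results_for_keyword` is a hypothetical stand-in. It mirrors the `keyword_filter` and `limit` parameters of `get_scraping_results`, but the agent's real query may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE urls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    content TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT UNIQUE NOT NULL
);
CREATE TABLE url_keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url_id INTEGER,
    keyword_id INTEGER,
    matches INTEGER DEFAULT 1,
    context TEXT,
    FOREIGN KEY (url_id) REFERENCES urls (id),
    FOREIGN KEY (keyword_id) REFERENCES keywords (id),
    UNIQUE(url_id, keyword_id)
);
"""

conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
conn.executescript(SCHEMA)

# Sample data, made up for illustration.
conn.execute("INSERT INTO urls (url, title) VALUES (?, ?)", ("https://example.com", "Example"))
conn.execute("INSERT INTO keywords (keyword) VALUES (?)", ("python",))
conn.execute(
    "INSERT INTO url_keywords (url_id, keyword_id, matches, context) VALUES (1, 1, 3, ?)",
    ("...learn python fast...",),
)


def results_for_keyword(conn, keyword_filter, limit=50):
    """Return (url, title, keyword, matches, context) rows for matching keywords."""
    query = """
        SELECT u.url, u.title, k.keyword, uk.matches, uk.context
        FROM url_keywords AS uk
        JOIN urls AS u ON u.id = uk.url_id
        JOIN keywords AS k ON k.id = uk.keyword_id
        WHERE k.keyword LIKE ?
        ORDER BY uk.matches DESC
        LIMIT ?
    """
    return conn.execute(query, (f"%{keyword_filter}%", limit)).fetchall()


rows = results_for_keyword(conn, "python")
```

The `UNIQUE(url_id, keyword_id)` constraint means each page has at most one row per keyword, so `matches` carries the per-page count.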

🔧 Configuration

Default Settings

Max Depth: 3 levels of recursive crawling
Request Delay: 1 second between requests
User Agent: Modern Chrome browser simulation
Database: scraper_results.db (auto-created)
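The depth limit and request delay interact as in the sketch below: a breadth-first crawl that stops following links past `max_depth` and pauses between pages. This is not the agent's actual implementation. The names `crawl` and `LinkParser`, the caller-supplied `fetch` hook, and the choice to count the start page as depth 0 are all assumptions.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_depth=3, delay=1.0):
    """Breadth-first crawl; returns {url: depth} for every page visited.

    fetch is a caller-supplied function mapping a URL to HTML text.
    """
    visited = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue
        visited[url] = depth
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links against the current page.
            queue.append((urljoin(url, href), depth + 1))
        if queue:
            time.sleep(delay)  # pause before handling the next queued page
    return visited


# Tiny in-memory "site" so the sketch runs without network access.
PAGES = {
    "https://example.com/": '<a href="/a">A</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": '<a href="/c">C</a>',
    "https://example.com/c": "",
}
visited = crawl("https://example.com/", lambda u: PAGES.get(u, ""), max_depth=2, delay=0)
```

With `max_depth=2`, the page at `/c` sits three links from the start and is never fetched.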

Customization

Modify settings in the MCPWebScraper constructor:

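scraper = MCPWebScraper(
    db_manager=db_manager,  # Database manager instance created beforehand (see database.py)
    max_depth=5,            # Increase crawl depth (default: 3)
    delay=0.5               # Seconds between requests (default: 1); lower is faster
)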

🧪 Development

Running Tests

python test_mcp_scraper.py

Example Usage

python example_usage.py

Project Structure

mcp-web-research-agent/
├── server.py              # MCP server implementation
├── scraper.py             # Core scraping logic
├── database.py            # Database management
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Package configuration
├── test_mcp_scraper.py    # Unit tests
├── example_usage.py       # Usage examples
└── README.md              # This file

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built on the Model Context Protocol
Inspired by modern web scraping best practices
Thanks to the open-source community for amazing tools

Built with ❤️ for the MCP ecosystem
