
# Spider MCP - Web Search Crawler Service

A web search MCP service based on pure crawler technology, built with Node.js. It enables web searching and webpage scraping without requiring official APIs, supporting Bing web and news search, batch webpage scraping, and content extraction through Puppeteer automation.
## Features
- ❌ No Official API Required: Completely based on crawler technology, no dependency on third-party official APIs
- 🔍 Intelligent Search: Supports Bing web and news search
- 📰 News Search: Built-in news search with time filtering
- 🕷️ Pure Crawler: No official API dependency, uses Puppeteer for web scraping
- 🚀 High Performance: Supports batch web scraping
- 📊 Health Monitoring: Complete health check and metrics monitoring
- 📝 Structured Logging: Uses Winston for structured logs
- 🔒 Anti-Detection: Supports User-Agent rotation and other anti-bot measures (see the sketch after this list)
- 🔗 Smart URL Cleaning: Automatically cleans promotional parameters while preserving essential information
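
The anti-detection bullet above mentions User-Agent rotation. Below is a minimal sketch of that idea using Puppeteer's standard `page.setUserAgent` API; the User-Agent pool and function name are illustrative assumptions, not the project's actual code.

```js
import puppeteer from 'puppeteer';

// Illustrative pool of desktop User-Agent strings (assumed values, not the project's list)
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// Open a page with a randomly chosen User-Agent before navigating
async function openWithRandomUserAgent(url) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.setUserAgent(USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)]);
  await page.goto(url, { waitUntil: 'networkidle2' });
  return { browser, page };
}
```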
## Tech Stack
- Node.js (>= 18.0.0)
- Express.js - Web framework
- Puppeteer - Browser automation
- Cheerio - HTML parsing
- Axios - HTTP client
- Winston - Logging
- @modelcontextprotocol/sdk - MCP protocol support
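
To show how two of these pieces typically fit together, here is a hedged sketch of the common Axios + Cheerio pattern for static-page extraction. It is illustrative only and not taken from this project's source.

```js
import axios from 'axios';
import * as cheerio from 'cheerio';

// Fetch a page with Axios and extract structured data with Cheerio
async function extractTitleAndLinks(url) {
  const { data: html } = await axios.get(url, { timeout: 10_000 });
  const $ = cheerio.load(html);
  return {
    title: $('title').text().trim(),
    links: $('a[href]')
      .map((_, el) => $(el).attr('href'))
      .get()
      .slice(0, 10), // first 10 links only
  };
}
```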
## Quick Start

### 1. Install dependencies

```bash
npm install
# or use pnpm
pnpm install
```

### 2. Download the Puppeteer browser

```bash
npx puppeteer browsers install chrome
```
### 3. Environment configuration

Copy the example environment file and edit the resulting `.env` file according to your needs:

```bash
cp .env.example .env
```
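As a rough guide, such a file often looks like the sketch below. The variable names here are assumptions (apart from the default port implied later in this Quick Start); `.env.example` is the authoritative list.

```dotenv
# Assumed example only — see .env.example for the real variable names
PORT=3000            # HTTP port the service listens on
NODE_ENV=development # development | production
LOG_LEVEL=info       # Winston log level
```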
### 4. Start the service

Development mode:

```bash
npm run dev
```

Production mode:

```bash
npm start
```

The service will start at http://localhost:3000.
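Once running, you can confirm the HTTP service is reachable. The `/health` path below is an assumption based on the `routes/health.js` module mentioned under Project Structure, not a documented endpoint.

```js
// Quick liveness probe (Node >= 18 ships a global fetch; run as an ES module for top-level await)
const res = await fetch('http://localhost:3000/health'); // path is an assumption
console.log(res.status, await res.json());
```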
## MCP Tools
### `web_search`

Unified search tool supporting both web and news search:

- Web Search: `searchType: "web"`
- News Search: `searchType: "news"`, with time filtering

Note: `searchType` is a required parameter and must be explicitly specified.
Usage Examples:

```text
# Web search
Use the web_search tool to search "Node.js tutorial" with searchType set to web, return 10 results

# News search
Use the web_search tool to search "tech news" with searchType set to news, return 5 results from the past 24 hours
```
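The prompts above are aimed at chat clients; the same tool can also be called programmatically through the MCP SDK client. The sketch below assumes the stdio transport and a `query` argument name — only `searchType` is confirmed by this README.

```js
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the server over stdio and call web_search once
const transport = new StdioClientTransport({
  command: 'node',
  args: ['src/mcp/server.js'],
});
const client = new Client({ name: 'spider-mcp-example', version: '1.0.0' });

await client.connect(transport);
const result = await client.callTool({
  name: 'web_search',
  arguments: { query: 'Node.js tutorial', searchType: 'web' }, // "query" is an assumed parameter name
});
console.log(result.content);
await client.close();
```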
### Other Tools

- `get_webpage_content`: Get webpage content and convert it to a specified format
- `get_webpage_source`: Get the raw HTML source code of a webpage
- `batch_webpage_scrape`: Batch-scrape multiple webpages
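Parameter names for these tools are not listed in this README. The shapes below are hypothetical placeholders showing how such calls might look; the real schemas may differ.

```js
// Hypothetical argument shapes — check the tool schemas exposed by the server for the real names
const exampleArguments = {
  get_webpage_content: { url: 'https://example.com', format: 'markdown' },
  get_webpage_source: { url: 'https://example.com' },
  batch_webpage_scrape: { urls: ['https://example.com/a', 'https://example.com/b'] },
};
```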
## MCP Configuration
### Chatbox Configuration

Create an `mcp-config.json` file in Chatbox:
```json
{
  "mcpServers": {
    "spider-mcp": {
      "command": "node",
      "args": ["src/mcp/server.js"],
      "env": {
        "NODE_ENV": "production"
      },
      "description": "Spider MCP - Web search and webpage scraping tools",
      "capabilities": {
        "tools": {}
      }
    }
  }
}
```
### Other MCP Clients

```json
{
  "mcpServers": {
    "spider-mcp": {
      "command": "node",
      "args": ["path/to/spider-mcp/src/mcp/server.js"]
    }
  }
}
```
## Important Notes
- Anti-bot Measures: This service uses various techniques to avoid detection, but you must still comply with each site's robots.txt and terms of use
- Rate Limiting: Keep request frequency reasonable to avoid putting pressure on target websites
- Legal Compliance: Ensure compliance with local laws and website terms of use when using this service
- Resource Consumption: Puppeteer launches a Chrome browser, so keep an eye on memory and CPU usage
- URL Cleaning: Promotional parameters are removed automatically, which may affect some special link functionality (see the sketch below)
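
For illustration, the sketch below shows the kind of promotional-parameter stripping described above, using Node's standard `URL` API. The parameter list is an assumed example; the parameters the service actually removes are not documented here.

```js
// Assumed examples of tracking parameters — not the project's actual list
const TRACKING_PARAMS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content', 'gclid', 'fbclid'];

function cleanUrl(rawUrl) {
  const url = new URL(rawUrl);
  for (const param of TRACKING_PARAMS) {
    url.searchParams.delete(param);
  }
  return url.toString();
}

// cleanUrl('https://example.com/page?id=42&utm_source=newsletter')
// -> 'https://example.com/page?id=42'
```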
## Development
### Project Structure
```text
spider-mcp/
├── src/
│   ├── index.js               # Main entry file
│   ├── mcp/
│   │   └── server.js          # MCP server
│   ├── routes/                # Route definitions
│   │   ├── search.js          # Search routes
│   │   └── health.js          # Health check routes
│   ├── services/              # Business logic
│   │   └── searchService.js   # Search service
│   └── utils/                 # Utility functions
│       └── logger.js          # Logging utility
├── logs/                      # Log files directory
├── tests/                     # Test files
├── package.json               # Project configuration
├── .env.example               # Environment variables example
├── mcp-config.json            # MCP configuration example
└── README.md                  # Project documentation
```
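For orientation, the sketch below shows one way `src/mcp/server.js` could register the `web_search` tool using the SDK's high-level `McpServer` API. The actual file may use a different API or schema; the zod shape beyond the documented `searchType` requirement is an assumption, and the handler body is a stub.

```js
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'spider-mcp', version: '1.0.0' });

// Register web_search; parameter shape beyond searchType is assumed
server.tool(
  'web_search',
  { query: z.string(), searchType: z.enum(['web', 'news']) },
  async ({ query, searchType }) => {
    // A real implementation would delegate to the crawler in services/searchService.js
    const text = `Stub: would run a ${searchType} search for "${query}"`;
    return { content: [{ type: 'text', text }] };
  }
);

await server.connect(new StdioServerTransport());
```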
## License
MIT License
## Contributing
Issues and Pull Requests are welcome!