Scraper Maintenance MCP

A Model Context Protocol (MCP) server that automates web scraper maintenance: it inspects pages in a browser, generates and scores selectors, and updates scraper code and configuration.

📁 Project Structure

mcp/
├── src/                    # TypeScript source files
│   ├── server.ts          # Main MCP server implementation
│   ├── browser-manager.ts # Browser automation and management
│   ├── selector-generator.ts # Selector generation and scoring
│   └── types.ts           # Type definitions
├── dist/                   # Compiled JavaScript files
│   ├── server.js          # Main MCP server (executable)
│   ├── browser-manager.js # Browser automation
│   ├── selector-generator.js # Selector intelligence
│   └── types.js           # Type definitions
├── config/                 # Configuration files
│   ├── test-config.json   # Test configuration
│   ├── claude-desktop-config.json # Claude Desktop setup
│   └── *.json             # Various scraper configurations
├── examples/               # Usage examples and documentation
├── docs/                   # Documentation files
├── scripts/                # Build and utility scripts
├── package.json           # Project configuration
└── tsconfig.json          # TypeScript configuration

🚀 Quick Start

1. Install Dependencies

cd mcp
npm install

2. Build the Project

npm run build

3. Run the Server

npm start

🛠️ Available MCP Tools

Configuration Management

  • load_scraper_config - Load scraper configuration files
  • update_config - Update configurations with new selector mappings

Browser Operations

  • initialize_browser - Launch browser (headless/visible mode)
  • navigate_to_page - Navigate to target URLs
  • take_screenshot - Capture debugging screenshots
  • close_browser - Clean up browser resources

Element Inspection

  • inspect_field_manually - Interactive visual element selection
  • auto_detect_field - AI-powered automatic element detection
  • validate_selectors - Test selector reliability and performance
  • generate_selectors - Create multiple selector variations with scoring
  • test_extraction - Test data extraction using current selectors
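To make "multiple selector variations with scoring" concrete, here is a minimal sketch of how candidates might be generated and ranked. It is not taken from selector-generator.ts: the `ElementInfo` and `SelectorCandidate` types, the `looksGenerated` heuristic, and the score weights are all illustrative assumptions.

```typescript
// Hypothetical sketch: types, heuristic, and weights are assumptions,
// not the project's actual selector-generator.ts implementation.
interface ElementInfo {
  tag: string;
  id?: string;
  classes: string[];
  attributes: Record<string, string>;
}

interface SelectorCandidate {
  selector: string;
  score: number; // 0..1, higher means more likely to survive markup changes
  reason: string;
}

// Flags tokens that look machine-generated, e.g. "css-1q2w3e" or "price-84921".
function looksGenerated(token: string): boolean {
  return /\d{3,}/.test(token) || /^(css|sc|jsx)-/.test(token);
}

function generateSelectors(el: ElementInfo): SelectorCandidate[] {
  const candidates: SelectorCandidate[] = [];

  // A dedicated test attribute is usually the most stable hook.
  const testId = el.attributes["data-testid"];
  if (testId) {
    candidates.push({
      selector: `[data-testid="${testId}"]`,
      score: 0.95,
      reason: "dedicated test attribute",
    });
  }

  // Ids are good only when they do not look auto-generated.
  if (el.id && !looksGenerated(el.id)) {
    candidates.push({ selector: `#${el.id}`, score: 0.9, reason: "stable-looking id" });
  }

  // Keep human-readable classes, drop hashed or numbered ones.
  const stableClasses = el.classes.filter((c) => !looksGenerated(c));
  if (stableClasses.length > 0) {
    candidates.push({
      selector: `${el.tag}.${stableClasses.join(".")}`,
      score: 0.6,
      reason: "tag plus human-readable classes",
    });
  }

  // Last-resort fallback.
  candidates.push({ selector: el.tag, score: 0.1, reason: "bare tag, rarely unique" });

  return candidates.sort((a, b) => b.score - a.score);
}

const ranked = generateSelectors({
  tag: "span",
  id: "price-84921",
  classes: ["product-price", "css-1q2w3e"],
  attributes: { "data-testid": "price" },
});
console.log(ranked.map((c) => `${c.score.toFixed(2)}  ${c.selector}`).join("\n"));
```

Returning every candidate with a score and a reason, rather than a single winner, lets a later validation step (such as validate_selectors) fall back to the next option when the top selector stops matching.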

Maintenance & Code Generation

  • run_maintenance_check - Comprehensive scraper health analysis
  • generate_extractor_code - Multi-language code generation
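MCP clients invoke tools with JSON-RPC 2.0 requests whose method is `tools/call`. The sketch below builds the requests for one plausible maintenance pass using the tool names listed above. The argument names (`configPath`, `headless`, `url`) and the example URL are assumptions for illustration; check each tool's input schema for the real parameters. In practice a client such as Claude Desktop sends these messages for you after the initialization handshake.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Builds one request; each call gets a fresh, increasing request id.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical workflow: argument names and the URL are assumptions.
const workflow: ToolCallRequest[] = [
  toolCall("load_scraper_config", { configPath: "config/test-config.json" }),
  toolCall("initialize_browser", { headless: true }),
  toolCall("navigate_to_page", { url: "https://example.com/products" }),
  toolCall("run_maintenance_check"),
  toolCall("close_browser"),
];

for (const request of workflow) {
  console.log(JSON.stringify(request));
}
```

Ending the sequence with close_browser mirrors the cleanup role that tool has in the list above, so browser resources are released even for a short check.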

📖 Usage

For Claude Desktop

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "scraper-maintenance": {
      "command": "node",
      "args": ["/path/to/mcp/dist/server.js"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}

For Cursor

Add to your Cursor MCP configuration:

{
  "mcpServers": {
    "scraper-maintenance": {
      "command": "node",
      "args": ["/path/to/mcp/dist/server.js"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}
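Because the Claude Desktop and Cursor entries are identical, the configuration can be generated once and pasted into either client. The helper below is a small sketch, not part of this project: `buildMcpConfig` is a hypothetical name, and the absolute-path check reflects the assumption that the client launches the server from its own working directory, where a relative path would likely not resolve.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: builds the config entry shown above for a given server path.
function buildMcpConfig(serverPath: string): McpConfig {
  // Accept POSIX ("/...") or Windows drive ("C:\...") absolute paths only.
  const isAbsolute = serverPath.startsWith("/") || /^[A-Za-z]:[\\/]/.test(serverPath);
  if (!isAbsolute) {
    throw new Error(`Expected an absolute path, got: ${serverPath}`);
  }
  return {
    mcpServers: {
      "scraper-maintenance": {
        command: "node",
        args: [serverPath],
        env: { NODE_ENV: "production" },
      },
    },
  };
}

// "/path/to/mcp" is the placeholder from the examples above; substitute your checkout location.
const config = buildMcpConfig("/path/to/mcp/dist/server.js");
console.log(JSON.stringify(config, null, 2));
```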

🔧 Development

Build

npm run build

Development Mode

npm run dev

Test

npm test

📚 Documentation

Documentation files live in docs/, and usage examples in examples/.
