TaobaoScraper MCP Server

An MCP (Model Context Protocol) server for scraping product data from Taobao/Tmall and JD.com (Jingdong). It provides 8 tools that can be used directly in Claude Desktop or Claude Code.

Platform: Windows (requires Chrome browser)
Language: Python 3.10+

Features

  • Multi-platform scraping - Taobao, Tmall, JD.com in one tool
  • 8 MCP tools - Scraping, task management, notifications, system control
  • Hot search words - Discover trending keywords and market insights
  • Excel export - Auto-saves results as .xlsx files
  • Notification push - WeChat (ServerChan / PushPlus) and Email alerts
  • Dual transport - stdio (local) and SSE (remote) protocols
  • Trilingual GUI - Chinese / English / Korean interface (optional)

Architecture

Claude Desktop / Claude Code
        |
   MCP Server (stdio or SSE)
        |
   FastAPI Backend (:8000)
        |
   Selenium + Chrome (:9222)

Quick Start

1. Prerequisites

  • Python 3.10+
  • Google Chrome browser
  • Windows OS

2. Install

git clone https://github.com/jhongjun1981/taobao-scraper-mcp.git
cd taobao-scraper-mcp

# Install MCP server dependencies
pip install -r requirements_mcp.txt

# Install API backend dependencies
pip install -r requirements_api.txt

3. Configure

cp .env.example .env
# (in cmd.exe, use: copy .env.example .env)
# Edit .env to set your SCRAPER_API_KEY

4. Start Chrome with the remote debugging port

# Close all Chrome instances first, then:
START_GUI.bat
# Or manually:
chrome.exe --remote-debugging-port=9222 --user-data-dir="%LOCALAPPDATA%\Google\Chrome\Debug Profile"

5. Log in to Taobao (first time only)

In the debug Chrome window, log in to your Taobao account. Cookies persist in the debug profile, so this is needed only once.

6. Start API Backend

python run_api.py
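run_api.py may take a few seconds to bind port 8000. A small stdlib helper can poll the port before you start the MCP server. This is a generic TCP check written for illustration, not a health endpoint of the backend, and wait_for_port is a hypothetical name.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("127.0.0.1", 8000) returns True once the backend accepts connections, or False after 30 seconds. The same call with port 9222 checks the Chrome layer.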

7. Configure MCP Server

For Claude Code - Add to .claude/settings.json:

{
  "mcpServers": {
    "taobao-scraper": {
      "command": "python",
      "args": ["run_mcp.py"],
      "cwd": "/path/to/taobao-scraper-mcp"
    }
  }
}

For Claude Desktop - Add to claude_desktop_config.json:

{
  "mcpServers": {
    "taobao-scraper": {
      "command": "python",
      "args": ["run_mcp.py"],
      "cwd": "C:\\path\\to\\taobao-scraper-mcp"
    }
  }
}
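If claude_desktop_config.json already lists other servers, editing it by hand risks clobbering them. The sketch below merges the taobao-scraper entry into an existing config and leaves other entries untouched. The helper names are hypothetical, and you supply both the config file path and the repository path yourself.

```python
import copy
import json
from pathlib import Path


def add_taobao_server(config: dict, repo_dir: str) -> dict:
    """Return a copy of an MCP client config with the taobao-scraper entry added."""
    updated = copy.deepcopy(config)  # never mutate the caller's dict
    servers = updated.setdefault("mcpServers", {})
    servers["taobao-scraper"] = {
        "command": "python",
        "args": ["run_mcp.py"],
        "cwd": repo_dir,
    }
    return updated


def update_config_file(path: Path, repo_dir: str) -> None:
    """Merge the entry into a JSON config file, creating the file if it is absent."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_taobao_server(config, repo_dir)
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
```

json.dumps escapes backslashes in Windows paths automatically. That is why the hand-written JSON above has to spell them as C:\\path\\...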

SSE mode (remote):

python run_mcp.py --transport sse --port 8001

Then point the MCP client at the server's SSE endpoint:
{
  "mcpServers": {
    "taobao-scraper": {
      "type": "sse",
      "url": "http://your-server:8001/sse"
    }
  }
}

Tools Reference

Scraping Tools

scrape_products

Scrape product data from e-commerce platforms.

Parameter        Type    Default   Description
keyword          string  required  Search keyword
platform         string  "taobao"  "taobao" / "jd" / "multi"
pages            int     3         Number of pages to scrape
sort_by          string  "sale"    Sort method
tmall_only       bool    false     Tmall products only
exact_match      bool    false     Exact keyword match
price_min        float   0         Min price filter
price_max        float   0         Max price filter
wait_for_result  bool    true      Wait for completion
timeout          int     300       Timeout in seconds

Example prompt: "Scrape running shoes from both Taobao and JD"
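When scripting calls to scrape_products, for example from a test harness rather than from Claude, it is useful to catch bad arguments before a task is queued. This sketch checks only the values the table pins down. build_scrape_args is a hypothetical helper, not part of the server. Two behaviors are assumptions:

  • 0 disables a price filter
  • extra keyword arguments such as sort_by are passed through unchecked, because the table does not list their allowed values

```python
VALID_PLATFORMS = {"taobao", "jd", "multi"}


def build_scrape_args(keyword: str, platform: str = "taobao", pages: int = 3,
                      price_min: float = 0, price_max: float = 0, **extra) -> dict:
    """Assemble scrape_products arguments, rejecting obviously invalid values."""
    if not keyword.strip():
        raise ValueError("keyword is required")
    if platform not in VALID_PLATFORMS:
        raise ValueError(f"platform must be one of {sorted(VALID_PLATFORMS)}")
    if pages < 1:
        raise ValueError("pages must be at least 1")
    # Assumption: a value of 0 means "no limit" for either price filter.
    if price_max and price_min > price_max:
        raise ValueError("price_min must not exceed price_max")
    # Other options (sort_by, tmall_only, ...) are forwarded without validation.
    return {"keyword": keyword, "platform": platform, "pages": pages,
            "price_min": price_min, "price_max": price_max, **extra}
```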

scrape_hotwords

Scrape trending/related search keywords for market analysis.

Parameter        Type    Default   Description
keyword          string  required  Base keyword
wait_for_result  bool    true      Wait for completion
timeout          int     120       Timeout in seconds

Task Management Tools

list_tasks

List recent scraping tasks with status (pending/running/completed/failed).

Parameter  Type  Default  Description
limit      int   20       Max tasks to return

get_task

Get detailed status of a specific task. Set include_result=true to get full results.

Parameter       Type    Default   Description
task_id         string  required  Task ID
include_result  bool    false     Include result data

cancel_task

Cancel a running or pending task.

Parameter  Type    Default   Description
task_id    string  required  Task ID to cancel
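When wait_for_result is false, the scrape tools do not wait for completion, so a client has to poll get_task itself. The sketch below takes the tool call as a plain function so it can be wired to any MCP client. It assumes that the returned task is a dict with a status field holding one of the states listed under list_tasks. If the server also reports a distinct state for cancelled tasks, add it to TERMINAL_STATES. wait_for_task is a hypothetical helper.

```python
import time
from typing import Callable

# States listed under list_tasks that mean the task will not change further.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(get_task: Callable[[str], dict], task_id: str,
                  timeout: float = 300.0, interval: float = 5.0) -> dict:
    """Poll a task until it reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        task = get_task(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')!r}")
        time.sleep(interval)
```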

Export Tools

list_files

List all exported Excel data files with filename, size, and modification time.
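list_files takes no parameters. A local script that needs the same listing can use a short stdlib equivalent. The export directory is an assumption, because this README does not say where the .xlsx files are written, and list_excel_files is a hypothetical name.

```python
from datetime import datetime
from pathlib import Path


def list_excel_files(export_dir: str) -> list[dict]:
    """Return name, size in bytes, and modification time for each .xlsx file."""
    files = []
    # A missing directory simply yields no matches.
    for path in sorted(Path(export_dir).glob("*.xlsx")):
        stat = path.stat()
        files.append({
            "filename": path.name,
            "size": stat.st_size,
            "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(timespec="seconds"),
        })
    return files
```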

Notification Tools

send_notification

Send push notifications via WeChat or Email.

Parameter  Type    Default        Description
channel    string  required       "wechat" / "email" / "test"
title      string  ""             Notification title
content    string  ""             Notification body
wx_type    string  "server_chan"  "server_chan" or "pushplus"
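This README does not document how the email channel delivers messages, and no SMTP variables appear in the Environment Variables table. As a rough illustration of how a title/content pair maps onto an email, the standard library's email and smtplib modules are enough. The sender, recipient, SMTP host, port, and credentials are all placeholders, and this is not the project's own implementation.

```python
import smtplib
from email.message import EmailMessage


def build_email(title: str, content: str, sender: str, recipient: str) -> EmailMessage:
    """Create a plain-text email mirroring send_notification's title/content fields."""
    msg = EmailMessage()
    msg["Subject"] = title
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(content)
    return msg


def send_email(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send over SMTP with implicit TLS (commonly port 465)."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        smtp.login(user, password)
        smtp.send_message(msg)
```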

System Tools

system_status

Check system health or restart Chrome.

Parameter  Type    Default  Description
action     string  "check"  "check" or "restart_chrome"

Environment Variables

Variable             Default                   Description
SCRAPER_API_KEY      changeme-your-secret-key  API authentication key
SCRAPER_API_URL      http://localhost:8000     FastAPI backend URL
SCRAPER_CHROME_PORT  9222                      Chrome debug port
MCP_SSE_HOST         0.0.0.0                   SSE listen address
MCP_SSE_PORT         8001                      SSE listen port
MCP_HTTP_TIMEOUT     30                        HTTP request timeout

Data Output

Scraped data is automatically saved as Excel files containing:

  • Product title, image URL, price
  • Monthly sales volume, review count
  • Shop name, shop type, shop region
  • Product URL, scrape timestamp

Important Notes

  • Taobao requires login - You must log in to your own Taobao account in the debug Chrome window first
  • JD works without login - JD scraping does not require authentication
  • One task at a time - Chrome can only handle one scraping task concurrently
  • Windows only - The Chrome automation layer uses Windows-specific APIs
  • Respect rate limits - Excessive scraping may trigger anti-bot protection
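This README does not state concrete rate limits. One simple way to keep a client script polite is to enforce a minimum, slightly randomized gap between successive scrape_products calls. The gap values are yours to choose. The clock and sleep functions are injectable so the class can be tested without waiting. Throttle is a hypothetical helper, not part of the server.

```python
import random
import time
from typing import Callable


class Throttle:
    """Enforce a randomized minimum gap between successive calls."""

    def __init__(self, min_gap: float, jitter: float = 0.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_gap = min_gap
        self.jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._last: float | None = None

    def wait(self) -> float:
        """Block until the gap has passed since the last call; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            # Each gap is min_gap plus a random extra in [0, jitter].
            gap = self.min_gap + random.uniform(0, self.jitter)
            remaining = gap - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

For example, create throttle = Throttle(min_gap=60, jitter=30) and call throttle.wait() before each scrape request. These numbers are arbitrary, not values recommended by the project.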

License

MIT License - see the LICENSE file.
