MediaCrawler MCP Server
MCP server for crawling social media platforms (e.g., Bilibili) by keywords, video IDs, or creator IDs, with support for MySQL, JSON, and CSV storage.
🔥 MediaCrawler_MCP_Server - MCP for MediaCrawler 🕷️
🔗 MediaCrawler repository
https://github.com/NanmiCoder/MediaCrawler
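🔧 Improvements over MediaCrawler
- Python version: all package versions are compatible with Python 3.13, so that MCP can be used.
- MySQL storage: if the tables already exist, initialization does not overwrite the existing data.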
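A minimal sketch of this kind of non-destructive initialization, assuming it follows the common CREATE TABLE IF NOT EXISTS pattern (the server's actual mechanism is not shown in this README). It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table bilibili_video with its two columns is hypothetical.

```python
import sqlite3


def init_tables(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table only if it does not exist yet.

    IF NOT EXISTS makes this safe to run on every startup: an existing
    table and its rows are left untouched instead of being dropped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bilibili_video ("
        "video_id TEXT PRIMARY KEY, title TEXT)"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_tables(conn)
    conn.execute("INSERT INTO bilibili_video VALUES (?, ?)", ("BV1d54y1g7db", "demo"))
    conn.commit()
    init_tables(conn)  # second initialization: the stored row survives
    print(conn.execute("SELECT COUNT(*) FROM bilibili_video").fetchone()[0])
```

Running initialization twice leaves the row in place, which is the behavior described for the MySQL tables.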
✨ For more settings (config), see the MediaCrawler repository.
🧰 Available MCP tools
- crawl_search(platform: str, store_type: str, keywords: str) - Start the crawler for the media platform by keywords.
- crawl_detail(platform: str, store_type: str, video_id: list) - Start the crawler for the media platform by video IDs.
- crawl_creator(platform: str, store_type: str, creator_id: list) - Start the crawler for the media platform by creator IDs.
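MCP clients invoke these tools by sending JSON-RPC 2.0 tools/call requests. The helper below, build_tool_call, is hypothetical (it is not part of this project) and only illustrates the argument shapes from the signatures above: keywords is a single string, while video_id and creator_id are lists. The platform value "bili" for Bilibili is an assumption borrowed from upstream MediaCrawler.

```python
import json
from typing import Any, Dict

# Parameter names and types as listed in the tool signatures above.
TOOL_PARAMS: Dict[str, Dict[str, type]] = {
    "crawl_search": {"platform": str, "store_type": str, "keywords": str},
    "crawl_detail": {"platform": str, "store_type": str, "video_id": list},
    "crawl_creator": {"platform": str, "store_type": str, "creator_id": list},
}


def build_tool_call(request_id: int, tool: str, **arguments: Any) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for one of the tools."""
    expected = TOOL_PARAMS[tool]  # raises KeyError for an unknown tool name
    if set(arguments) != set(expected):
        raise ValueError(f"{tool} expects {sorted(expected)}, got {sorted(arguments)}")
    for name, typ in expected.items():
        if not isinstance(arguments[name], typ):
            raise TypeError(f"{name} must be of type {typ.__name__}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }, ensure_ascii=False)


if __name__ == "__main__":
    # "bili" is an assumed platform value for Bilibili.
    print(build_tool_call(1, "crawl_detail", platform="bili", store_type="json",
                          video_id=["BV1d54y1g7db", "BV1Sz4y1U77N"]))
```

In normal use the client (Claude Desktop, Cline) builds these requests itself; the type checks simply make the str-versus-list distinction explicit.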
📦 Python package installation
```bash
# Enter the project directory
cd MediaCrawler_MCP_Server
# Use uv sync to keep the Python version and dependencies consistent
uv sync
```
🌐 Browser driver installation
```bash
# Install the browser drivers
uv run playwright install
```
⚙️ Configuration
Set the following environment variables:
```
MYSQL_DB_HOST=localhost # Database host
MYSQL_DB_PORT=3306 # Optional: database port (defaults to 3306 if not specified)
MYSQL_DB_USER=your_username # Database user
MYSQL_DB_PWD=your_password # Database password
MYSQL_DB_NAME=your_database # Database name
CRAWLER_MAX_NOTES_COUNT=20 # Number of notes to crawl
MAX_CONCURRENCY_NUM=1 # Number of concurrent crawlers
ENABLE_GET_COMMENTS=true # Whether to crawl comments
```
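Environment variables always arrive as strings, so numbers and flags have to be parsed, which is also why the env block in the client configuration quotes "3306" and "true". The sketch below shows one way to read them; load_settings and CrawlerSettings are hypothetical names, not this project's code. Only the 3306 port default comes from this README, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Strings treated as "true" for ENABLE_GET_COMMENTS (an assumption).
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CrawlerSettings:
    db_host: str
    db_port: int
    db_user: str
    db_password: str
    db_name: str
    max_notes_count: int
    max_concurrency: int
    enable_comments: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> CrawlerSettings:
    """Parse the variables listed above; a missing required key raises KeyError."""
    env = os.environ if env is None else env
    return CrawlerSettings(
        db_host=env["MYSQL_DB_HOST"],
        db_port=int(env.get("MYSQL_DB_PORT", "3306")),  # documented default
        db_user=env["MYSQL_DB_USER"],
        db_password=env["MYSQL_DB_PWD"],
        db_name=env["MYSQL_DB_NAME"],
        max_notes_count=int(env["CRAWLER_MAX_NOTES_COUNT"]),
        max_concurrency=int(env["MAX_CONCURRENCY_NUM"]),
        enable_comments=env["ENABLE_GET_COMMENTS"].strip().lower() in TRUTHY,
    )
```

Passing an explicit mapping instead of os.environ makes the parsing easy to check without touching the real environment.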
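🚀 Usage
Add the following entry to the mcpServers object in claude_desktop_config.json or cline_mcp_settings.json, replacing path/to/MediaCrawler_MCP_Server with the actual project path: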
```json
"mediacrawler": {
  "disabled": false,
  "timeout": 600,
  "type": "stdio",
  "command": "uv",
  "args": [
    "--directory",
    "path/to/MediaCrawler_MCP_Server",
    "run",
    "main.py"
  ],
  "env": {
    "MYSQL_DB_HOST": "localhost",
    "MYSQL_DB_PORT": "3306",
    "MYSQL_DB_USER": "your_username",
    "MYSQL_DB_NAME": "your_database",
    "MYSQL_DB_PWD": "your_password",
    "CRAWLER_MAX_NOTES_COUNT": "20",
    "MAX_CONCURRENCY_NUM": "1",
    "ENABLE_GET_COMMENTS": "true"
  }
}
```
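The entry can also be merged into the client file programmatically instead of by hand. This is a minimal sketch, assuming the file keeps its servers under a top-level mcpServers object; add_mediacrawler_server is a hypothetical helper, and any servers already present in the file are preserved.

```python
import json
from pathlib import Path
from typing import Any, Dict, Mapping


def add_mediacrawler_server(config_path: Path, project_dir: str,
                            env: Mapping[str, object]) -> Dict[str, Any]:
    """Insert or replace the "mediacrawler" entry under "mcpServers"."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})  # keep other servers intact
    servers["mediacrawler"] = {
        "disabled": False,
        "timeout": 600,
        "type": "stdio",
        "command": "uv",
        "args": ["--directory", project_dir, "run", "main.py"],
        # Environment values must be strings, e.g. "3306" and "true".
        "env": {key: str(value) for key, value in env.items()},
    }
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False),
                           encoding="utf-8")
    return config
```

Converting every env value with str() guards against writing a bare number or boolean, which would not be a valid environment value.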
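🌰 Examples
- Crawl Bilibili video data for the keyword "钱" (money) and store it in MySQL.
- Crawl the Bilibili videos BV1d54y1g7db and BV1Sz4y1U77N and store them as JSON.
- Crawl the videos of the Bilibili uploader with ID 20813884 and store them as CSV.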
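For reference, the three prompts above would roughly correspond to the tool calls below. This mapping is an illustration, not captured output: the platform value "bili", the store_type value "db" for MySQL, and passing the creator ID as a string are assumptions based on upstream MediaCrawler's options.

```python
import json

# Illustrative tool calls for the three example prompts (assumed values:
# "bili" = Bilibili, "db" = MySQL storage, creator ID passed as a string).
EXAMPLE_CALLS = [
    ("crawl_search", {"platform": "bili", "store_type": "db", "keywords": "钱"}),
    ("crawl_detail", {"platform": "bili", "store_type": "json",
                      "video_id": ["BV1d54y1g7db", "BV1Sz4y1U77N"]}),
    ("crawl_creator", {"platform": "bili", "store_type": "csv",
                       "creator_id": ["20813884"]}),
]

if __name__ == "__main__":
    for tool, arguments in EXAMPLE_CALLS:
        # ensure_ascii=False keeps the Chinese keyword readable.
        print(tool, json.dumps(arguments, ensure_ascii=False))
```

Note that two video IDs in one prompt become a single crawl_detail call with a two-element list, matching the list type in the tool signature.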