# deepghs-mcp
<a href="https://glama.ai/mcp/servers/citronlegacy/deepghs-mcp"> <img width="380" height="200" src="https://glama.ai/mcp/servers/citronlegacy/deepghs-mcp/badge" alt="DeepGHS MCP server rating" /> </a>
A Python MCP server for the DeepGHS anime AI ecosystem. Connect it to any MCP-compatible client (Claude Desktop, Cursor, etc.) to browse datasets, discover pre-built character training sets, look up tags across 18 platforms, and generate complete data pipeline scripts, all directly from your AI assistant.
## ✨ Features
### 📦 Dataset & Model Discovery

- **Browse all DeepGHS datasets**: Danbooru2024 (8M+ images), Sankaku, Gelbooru, Zerochan, BangumiBase, and more
- **Full file trees**: see exactly which tar/parquet files a dataset contains and how large they are before downloading anything
- **Model catalog**: find the right model for your task: CCIP, WD Tagger Enhanced, aesthetic scorer, face/head detector, anime classifier, and more
- **Live demos**: browse DeepGHS Spaces (interactive web apps) for testing models without code
### 🏷️ Cross-Platform Tag Intelligence

- **site_tags lookup**: 2.5M+ tags unified across 18 platforms in one query
- **Tag format translation**: Danbooru uses `hatsune_miku`, Zerochan uses `Hatsune Miku`, Pixiv uses `初音ミク`; this tool maps them all together
- **Ready-to-use Parquet queries**: get copy-paste code to filter the tag database programmatically
### 🎯 Character Dataset Finder

- **Pre-built LoRA datasets**: search both `deepghs` and `CyberHarem` namespaces for existing character image collections
- **Ready-to-run download commands**: get the exact `cheesechaser` command to pull what you need
- **Smart fallback**: if no pre-built dataset exists, the tool hands off directly to the waifuc script generator
### 🤖 Training Pipeline Code Generation

- **waifuc scripts**: generate complete, annotated Python data collection pipelines for any character from any source (Danbooru, Pixiv, Gelbooru, Zerochan, Sankaku, or Auto)
- **cheesechaser scripts**: generate targeted download scripts to pull specific post IDs from indexed multi-TB datasets without downloading the whole archive
- **Format-aware**: crop sizes, bucket ranges, and export formats are automatically adjusted for SD 1.5, SDXL, or Flux
## 📦 Installation

### Prerequisites

- Python 3.10+
- git

### Quick Start

Clone the repository:

```bash
git clone https://github.com/citronlegacy/deepghs-mcp.git
cd deepghs-mcp
```

Run the installer:

```bash
chmod +x install.sh && ./install.sh
# or without chmod:
bash install.sh
```

Or install manually:

```bash
pip install -r requirements.txt
```
## 🔐 Authentication

`HF_TOKEN` is optional for public datasets but strongly recommended: it raises HuggingFace's API rate limit and is required for any gated or private repositories.

Get your token at huggingface.co/settings/tokens (read access is sufficient).

Without it, the server still works for all public DeepGHS datasets.
## ▶️ Running the Server

```bash
python deepghs_mcp.py
# or via the venv created by install.sh:
.venv/bin/python deepghs_mcp.py
```
## ⚙️ Configuration

### Claude Desktop

Add the following to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "deepghs": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["/absolute/path/to/deepghs_mcp.py"],
      "env": {
        "HF_TOKEN": "hf_your_token_here"
      }
    }
  }
}
```
### Other MCP Clients

- Command: `/absolute/path/to/.venv/bin/python`
- Args: `/absolute/path/to/deepghs_mcp.py`
- Transport: stdio
## 💡 Usage Examples
### Browse available datasets

"What anime datasets does DeepGHS have on HuggingFace?"

The assistant calls `deepghs_list_datasets` and returns all datasets sorted by download count (Danbooru2024, Sankaku, Gelbooru WebP, BangumiBase, site_tags, and more) with links and update dates.
### Check dataset contents before downloading

"What files are in deepghs/danbooru2024? How big is it?"

The assistant calls `deepghs_get_repo_info` and returns the full file tree (every .tar and .parquet file with individual and total sizes) so you know exactly what you're committing to before you download.
### Find a pre-built character dataset

"Is there already a dataset for Rem from Re:Zero I can use for LoRA training?"

The assistant calls `deepghs_find_character_dataset`, searches both deepghs and CyberHarem namespaces, and returns any matches with download counts and a one-liner download command.

Example response:

```
## Character Dataset Search: Rem

Found 2 dataset(s):

### CyberHarem/rem_rezero
Downloads: 4.2K | Likes: 31 | Updated: 2024-08-12
```

Download with cheesechaser:

```python
from cheesechaser.datapool import SimpleDataPool

pool = SimpleDataPool('CyberHarem/rem_rezero')
pool.batch_download_to_directory('./dataset', max_workers=4)
```
### Look up a tag across all platforms

"What is the correct tag for 'Hatsune Miku' on Danbooru, Zerochan, and Pixiv?"

The assistant calls `deepghs_search_tags` and returns the format for every platform, plus a pre-filtered link to the site_tags dataset viewer and Parquet query code.

Example response:

```
Tag Lookup: Hatsune Miku

Danbooru:  hatsune_miku
Gelbooru:  hatsune_miku
Sankaku:   character:hatsune_miku
Zerochan:  Hatsune Miku
Pixiv:     初音ミク (Japanese preferred)
Yande.re:  hatsune_miku
Wallhaven: hatsune miku
```

This directly solves the MultiBoru tag normalization problem.
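The per-platform formats above can be pictured as a simple lookup table. A toy sketch with values taken from the example response; the dict and function are illustrative, not part of this server's API:

```python
# Toy lookup of the per-platform tag formats shown above.
# Values come from the example response; everything else is invented.
TAG_FORMATS = {
    "danbooru": "hatsune_miku",
    "gelbooru": "hatsune_miku",
    "sankaku": "character:hatsune_miku",
    "zerochan": "Hatsune Miku",
    "pixiv": "初音ミク",
    "yande.re": "hatsune_miku",
    "wallhaven": "hatsune miku",
}

def tag_for(platform: str) -> str:
    """Return the platform-specific spelling of the example tag."""
    return TAG_FORMATS[platform.lower()]
```

In practice you would call `deepghs_search_tags` rather than hard-coding such a table; this only illustrates the shape of the answer it returns.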
### Generate a full data collection pipeline

"Generate a waifuc script to build a LoRA dataset for Surtr from Arknights, from Danbooru and Pixiv, for SDXL, safe-only."

The assistant calls `deepghs_generate_waifuc_script` and returns a complete annotated Python script. Here is what that script does when run:

1. Crawls Danbooru (`surtr_(arknights)`) + Pixiv (`スルト アークナイツ`)
2. Converts all images to RGB with white background
3. Drops monochrome, sketch, manga panels, and 3D renders
4. Filters to safe rating only
5. Deduplicates near-identical images (差分 / variants)
6. Drops images with no detected face
7. Splits group images → each character becomes its own crop
8. CCIP identity filter: AI verifies every image actually shows Surtr
9. WD14 auto-tagging: writes .txt caption files automatically
10. Resizes and pads to 1024×1024 (SDXL standard)
11. Exports in kohya_ss-compatible folder structure

No manual curation. No manual tagging. Drop the output folder straight into your trainer.
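The eleven steps above are, at heart, a chain of filters and transforms applied in order. A minimal stand-in using plain functions over invented record dicts; the real generated script uses waifuc's Action classes (listed in the tools reference), so nothing here is the actual API:

```python
# Shape of the generated pipeline, reduced to plain filter functions.
# The record fields ("rating", "faces", "character") are invented for
# illustration only.
def keep_safe(img):            # step 4: rating filter
    return img["rating"] == "safe"

def has_single_face(img):      # step 6: drop images with no detected face
    return img["faces"] == 1

def is_target_character(img):  # step 8: CCIP-style identity check
    return img["character"] == "surtr"

STAGES = [keep_safe, has_single_face, is_target_character]

def run_pipeline(images):
    """Apply each filter stage in order, as the generated script does."""
    for stage in STAGES:
        images = [img for img in images if stage(img)]
    return images
```

Each waifuc Action plays one of these roles: sources yield candidate images, and the attached actions filter or transform them one stage at a time.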
### Generate a targeted download script

"Give me a cheesechaser script to download post IDs 1234, 5678, 9012 from deepghs/danbooru2024."

The assistant calls `deepghs_generate_cheesechaser_script` and returns a complete Python script, with options for downloading by post ID list, downloading everything, or filtering from the Parquet index by tag.

### Find the right model for a task

"Does DeepGHS have a model for scoring image aesthetics?"

The assistant calls `deepghs_list_models` with `search: "aesthetic"` and returns matching models with pipeline task, download counts, and direct HuggingFace links.

### Try a model in the browser

"Is there a live demo for the DeepGHS face detection model?"

The assistant calls `deepghs_list_spaces` with `search: "detection"` and returns matching Spaces with direct links to the interactive demos.
## 🛠️ Available Tools

| Tool | Description | Key Parameters |
|---|---|---|
| `deepghs_list_datasets` | Browse all DeepGHS datasets with search, sort, and pagination | `search`, `sort`, `limit`, `offset` |
| `deepghs_list_models` | Browse all DeepGHS models | `search`, `sort`, `limit` |
| `deepghs_list_spaces` | Browse all DeepGHS live demo Spaces | `search`, `limit` |
| `deepghs_get_repo_info` | Full file tree + metadata for any dataset/model/space | `repo_id`, `repo_type` |
| `deepghs_search_tags` | Cross-platform tag lookup across 18 platforms via site_tags | `tag` |
| `deepghs_find_character_dataset` | Find pre-built LoRA training datasets for a character | `character_name` |
| `deepghs_generate_waifuc_script` | Generate complete data collection + cleaning pipeline script | `character_name`, `sources`, `model_format`, `content_rating` |
| `deepghs_generate_cheesechaser_script` | Generate targeted dataset download script | `repo_id`, `post_ids`, `output_dir` |
## 📚 Tools Reference

### `deepghs_list_datasets`

Lists all public datasets from DeepGHS, sortable and filterable by keyword.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `search` | string | no | – | Keyword filter, e.g. `danbooru`, `character`, `face` |
| `sort` | string | no | `downloads` | `downloads`, `likes`, `createdAt`, `lastModified` |
| `limit` | integer | no | `20` | Results per page (max 100) |
| `offset` | integer | no | `0` | Pagination offset |
| `response_format` | string | no | `markdown` | `markdown` or `json` |
### `deepghs_list_models`

Lists all public models: CCIP, WD Tagger Enhanced, aesthetic scorer, face/head/person detectors, anime classifier, furry detector, NSFW censor, style era classifier, and more.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `search` | string | no | – | Keyword filter, e.g. `ccip`, `tagger`, `aesthetic`, `face` |
| `sort` | string | no | `downloads` | `downloads`, `likes`, `createdAt`, `lastModified` |
| `limit` | integer | no | `20` | Results per page (max 100) |
| `offset` | integer | no | `0` | Pagination offset |
| `response_format` | string | no | `markdown` | `markdown` or `json` |
### `deepghs_list_spaces`

Lists all public Spaces: live demos for face detection, head detection, CCIP character similarity, WD tagger, aesthetic scorer, reverse image search, Danbooru character lookup, and more.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `search` | string | no | – | Keyword filter, e.g. `detection`, `tagger`, `search` |
| `limit` | integer | no | `20` | Results per page (max 100) |
| `response_format` | string | no | `markdown` | `markdown` or `json` |
### `deepghs_get_repo_info`

Get full metadata for any dataset, model, or space, including the complete file tree with individual file sizes. Essential before downloading a multi-TB dataset.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `repo_id` | string | yes | – | Full HF repo ID, e.g. `deepghs/danbooru2024` |
| `repo_type` | string | no | `dataset` | `dataset`, `model`, or `space` |
| `response_format` | string | no | `markdown` | `markdown` or `json` |

Tip: Datasets containing `.tar` files automatically get a ready-to-copy cheesechaser snippet appended.
### `deepghs_search_tags`

Look up any tag across 18 platforms using the `deepghs/site_tags` dataset. Returns per-platform format guidance, a pre-filtered dataset viewer link, and Parquet query code.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `tag` | string | yes | – | Tag in any format or language, e.g. `hatsune_miku`, `Hatsune Miku`, `初音ミク` |
| `response_format` | string | no | `markdown` | `markdown` or `json` |

LLM Tip: Call this before `deepghs_generate_waifuc_script` to confirm the correct Danbooru/Pixiv tag format for a character.
### `deepghs_find_character_dataset`

Searches deepghs and CyberHarem on HuggingFace for pre-built character datasets. CyberHarem datasets are built with the full automated pipeline: crawl → CCIP filter → WD14 tag → upload.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `character_name` | string | yes | – | Character name, e.g. `Rem`, `Hatsune Miku`, `surtr arknights` |
| `response_format` | string | no | `markdown` | `markdown` or `json` |
### `deepghs_generate_waifuc_script`

Generates a complete, annotated Python pipeline script using waifuc.

Pipeline actions included (in order):

| Action | What it does | Why it matters for LoRA |
|---|---|---|
| `ModeConvertAction` | Convert to RGB, white background | Standardizes input format |
| `NoMonochromeAction` | Drop greyscale/sketch images | Prevents style contamination |
| `ClassFilterAction` | Keep illustration/anime only | Drops manga panels and 3D |
| `RatingFilterAction` | Filter by content rating | Keeps the dataset SFW if needed |
| `FilterSimilarAction` | Deduplicate similar images | Prevents overfitting to variants |
| `FaceCountAction` | Require exactly 1 face | Removes group shots and objects |
| `PersonSplitAction` | Crop each character from group images | Maximizes usable data |
| `CCIPAction` | AI identity verification | Removes wrong characters (the most important step) |
| `TaggingAction` | WD14 auto-tagging | Generates .txt captions automatically |
| `AlignMinSizeAction` | Resize to minimum resolution | Ensures a quality floor |
| `PaddingAlignAction` | Pad to square | Standard training resolution |
Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `character_name` | string | yes | – | Display name, e.g. `Rem`, `Surtr` |
| `danbooru_tag` | string | no | auto-guessed | Danbooru tag, e.g. `rem_(re:zero)` |
| `pixiv_query` | string | no | – | Pixiv search query, Japanese preferred. Required if `pixiv` is in `sources` |
| `sources` | list | no | `["danbooru"]` | `danbooru`, `pixiv`, `gelbooru`, `zerochan`, `sankaku`, `auto` |
| `model_format` | string | no | `sd1.5` | `sd1.5` (512px), `sdxl` (1024px), or `flux` (1024px) |
| `content_rating` | string | no | `safe` | `safe`, `safe_r15`, or `all` |
| `output_dir` | string | no | `./dataset_output` | Output directory |
| `max_images` | integer | no | no limit | Cap total images collected |
| `pixiv_token` | string | no | – | Pixiv refresh token (required for the Pixiv source) |
Source tag formats:

| Source | Tag Format | Notes |
|---|---|---|
| `danbooru` | `rem_(re:zero)` | snake_case with series in parens |
| `gelbooru` | `rem_(re:zero)` | Same as Danbooru |
| `pixiv` | `レム` / `rem re:zero` | Japanese preferred for better results |
| `zerochan` | `Rem` | Title Case, strict mode enabled |
| `sankaku` | `rem_(re:zero)` | snake_case |
| `auto` | character name | Uses the gchar database; best for game characters |
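The Danbooru/Gelbooru/Sankaku convention in the table above (snake_case, series in parentheses) is mechanical enough to sketch. A hypothetical helper for illustration only; the real tool auto-guesses the tag via the `danbooru_tag` parameter:

```python
def to_danbooru_tag(character: str, series: str) -> str:
    """Build a Danbooru-style tag: snake_case with the series in parens.

    Illustrative only; always confirm the real tag with
    deepghs_search_tags, since many tags don't follow the pattern.
    """
    def snake(s: str) -> str:
        # Lowercase and replace spaces with underscores.
        return s.strip().lower().replace(" ", "_")
    return f"{snake(character)}_({snake(series)})"
```

For example, `to_danbooru_tag("Rem", "Re:Zero")` yields `rem_(re:zero)`, but edge cases (alternate spellings, disambiguation suffixes) are exactly why the README recommends confirming with `deepghs_search_tags` first.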
### `deepghs_generate_cheesechaser_script`

Generates a Python download script using cheesechaser to pull specific images from indexed tar datasets without downloading entire archives.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `repo_id` | string | yes | – | HF dataset, e.g. `deepghs/danbooru2024` |
| `output_dir` | string | no | `./downloads` | Local download directory |
| `post_ids` | list[int] | no | – | Specific post IDs. If omitted, downloads all (can be very large) |
| `max_workers` | integer | no | `4` | Parallel download threads (1-16) |
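To picture what `max_workers` parallelizes, here is a round-robin split of a post-ID list across worker slots. Purely illustrative; cheesechaser manages its own download threads internally and this helper is not part of its API:

```python
def split_among_workers(post_ids, max_workers=4):
    """Round-robin a post-ID list into max_workers buckets.

    Illustrative sketch of work division only; not cheesechaser code.
    """
    buckets = [[] for _ in range(max_workers)]
    for i, pid in enumerate(post_ids):
        buckets[i % max_workers].append(pid)
    return buckets
```

With more workers, each thread handles fewer archives concurrently, which is why the generated scripts cap the value at 16.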
## 🗂️ Key DeepGHS Datasets

| Repo ID | Description | Use Case |
|---|---|---|
| `deepghs/danbooru2024` | Full Danbooru archive, 8M+ images | Bulk downloads, data mining |
| `deepghs/danbooru2024-webp-4Mpixel` | Compressed WebP version | Faster downloads |
| `deepghs/sankaku_full` | Full Sankaku Channel dataset | Alternative tag ecosystem |
| `deepghs/gelbooru-webp-4Mpixel` | Gelbooru compressed | Western fanart coverage |
| `deepghs/site_tags` | 2.5M+ tags, 18 platforms | Tag normalization |
| `deepghs/anime_face_detection` | YOLO face detection labels | Train detection models |
| `deepghs/bangumibase` | Character frames from anime | Character dataset bootstrapping |
## 🚀 Recommended Workflows

### Workflow A: Use a pre-built dataset

1. `deepghs_find_character_dataset` → check if a dataset exists
2. `deepghs_get_repo_info` → inspect file sizes
3. `deepghs_generate_cheesechaser_script` → get the download command
4. Run the script → train

### Workflow B: Build a new dataset from scratch

1. `deepghs_search_tags` → find the correct Danbooru tag
2. `deepghs_generate_waifuc_script` → generate the full pipeline script
3. Run the script → crawl, filter, tag, crop
4. Review the output → remove any remaining noise
5. Train with kohya_ss

### Workflow C: Mine specific images from a large dataset

1. `deepghs_get_repo_info` → inspect the dataset's Parquet structure
2. `deepghs_search_tags` → confirm tag names
3. `deepghs_generate_cheesechaser_script` → generate a Parquet-filter script
4. Run the script → download only matching images
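The "filter from the Parquet index by tag" step in Workflow C can be sketched with an in-memory stand-in for the index. The rows, field names, and helper below are invented for illustration; a generated script would read the dataset's actual Parquet index (e.g. with pandas or pyarrow) instead:

```python
# Stand-in for a Parquet tag index: each row maps a post ID to its
# space-separated tag string. Rows are invented example data.
INDEX = [
    {"id": 1234, "tags": "hatsune_miku 1girl solo"},
    {"id": 5678, "tags": "rem_(re:zero) 1girl"},
    {"id": 9012, "tags": "hatsune_miku twintails"},
]

def ids_with_tag(index, tag):
    """Return post IDs whose tag string contains the exact tag."""
    return [row["id"] for row in index if tag in row["tags"].split()]
```

The matching IDs are then fed to cheesechaser as the `post_ids` list, so only those archives' members are fetched.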
## 🤖 Notes for LLMs

- **Check pre-built first**: Always call `deepghs_find_character_dataset` before generating a waifuc script. CyberHarem has hundreds of ready-to-use LoRA datasets.
- **Danbooru tag format**: `character_(series)` with underscores, e.g. `rem_(re:zero)`, not `Rem (Re:Zero)`. Use `deepghs_search_tags` to confirm.
- **File sizes**: Datasets like `danbooru2024` are multi-TB. Always check `deepghs_get_repo_info` before recommending a full download. Use cheesechaser with post IDs for targeted access.
- **CCIP is essential**: It's the most important pipeline step; without it, 20-40% of a character dataset will be wrong-character noise. Always include it in waifuc scripts.
- **Pixiv source**: Requires a `pixiv_token`. If the user hasn't set one up, suggest Danbooru + Gelbooru instead.
- **Model format crop sizes**: SD1.5 = 512×512, SDXL/Flux = 1024×1024. This controls `AlignMinSizeAction` and `PaddingAlignAction` in generated scripts.
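The crop-size rule above is a straight lookup. A hypothetical helper for illustration (the values come from the notes; the function itself is not part of this server):

```python
def crop_size(model_format: str) -> int:
    """Square training resolution per model format.

    Values from the notes above: SD1.5 trains at 512x512,
    SDXL and Flux at 1024x1024. Helper name is invented.
    """
    sizes = {"sd1.5": 512, "sdxl": 1024, "flux": 1024}
    return sizes[model_format.lower()]
```

Generated scripts apply this number to both `AlignMinSizeAction` (minimum edge) and `PaddingAlignAction` (final square canvas).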
## ⚠️ Known Limitations

- **HuggingFace rate limits**: Without `HF_TOKEN`, the Hub API may throttle requests under heavy usage.
- **Gated datasets**: Some datasets require explicit approval on HuggingFace before downloading. The server returns a clear error with guidance.
- **CyberHarem search**: Niche characters may need manual browsing at huggingface.co/CyberHarem.
- **waifuc runtime**: The generated scripts require `waifuc` installed separately (it is not in this server's deps). The first run downloads ~500MB of CCIP + WD14 model weights.
## 🔧 Troubleshooting

Server won't start:

- Ensure Python 3.10+: `python --version`
- Re-run the installer: `bash install.sh`

Rate limit / 429 errors:

- Set `HF_TOKEN` in your MCP env config
- Get a free token: huggingface.co/settings/tokens

403 / Forbidden on a dataset:

- The dataset is gated: visit the dataset page on HuggingFace and click "Request Access"
- Ensure the `HF_TOKEN` is from an account that has been granted access

Character dataset not found:

- Try alternate spellings: `"Rem"`, `"rem_(re:zero)"`, `"rem re:zero"`
- Browse manually: huggingface.co/CyberHarem
- Generate from scratch with `deepghs_generate_waifuc_script`

waifuc script fails:

- Install waifuc: `pip install git+https://github.com/deepghs/waifuc.git`
- GPU support: `pip install "waifuc[gpu]"` (much faster CCIP + tagging)
- The first run downloads ~500MB of model weights; this is expected
## 🤝 Contributing

Pull requests are welcome! If a tool is returning incorrect data, a script template is outdated, or a new DeepGHS dataset or model should be highlighted, please open an issue or PR.

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## 📄 License

MIT License; see LICENSE for details.
## 🔗 Links

- 🤗 DeepGHS on HuggingFace
- 🧰 waifuc – data pipeline
- 🧀 cheesechaser – targeted HF downloader
- 🖼️ imgutils – image processing
- 🤖 cyberharem – automated LoRA pipeline
- 🧠 MCP documentation
- 🐛 Bug Reports
- 💡 Feature Requests
## Related MCP Servers

- gelbooru-mcp – Search Gelbooru, generate SD prompts from character tag data
- zerochan-mcp – Browse Zerochan's high-quality anime image board
<!-- mcp-name: io.github.citronlegacy/deepghs-mcp -->