
deepghs-mcp


<a href="https://glama.ai/mcp/servers/citronlegacy/deepghs-mcp"> <img width="380" height="200" src="https://glama.ai/mcp/servers/citronlegacy/deepghs-mcp/badge" alt="DeepGHS MCP server rating" /> </a>

A Python MCP server for the DeepGHS anime AI ecosystem. Connect it to any MCP-compatible client (Claude Desktop, Cursor, etc.) to browse datasets, discover pre-built character training sets, look up tags across 18 platforms, and generate complete data pipeline scripts, all directly from your AI assistant.


✨ Features

📦 Dataset & Model Discovery

  • Browse all DeepGHS datasets: Danbooru2024 (8M+ images), Sankaku, Gelbooru, Zerochan, BangumiBase, and more
  • Full file trees: see exactly which tar/parquet files a dataset contains and how large they are before downloading anything
  • Model catalog: find the right model for your task, including CCIP, WD Tagger Enhanced, the aesthetic scorer, the face/head detector, and the anime classifier
  • Live demos: browse DeepGHS Spaces (interactive web apps) for testing models without code

๐Ÿท๏ธ Cross-Platform Tag Intelligence

  • site_tags lookup โ€” 2.5M+ tags unified across 18 platforms in one query
  • Tag format translation โ€” Danbooru uses hatsune_miku, Zerochan uses Hatsune Miku, Pixiv uses ๅˆ้ŸณใƒŸใ‚ฏ โ€” this tool maps them all together
  • Ready-to-use Parquet queries โ€” get copy-paste code to filter the tag database programmatically
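To make the translation problem concrete, here is a stdlib-only sketch of the per-platform naming conventions listed above. It is illustrative only: the actual server resolves tags against the site_tags dataset rather than by rule, since many tags (aliases, Japanese titles) defy mechanical conventions.

```python
# Illustrative only: real lookups go through the site_tags dataset,
# because many tags do not follow simple naming rules.

def to_danbooru(name: str) -> str:
    """Danbooru/Gelbooru convention: lowercase snake_case."""
    return name.strip().lower().replace(" ", "_")

def to_zerochan(name: str) -> str:
    """Zerochan convention: Title Case with spaces."""
    return " ".join(w.capitalize() for w in name.strip().split())

def to_wallhaven(name: str) -> str:
    """Wallhaven convention: lowercase with spaces."""
    return name.strip().lower()

print(to_danbooru("Hatsune Miku"))   # hatsune_miku
print(to_zerochan("hatsune miku"))   # Hatsune Miku
```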

🎯 Character Dataset Finder

  • Pre-built LoRA datasets: search both deepghs and CyberHarem namespaces for existing character image collections
  • Ready-to-run download commands: get the exact cheesechaser command to pull what you need
  • Smart fallback: if no pre-built dataset exists, the tool hands off directly to the waifuc script generator

🤖 Training Pipeline Code Generation

  • waifuc scripts: generate complete, annotated Python data collection pipelines for any character from any source (Danbooru, Pixiv, Gelbooru, Zerochan, Sankaku, or Auto)
  • cheesechaser scripts: generate targeted download scripts that pull specific post IDs from indexed multi-TB datasets without downloading the whole archive
  • Format-aware: crop sizes, bucket ranges, and export formats are automatically adjusted for SD 1.5, SDXL, or Flux

📦 Installation

Prerequisites

  • Python 3.10+
  • git

Quick Start

  1. Clone the repository:

     git clone https://github.com/citronlegacy/deepghs-mcp.git
     cd deepghs-mcp

  2. Run the installer:

     chmod +x install.sh && ./install.sh
     # or without chmod: bash install.sh

  3. Or install manually:

     pip install -r requirements.txt

🔑 Authentication

HF_TOKEN is optional for public datasets but strongly recommended: it raises HuggingFace's API rate limit and is required for any gated or private repositories.

Get your token at huggingface.co/settings/tokens (read access is sufficient).

Without it, the server still works for all public DeepGHS datasets.


▶️ Running the Server

python deepghs_mcp.py
# or via the venv created by install.sh:
.venv/bin/python deepghs_mcp.py

⚙️ Configuration

Claude Desktop

Add the following to your claude_desktop_config.json:

{
  "mcpServers": {
    "deepghs": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["/absolute/path/to/deepghs_mcp.py"],
      "env": {
        "HF_TOKEN": "hf_your_token_here"
      }
    }
  }
}

Other MCP Clients

  • Command: /absolute/path/to/.venv/bin/python
  • Args: /absolute/path/to/deepghs_mcp.py
  • Transport: stdio

💡 Usage Examples

Browse available datasets

"What anime datasets does DeepGHS have on HuggingFace?"

The assistant calls deepghs_list_datasets and returns all datasets sorted by download count (Danbooru2024, Sankaku, Gelbooru WebP, BangumiBase, site_tags, and more) with links and update dates.


Check dataset contents before downloading

"What files are in deepghs/danbooru2024? How big is it?"

The assistant calls deepghs_get_repo_info and returns the full file tree, listing every .tar and .parquet file with individual and total sizes, so you know exactly what you're committing to before you download.


Find a pre-built character dataset

"Is there already a dataset for Rem from Re:Zero I can use for LoRA training?"

The assistant calls deepghs_find_character_dataset, searches both deepghs and CyberHarem namespaces, and returns any matches with download counts and a one-liner download command.

Example response:

## Character Dataset Search: Rem

Found 2 dataset(s):

### CyberHarem/rem_rezero
Downloads: 4.2K | Likes: 31 | Updated: 2024-08-12

Download with cheesechaser:
  from cheesechaser.datapool import SimpleDataPool
  pool = SimpleDataPool('CyberHarem/rem_rezero')
  pool.batch_download_to_directory('./dataset', max_workers=4)

Look up a tag across all platforms

"What is the correct tag for 'Hatsune Miku' on Danbooru, Zerochan, and Pixiv?"

The assistant calls deepghs_search_tags and returns the format for every platform, plus a pre-filtered link to the site_tags dataset viewer and Parquet query code.

Example response:

Tag Lookup: Hatsune Miku

  Danbooru:  hatsune_miku
  Gelbooru:  hatsune_miku
  Sankaku:   character:hatsune_miku
  Zerochan:  Hatsune Miku
  Pixiv:     初音ミク (Japanese preferred)
  Yande.re:  hatsune_miku
  Wallhaven: hatsune miku

This directly solves the MultiBoru tag normalization problem.


Generate a full data collection pipeline

"Generate a waifuc script to build a LoRA dataset for Surtr from Arknights, from Danbooru and Pixiv, for SDXL, safe-only."

The assistant calls deepghs_generate_waifuc_script and returns a complete annotated Python script. Here is what that script does when run:

1.  Crawls Danbooru (surtr_(arknights)) + Pixiv (スルト アークナイツ)
2.  Converts all images to RGB with white background
3.  Drops monochrome, sketch, manga panels, and 3D renders
4.  Filters to safe rating only
5.  Deduplicates near-identical images (差分 / variants)
6.  Drops images with no detected face
7.  Splits group images โ€” each character becomes its own crop
8.  CCIP identity filter โ€” AI verifies every image actually shows Surtr
9.  WD14 auto-tagging โ€” writes .txt caption files automatically
10. Resizes and pads to 1024×1024 (SDXL standard)
11. Exports in kohya_ss-compatible folder structure

No manual curation. No manual tagging. Drop the output folder straight into your trainer.


Generate a targeted download script

"Give me a cheesechaser script to download post IDs 1234, 5678, 9012 from deepghs/danbooru2024."

The assistant calls deepghs_generate_cheesechaser_script and returns a complete Python script, with options for downloading by post ID list, downloading everything, or filtering from the Parquet index by tag.


Find the right model for a task

"Does DeepGHS have a model for scoring image aesthetics?"

The assistant calls deepghs_list_models with search: "aesthetic" and returns matching models with pipeline task, download counts, and direct HuggingFace links.


Try a model in the browser

"Is there a live demo for the DeepGHS face detection model?"

The assistant calls deepghs_list_spaces with search: "detection" and returns matching Spaces with direct links to the interactive demos.


🛠️ Available Tools

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| deepghs_list_datasets | Browse all DeepGHS datasets with search, sort, and pagination | search, sort, limit, offset |
| deepghs_list_models | Browse all DeepGHS models | search, sort, limit |
| deepghs_list_spaces | Browse all DeepGHS live demo Spaces | search, limit |
| deepghs_get_repo_info | Full file tree + metadata for any dataset/model/space | repo_id, repo_type |
| deepghs_search_tags | Cross-platform tag lookup across 18 platforms via site_tags | tag |
| deepghs_find_character_dataset | Find pre-built LoRA training datasets for a character | character_name |
| deepghs_generate_waifuc_script | Generate complete data collection + cleaning pipeline script | character_name, sources, model_format, content_rating |
| deepghs_generate_cheesechaser_script | Generate targeted dataset download script | repo_id, post_ids, output_dir |

📖 Tools Reference

deepghs_list_datasets

Lists all public datasets from DeepGHS, sortable and filterable by keyword.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| search | string | ❌ | (none) | Keyword filter, e.g. danbooru, character, face |
| sort | string | ❌ | downloads | downloads, likes, createdAt, lastModified |
| limit | integer | ❌ | 20 | Results per page (max 100) |
| offset | integer | ❌ | 0 | Pagination offset |
| response_format | string | ❌ | markdown | markdown or json |

deepghs_list_models

Lists all public models: CCIP, WD Tagger Enhanced, aesthetic scorer, face/head/person detectors, anime classifier, furry detector, NSFW censor, style era classifier, and more.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| search | string | ❌ | (none) | Keyword filter, e.g. ccip, tagger, aesthetic, face |
| sort | string | ❌ | downloads | downloads, likes, createdAt, lastModified |
| limit | integer | ❌ | 20 | Results per page (max 100) |
| offset | integer | ❌ | 0 | Pagination offset |
| response_format | string | ❌ | markdown | markdown or json |

deepghs_list_spaces

Lists all public Spaces: live demos for face detection, head detection, CCIP character similarity, WD tagger, aesthetic scorer, reverse image search, Danbooru character lookup, and more.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| search | string | ❌ | (none) | Keyword filter, e.g. detection, tagger, search |
| limit | integer | ❌ | 20 | Results per page (max 100) |
| response_format | string | ❌ | markdown | markdown or json |

deepghs_get_repo_info

Get full metadata for any dataset, model, or space, including the complete file tree with individual file sizes. Essential before downloading a multi-TB dataset.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| repo_id | string | ✅ | (none) | Full HF repo ID, e.g. deepghs/danbooru2024 |
| repo_type | string | ❌ | dataset | dataset, model, or space |
| response_format | string | ❌ | markdown | markdown or json |

Tip: Datasets containing .tar files automatically get a ready-to-copy cheesechaser snippet appended.


deepghs_search_tags

Look up any tag across 18 platforms using the deepghs/site_tags dataset. Returns per-platform format guidance, a pre-filtered dataset viewer link, and Parquet query code.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| tag | string | ✅ | (none) | Tag in any format or language, e.g. hatsune_miku, Hatsune Miku, 初音ミク |
| response_format | string | ❌ | markdown | markdown or json |

LLM Tip: Call this before deepghs_generate_waifuc_script to confirm the correct Danbooru/Pixiv tag format for a character.


deepghs_find_character_dataset

Searches deepghs and CyberHarem on HuggingFace for pre-built character datasets. CyberHarem datasets are built with the full automated pipeline: crawl → CCIP filter → WD14 tag → upload.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| character_name | string | ✅ | (none) | Character name, e.g. Rem, Hatsune Miku, surtr arknights |
| response_format | string | ❌ | markdown | markdown or json |

deepghs_generate_waifuc_script

Generates a complete, annotated Python pipeline script using waifuc.

Pipeline actions included (in order):

| Action | What it does | Why it matters for LoRA |
|--------|--------------|-------------------------|
| ModeConvertAction | Convert to RGB, white background | Standardizes input format |
| NoMonochromeAction | Drop greyscale/sketch images | Prevents style contamination |
| ClassFilterAction | Keep illustration/anime only | Drops manga panels and 3D |
| RatingFilterAction | Filter by content rating | Keeps dataset SFW if needed |
| FilterSimilarAction | Deduplicate similar images | Prevents overfitting to variants |
| FaceCountAction | Require exactly 1 face | Removes group shots and objects |
| PersonSplitAction | Crop each character from group images | Maximizes usable data |
| CCIPAction | AI identity verification | Removes wrong characters (the most important step) |
| TaggingAction | WD14 auto-tagging | Generates .txt captions automatically |
| AlignMinSizeAction | Resize to minimum resolution | Ensures quality floor |
| PaddingAlignAction | Pad to square | Standard training resolution |
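The table above is essentially an ordered plan that the generator specializes per request. A rough, self-contained sketch of that specialization logic (action names mirror the table; this does not import waifuc, and the parameter values are illustrative):

```python
# Builds the pipeline as a list of strings so the structure is visible
# without installing waifuc; the real tool emits a runnable waifuc script.

def plan_pipeline(model_format: str = "sd1.5",
                  content_rating: str = "safe") -> list[str]:
    size = 512 if model_format == "sd1.5" else 1024  # sdxl and flux: 1024
    plan = [
        "ModeConvertAction('RGB', 'white')",    # normalize input format
        "NoMonochromeAction()",                 # drop sketches/greyscale
        "ClassFilterAction(['illustration'])",  # drop manga panels, 3D
    ]
    if content_rating != "all":
        plan.append(f"RatingFilterAction({content_rating!r})")
    plan += [
        "FilterSimilarAction('all')",           # deduplicate variants
        "FaceCountAction(1)",
        "PersonSplitAction()",
        "CCIPAction()",                         # identity check (key step)
        "TaggingAction(force=True)",            # WD14 captions
        f"AlignMinSizeAction({size})",
        f"PaddingAlignAction(({size}, {size}))",
    ]
    return plan

for step in plan_pipeline("sdxl", "safe"):
    print(step)
```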

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| character_name | string | ✅ | (none) | Display name, e.g. Rem, Surtr |
| danbooru_tag | string | ❌ | auto-guessed | Danbooru tag, e.g. rem_(re:zero) |
| pixiv_query | string | ❌ | (none) | Pixiv search query, Japanese preferred. Required if pixiv in sources |
| sources | list | ❌ | ["danbooru"] | danbooru, pixiv, gelbooru, zerochan, sankaku, auto |
| model_format | string | ❌ | sd1.5 | sd1.5 (512px), sdxl (1024px), or flux (1024px) |
| content_rating | string | ❌ | safe | safe, safe_r15, or all |
| output_dir | string | ❌ | ./dataset_output | Output directory |
| max_images | integer | ❌ | no limit | Cap total images collected |
| pixiv_token | string | ❌ | (none) | Pixiv refresh token (required for Pixiv source) |

Source tag formats:

| Source | Tag Format | Notes |
|--------|------------|-------|
| danbooru | rem_(re:zero) | snake_case with series in parens |
| gelbooru | rem_(re:zero) | same as Danbooru |
| pixiv | レム / rem re:zero | Japanese preferred for better results |
| zerochan | Rem | Title Case, strict mode enabled |
| sankaku | rem_(re:zero) | snake_case |
| auto | character name | uses the gchar database; best for game characters |
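Putting the parameters together, an example argument payload for deepghs_generate_waifuc_script might look like this (all values are illustrative):

```json
{
  "character_name": "Surtr",
  "danbooru_tag": "surtr_(arknights)",
  "sources": ["danbooru", "pixiv"],
  "pixiv_query": "スルト アークナイツ",
  "pixiv_token": "<your pixiv refresh token>",
  "model_format": "sdxl",
  "content_rating": "safe",
  "output_dir": "./surtr_dataset",
  "max_images": 300
}
```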

deepghs_generate_cheesechaser_script

Generates a Python download script using cheesechaser to pull specific images from indexed tar datasets without downloading entire archives.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| repo_id | string | ✅ | (none) | HF dataset, e.g. deepghs/danbooru2024 |
| output_dir | string | ❌ | ./downloads | Local download directory |
| post_ids | list[int] | ❌ | (none) | Specific post IDs. If omitted, downloads all (can be very large) |
| max_workers | integer | ❌ | 4 | Parallel download threads (1–16) |

🗂️ Key DeepGHS Datasets

| Repo ID | Description | Use Case |
|---------|-------------|----------|
| deepghs/danbooru2024 | Full Danbooru archive, 8M+ images | Bulk downloads, data mining |
| deepghs/danbooru2024-webp-4Mpixel | Compressed WebP version | Faster downloads |
| deepghs/sankaku_full | Full Sankaku Channel dataset | Alternative tag ecosystem |
| deepghs/gelbooru-webp-4Mpixel | Gelbooru compressed | Western fanart coverage |
| deepghs/site_tags | 2.5M+ tags, 18 platforms | Tag normalization |
| deepghs/anime_face_detection | YOLO face detection labels | Train detection models |
| deepghs/bangumibase | Character frames from anime | Character dataset bootstrapping |

🔄 Recommended Workflows

Workflow A: Use a pre-built dataset

1. deepghs_find_character_dataset       → check if dataset exists
2. deepghs_get_repo_info                → inspect file sizes
3. deepghs_generate_cheesechaser_script → get download command
4. Run the script                       → train

Workflow B: Build a new dataset from scratch

1. deepghs_search_tags              → find the correct Danbooru tag
2. deepghs_generate_waifuc_script   → generate full pipeline script
3. Run the script                   → crawl, filter, tag, crop
4. Review output                    → remove any remaining noise
5. Train with kohya_ss

Workflow C: Mine specific images from a large dataset

1. deepghs_get_repo_info                → inspect dataset Parquet structure
2. deepghs_search_tags                  → confirm tag names
3. deepghs_generate_cheesechaser_script → generate Parquet-filter script
4. Run the script                       → downloads only matching images
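The Parquet-filter step above reduces to: read the index, keep rows whose tags match, and hand the surviving IDs to cheesechaser. A stdlib-only sketch of that selection logic on a toy index (the id/tags schema here is an assumption for illustration; real indexes are read with pandas or pyarrow and column names may differ):

```python
# Toy stand-in for rows read from a dataset's Parquet index.
index_rows = [
    {"id": 1234, "tags": "1girl hatsune_miku twintails"},
    {"id": 5678, "tags": "1girl rem_(re:zero) maid"},
    {"id": 9012, "tags": "hatsune_miku vocaloid stage"},
]

def select_post_ids(rows, required_tag):
    """Return IDs of rows whose space-separated tag string contains the tag."""
    return [r["id"] for r in rows if required_tag in r["tags"].split()]

post_ids = select_post_ids(index_rows, "hatsune_miku")
print(post_ids)  # [1234, 9012]
```

The resulting list is exactly what the generated script passes as post_ids.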

🤖 Notes for LLMs

  • Check pre-built first: Always call deepghs_find_character_dataset before generating a waifuc script. CyberHarem has hundreds of ready-to-use LoRA datasets.
  • Danbooru tag format: character_(series) with underscores, e.g. rem_(re:zero), not Rem (Re:Zero). Use deepghs_search_tags to confirm.
  • File sizes: Datasets like danbooru2024 are multi-TB. Always check deepghs_get_repo_info before recommending a full download. Use cheesechaser with post IDs for targeted access.
  • CCIP is essential: It's the most important pipeline step; without it, 20–40% of a character dataset will be wrong-character noise. Always include it in waifuc scripts.
  • Pixiv source: Requires a pixiv_token. If the user hasn't set one up, suggest Danbooru + Gelbooru instead.
  • Model format crop sizes: SD1.5 = 512×512, SDXL/Flux = 1024×1024. This controls AlignMinSizeAction and PaddingAlignAction in generated scripts.

⚠️ Known Limitations

  • HuggingFace rate limits: Without HF_TOKEN, the Hub API may throttle requests on heavy usage.
  • Gated datasets: Some datasets require explicit approval on HuggingFace before downloading. The server returns a clear error with guidance.
  • CyberHarem search: Niche characters may need manual browsing at huggingface.co/CyberHarem.
  • waifuc runtime: The generated scripts require waifuc installed separately (not in this server's deps). First run downloads ~500MB of CCIP + WD14 model weights.

๐Ÿ› Troubleshooting

Server won't start:

  • Ensure Python 3.10+: python --version
  • Re-run the installer: bash install.sh

Rate limit / 429 errors:

  • Set HF_TOKEN in your environment or MCP client config; authenticated requests get a higher rate limit
  • Back off and retry; heavy unauthenticated usage is throttled first

403 / Forbidden on a dataset:

  • The dataset is gated: visit the dataset page on HuggingFace and click "Request Access"
  • Ensure the HF_TOKEN is from an account that has been granted access

Character dataset not found:

  • Try alternate spellings: "Rem", "rem_(re:zero)", "rem re:zero"
  • Browse manually: huggingface.co/CyberHarem
  • Generate from scratch with deepghs_generate_waifuc_script

waifuc script fails:

  • Install waifuc: pip install git+https://github.com/deepghs/waifuc.git
  • GPU support: pip install "waifuc[gpu]" (much faster CCIP + tagging)
  • First run downloads ~500MB of model weights; this is expected

🤝 Contributing

Pull requests are welcome! If a tool is returning incorrect data, a script template is outdated, or a new DeepGHS dataset or model should be highlighted, please open an issue or PR.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📄 License

MIT License; see LICENSE for details.


🔗 Links

Related MCP Servers

  • gelbooru-mcp: Search Gelbooru, generate SD prompts from character tag data
  • zerochan-mcp: Browse Zerochan's high-quality anime image board

<!-- mcp-name: io.github.citronlegacy/deepghs-mcp -->
