HF Dataset MCP

An MCP server for the Hugging Face Dataset Viewer API that enables searching, fetching, and filtering datasets on the Hugging Face Hub. It allows users to explore schemas, perform full-text searches, and analyze dataset statistics through natural language.

MCP server for the Hugging Face Dataset Viewer API. Search datasets, fetch rows, filter data, and more.

Installation

npx @cfahlgren1/hf-dataset-mcp

Configuration

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "hf-datasets": {
      "command": "npx",
      "args": ["-y", "@cfahlgren1/hf-dataset-mcp"],
      "env": {
        "HF_TOKEN": "hf_..."
      }
    }
  }
}

Environment Variables

Variable            Description
HF_TOKEN            Hugging Face API token (required for private/gated datasets)
HF_DATASETS_SERVER  Custom Dataset Viewer API URL (default: https://datasets-server.huggingface.co)
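
As a rough illustration of how these two variables could be consumed, the sketch below resolves a base URL and request headers from an environment mapping. The function name `resolve_settings` is hypothetical, and sending the token as a `Bearer` header is an assumption about how the server authenticates, not something this README states.

```python
from typing import Dict, Mapping, Tuple

# Default taken from the table above.
DEFAULT_SERVER = "https://datasets-server.huggingface.co"


def resolve_settings(env: Mapping[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (base_url, headers) derived from the two documented variables.

    Hypothetical helper: the real server may read its settings differently.
    """
    # Fall back to the documented default when the variable is unset or empty.
    base_url = (env.get("HF_DATASETS_SERVER") or DEFAULT_SERVER).rstrip("/")

    headers: Dict[str, str] = {}
    token = env.get("HF_TOKEN")
    if token:
        # Assumption: the token is sent as a standard Bearer authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return base_url, headers
```

Passing `os.environ` as `env` would reflect the values set in the Claude Desktop config; without `HF_TOKEN`, the headers stay empty and only public datasets would be reachable.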

Tools

search_datasets

Find datasets on the Hugging Face Hub by name, tag, or author.

search_datasets(search?: string, author?: string, filter?: string[], sort?: string, limit?: number)
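
To make the parameters concrete, here is a minimal sketch of how they might translate into a query against the Hub's dataset listing API. It assumes the tool forwards to `https://huggingface.co/api/datasets` and repeats `filter` once per tag; the helper name `build_search_url` is hypothetical and the actual server may build its request differently.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Assumption: the tool queries the Hub's public dataset listing endpoint.
HUB_DATASETS_API = "https://huggingface.co/api/datasets"


def build_search_url(
    search: Optional[str] = None,
    author: Optional[str] = None,
    filter: Optional[List[str]] = None,
    sort: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a listing URL, skipping every parameter that was not supplied."""
    params: List[Tuple[str, object]] = []
    if search is not None:
        params.append(("search", search))
    if author is not None:
        params.append(("author", author))
    # One "filter" entry per tag (assumed encoding for multiple tags).
    for tag in filter or []:
        params.append(("filter", tag))
    if sort is not None:
        params.append(("sort", sort))
    if limit is not None:
        params.append(("limit", limit))

    query = urlencode(params)
    return f"{HUB_DATASETS_API}?{query}" if query else HUB_DATASETS_API
```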

validate_dataset

Check if a dataset is accessible and which viewer features are available.

validate_dataset(dataset: string)
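
One way to use the result is to check which other tools are worth calling for a dataset. The sketch below assumes the response is a flat mapping of feature names to booleans, modeled on the Dataset Viewer's `/is-valid` endpoint (`preview`, `viewer`, `search`, `filter`, `statistics`); both that shape and the tool-to-feature mapping are assumptions, not documented behavior of this server.

```python
from typing import Dict, List

# Assumed mapping from tools in this README to the viewer feature each one needs.
REQUIRED_FEATURE = {
    "get_rows": "viewer",
    "search_dataset": "search",
    "filter_rows": "filter",
    "get_statistics": "statistics",
}


def usable_tools(is_valid_response: Dict[str, bool]) -> List[str]:
    """Return the tools whose required feature is reported as available."""
    return sorted(
        tool
        for tool, feature in REQUIRED_FEATURE.items()
        if is_valid_response.get(feature) is True
    )


# Illustrative response, not real output for any particular dataset.
sample = {"preview": True, "viewer": True, "search": True, "filter": True, "statistics": False}
print(usable_tools(sample))
```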

list_splits

Get all available configurations and splits for a dataset.

list_splits(dataset: string)

get_dataset_info

Get the schema, metadata, and row counts for a dataset configuration.

get_dataset_info(dataset: string, config: string)

get_rows

Fetch a slice of rows from a dataset split.

get_rows(dataset: string, config: string, split: string, offset?: number, length?: number)
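
Because each call returns only a slice, reading a whole split means stepping `offset` forward in pages. A minimal sketch of that bookkeeping follows; the cap of 100 rows per request is an assumption based on the upstream Dataset Viewer API, and `page_windows` is a hypothetical helper.

```python
from typing import Iterator, Tuple

# Assumption: the upstream API returns at most 100 rows per request.
MAX_LENGTH = 100


def page_windows(total_rows: int, page_size: int = MAX_LENGTH) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover rows 0..total_rows-1 exactly once."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    page_size = min(page_size, MAX_LENGTH)

    offset = 0
    while offset < total_rows:
        # The last page may be shorter than page_size.
        length = min(page_size, total_rows - offset)
        yield offset, length
        offset += length
```

Each pair can be passed as the `offset` and `length` arguments of `get_rows`; the total row count could come from `get_dataset_size` or `get_dataset_info`.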

search_dataset

Full-text search within a dataset split using BM25 ranking.

search_dataset(dataset: string, config: string, split: string, query: string, offset?: number, length?: number)

filter_rows

Filter dataset rows using SQL-like WHERE conditions.

filter_rows(dataset: string, config: string, split: string, where: string, orderby?: string, offset?: number, length?: number)

WHERE syntax: Column names in double quotes, strings in single quotes. Supports =, <>, >, <, >=, <=, AND, OR, NOT.

Example: "label"=1 AND "text" LIKE '%hello%'
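
The quoting rules above are easy to get wrong when a condition is assembled from variables. The sketch below builds `where` strings from the comparison operators listed in the syntax note; the helper names are hypothetical, and escaping embedded quotes by doubling them is the usual SQL convention, assumed here rather than confirmed for this API.

```python
from typing import Union

# Comparison operators named in the WHERE syntax note.
ALLOWED_OPERATORS = {"=", "<>", ">", "<", ">=", "<="}


def quote_column(name: str) -> str:
    """Wrap a column name in double quotes (embedded quotes doubled, by assumption)."""
    return '"' + name.replace('"', '""') + '"'


def quote_value(value: Union[str, int, float]) -> str:
    """Render a literal: strings in single quotes, numbers as-is."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def condition(column: str, operator: str, value: Union[str, int, float]) -> str:
    """Build a single comparison such as "label"=1."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{quote_column(column)}{operator}{quote_value(value)}"


def all_of(*conditions: str) -> str:
    """Join conditions with AND."""
    return " AND ".join(conditions)
```

For instance, `all_of(condition("label", "=", 1), condition("text", "<>", "n/a"))` produces `"label"=1 AND "text"<>'n/a'`, which can be passed as the `where` argument.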

get_dataset_size

Get row counts and byte sizes for all configs and splits.

get_dataset_size(dataset: string)

list_parquet_files

Get URLs for the dataset's Parquet files for direct download or processing.

list_parquet_files(dataset: string)

get_statistics

Get descriptive statistics for each column in a dataset split.

get_statistics(dataset: string, config: string, split: string)
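
Most of the tools above appear to correspond to an endpoint of the Dataset Viewer API. The following sketch shows one plausible mapping and how a request URL could be assembled from a tool's arguments. The endpoint paths exist upstream, but pairing them with these tool names is an assumption about the server's internals, and `build_viewer_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"

# Assumed correspondence between tools in this README and upstream endpoints.
ENDPOINTS = {
    "validate_dataset": "/is-valid",
    "list_splits": "/splits",
    "get_dataset_info": "/info",
    "get_rows": "/rows",
    "search_dataset": "/search",
    "filter_rows": "/filter",
    "get_dataset_size": "/size",
    "list_parquet_files": "/parquet",
    "get_statistics": "/statistics",
}


def build_viewer_url(tool: str, **params: object) -> str:
    """Return the request URL for a tool, dropping arguments that are None."""
    path = ENDPOINTS[tool]  # raises KeyError for unknown tools
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"
```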

Examples

Find text classification datasets

search_datasets(filter: ["task_categories:text-classification"], sort: "downloads", limit: 10)

Get IMDB dataset info

list_splits(dataset: "stanfordnlp/imdb")
get_dataset_info(dataset: "stanfordnlp/imdb", config: "plain_text")

Fetch rows from a dataset

get_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", offset: 0, length: 10)

Search for specific content

search_dataset(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", query: "amazing movie")

Filter rows

filter_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", where: "\"label\"=1", length: 10)
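
The examples above use a shorthand notation. On the wire, an MCP client sends each invocation as a JSON-RPC 2.0 `tools/call` request, with the arguments as a JSON object. A minimal sketch of building such a message is shown below; the helper name `tool_call_request` and the request id are illustrative.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request for the given tool and arguments."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# The "Filter rows" example, expressed as a request payload.
payload = tool_call_request(
    1,
    "filter_rows",
    {
        "dataset": "stanfordnlp/imdb",
        "config": "plain_text",
        "split": "train",
        "where": '"label"=1',
        "length": 10,
    },
)
print(payload)
```

In practice an MCP client such as Claude Desktop produces these messages itself, so writing them by hand is mainly useful for debugging or testing the server.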

License

MIT
