Hugging Face MCP Server

MCP server for accessing the Hugging Face Inference API. Run 200,000+ machine learning models including LLMs, image generation, text classification, embeddings, and more.

Features

  • Text Generation: LLMs like Llama-3, Mistral, Gemma
  • Image Generation: FLUX, Stable Diffusion XL, SD 2.1
  • Text Classification: Sentiment analysis, topic classification
  • Token Classification: Named entity recognition, POS tagging
  • Question Answering: Extract answers from context
  • Summarization: Condense long text
  • Translation: 200+ language pairs
  • Image-to-Text: Image captioning
  • Image Classification: Classify images into categories
  • Object Detection: Detect objects with bounding boxes
  • Text-to-Speech: Convert text to audio
  • Speech Recognition: Transcribe audio (Whisper)
  • Embeddings: Get text/sentence embeddings
  • And more: Fill-mask, sentence similarity

Setup

Prerequisites

  • Hugging Face account
  • API token (free or Pro)

Environment Variables

  • HUGGINGFACE_API_TOKEN (required): Your Hugging Face API token

How to get an API token:

  1. Go to huggingface.co/settings/tokens
  2. Click "New token"
  3. Give it a name and select permissions (read is sufficient for inference)
  4. Copy the token (starts with hf_)
  5. Store as HUGGINGFACE_API_TOKEN
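
If you launch or test the server from Python, a quick sanity check for the token (a minimal sketch; adapt to however your MCP client injects environment variables):

import os

token = os.environ.get("HUGGINGFACE_API_TOKEN")
if not token or not token.startswith("hf_"):
    raise RuntimeError("Set HUGGINGFACE_API_TOKEN to a valid Hugging Face token")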

Available Tools

Text Generation Tools

text_generation

Generate text using large language models.

Parameters:

  • prompt (string, required): Input text prompt
  • model_id (string, optional): Model ID (default: 'mistralai/Mistral-7B-Instruct-v0.3')
  • max_new_tokens (int, optional): Maximum tokens to generate
  • temperature (float, optional): Sampling temperature 0-2 (higher = more random)
  • top_p (float, optional): Nucleus sampling 0-1
  • top_k (int, optional): Top-k sampling
  • repetition_penalty (float, optional): Penalty for repetition
  • return_full_text (bool, optional): Return prompt + generation (default: False)

Popular models:

  • meta-llama/Llama-3.2-3B-Instruct - Meta's Llama 3.2
  • mistralai/Mistral-7B-Instruct-v0.3 - Mistral 7B
  • google/gemma-2-2b-it - Google Gemma 2
  • HuggingFaceH4/zephyr-7b-beta - Zephyr 7B
  • tiiuae/falcon-7b-instruct - Falcon 7B

Example:

result = await text_generation(
    prompt="Write a Python function to calculate fibonacci numbers:",
    model_id="mistralai/Mistral-7B-Instruct-v0.3",
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9
)

Classification Tools

text_classification

Classify text into categories (sentiment, topics, etc.).

Parameters:

  • text (string, required): Text to classify
  • model_id (string, optional): Model ID (default: 'distilbert-base-uncased-finetuned-sst-2-english')

Popular models:

  • distilbert-base-uncased-finetuned-sst-2-english - Sentiment (positive/negative)
  • facebook/bart-large-mnli - Zero-shot classification
  • cardiffnlp/twitter-roberta-base-sentiment-latest - Twitter sentiment
  • finiteautomata/bertweet-base-sentiment-analysis - Tweet sentiment

Example:

result = await text_classification(
    text="I love this product! It exceeded my expectations.",
    model_id="distilbert-base-uncased-finetuned-sst-2-english"
)
# Returns: [{'label': 'POSITIVE', 'score': 0.9998}]

token_classification

Token-level classification for NER, POS tagging, etc.

Parameters:

  • text (string, required): Input text
  • model_id (string, optional): Model ID (default: 'dslim/bert-base-NER')

Popular models:

  • dslim/bert-base-NER - Named Entity Recognition
  • Jean-Baptiste/roberta-large-ner-english - Large NER model
  • dbmdz/bert-large-cased-finetuned-conll03-english - CoNLL-2003 NER

Example:

result = await token_classification(
    text="Apple Inc. is located in Cupertino, California.",
    model_id="dslim/bert-base-NER"
)
# Returns entities: ORG (Apple Inc.), LOC (Cupertino), LOC (California)

Question Answering & Text Processing

question_answering

Answer questions based on provided context.

Parameters:

  • question (string, required): Question to answer
  • context (string, required): Context containing the answer
  • model_id (string, optional): Model ID (default: 'deepset/roberta-base-squad2')

Popular models:

  • deepset/roberta-base-squad2 - RoBERTa on SQuAD 2.0
  • distilbert-base-cased-distilled-squad - DistilBERT on SQuAD

Example:

result = await question_answering(
    question="Where is the Eiffel Tower located?",
    context="The Eiffel Tower is a landmark in Paris, France. It was built in 1889.",
    model_id="deepset/roberta-base-squad2"
)
# Returns: {'answer': 'Paris, France', 'score': 0.98, 'start': 35, 'end': 48}

summarization

Summarize long text into a shorter version.

Parameters:

  • text (string, required): Text to summarize
  • model_id (string, optional): Model ID (default: 'facebook/bart-large-cnn')
  • max_length (int, optional): Maximum summary length
  • min_length (int, optional): Minimum summary length

Popular models:

  • facebook/bart-large-cnn - BART CNN summarization
  • google/pegasus-xsum - PEGASUS XSum
  • sshleifer/distilbart-cnn-12-6 - Distilled BART

Example:

result = await summarization(
    text="Long article text here...",
    model_id="facebook/bart-large-cnn",
    max_length=130,
    min_length=30
)

translation

Translate text between languages.

Parameters:

  • text (string, required): Text to translate
  • model_id (string, required): Model ID for language pair

Popular models:

  • Helsinki-NLP/opus-mt-en-es - English to Spanish
  • Helsinki-NLP/opus-mt-es-en - Spanish to English
  • Helsinki-NLP/opus-mt-en-fr - English to French
  • Helsinki-NLP/opus-mt-en-de - English to German
  • facebook/mbart-large-50-many-to-many-mmt - Multilingual (50 languages)

Example:

result = await translation(
    text="Hello, how are you?",
    model_id="Helsinki-NLP/opus-mt-en-es"
)
# Returns: "Hola, ¿cómo estás?"

Image Generation Tools

text_to_image

Generate images from text prompts.

Parameters:

  • prompt (string, required): Text description of desired image
  • model_id (string, optional): Model ID (default: 'black-forest-labs/FLUX.1-dev')
  • negative_prompt (string, optional): What to avoid in image
  • num_inference_steps (int, optional): Number of denoising steps
  • guidance_scale (float, optional): How closely to follow prompt

Popular models:

  • black-forest-labs/FLUX.1-dev - FLUX.1 (high quality)
  • stabilityai/stable-diffusion-xl-base-1.0 - SDXL
  • stabilityai/stable-diffusion-2-1 - SD 2.1
  • runwayml/stable-diffusion-v1-5 - SD 1.5

Example:

result = await text_to_image(
    prompt="A serene mountain landscape at sunset, photorealistic, 8k",
    model_id="black-forest-labs/FLUX.1-dev",
    negative_prompt="blurry, low quality, distorted",
    guidance_scale=7.5
)
# Returns: {'image': 'base64_encoded_image', 'format': 'base64'}

Computer Vision Tools

image_to_text

Generate text descriptions from images (captioning).

Parameters:

  • image_base64 (string, required): Base64 encoded image
  • model_id (string, optional): Model ID (default: 'Salesforce/blip-image-captioning-large')

Popular models:

  • Salesforce/blip-image-captioning-large - BLIP large
  • nlpconnect/vit-gpt2-image-captioning - ViT-GPT2

Example:

result = await image_to_text(
    image_base64="base64_encoded_image_data",
    model_id="Salesforce/blip-image-captioning-large"
)
# Returns: [{'generated_text': 'a dog playing in the park'}]

image_classification

Classify images into categories.

Parameters:

  • image_base64 (string, required): Base64 encoded image
  • model_id (string, optional): Model ID (default: 'google/vit-base-patch16-224')

Popular models:

  • google/vit-base-patch16-224 - Vision Transformer
  • microsoft/resnet-50 - ResNet-50

Example:

result = await image_classification(
    image_base64="base64_encoded_image_data",
    model_id="google/vit-base-patch16-224"
)
# Returns: [{'label': 'golden retriever', 'score': 0.95}, ...]

object_detection

Detect objects in images with bounding boxes.

Parameters:

  • image_base64 (string, required): Base64 encoded image
  • model_id (string, optional): Model ID (default: 'facebook/detr-resnet-50')

Popular models:

  • facebook/detr-resnet-50 - DETR with ResNet-50
  • hustvl/yolos-tiny - YOLOS tiny

Example:

result = await object_detection(
    image_base64="base64_encoded_image_data",
    model_id="facebook/detr-resnet-50"
)
# Returns: [{'label': 'dog', 'score': 0.98, 'box': {...}}, ...]

Audio Tools

text_to_speech

Convert text to speech audio.

Parameters:

  • text (string, required): Text to synthesize
  • model_id (string, optional): Model ID (default: 'facebook/mms-tts-eng')

Popular models:

  • facebook/mms-tts-eng - MMS TTS English
  • espnet/kan-bayashi_ljspeech_vits - VITS LJSpeech

Example:

result = await text_to_speech(
    text="Hello, this is a test of text to speech.",
    model_id="facebook/mms-tts-eng"
)
# Returns: {'audio': 'base64_encoded_audio', 'format': 'base64'}

automatic_speech_recognition

Transcribe audio to text (speech recognition).

Parameters:

  • audio_base64 (string, required): Base64 encoded audio
  • model_id (string, optional): Model ID (default: 'openai/whisper-large-v3')

Popular models:

  • openai/whisper-large-v3 - Whisper large v3 (best quality)
  • openai/whisper-medium - Whisper medium (faster)
  • facebook/wav2vec2-base-960h - Wav2Vec 2.0

Example:

result = await automatic_speech_recognition(
    audio_base64="base64_encoded_audio_data",
    model_id="openai/whisper-large-v3"
)
# Returns: {'text': 'transcribed audio text here'}

Embedding & Similarity Tools

sentence_similarity

Compute similarity between sentences.

Parameters:

  • source_sentence (string, required): Reference sentence
  • sentences (list, required): List of sentences to compare
  • model_id (string, optional): Model ID (default: 'sentence-transformers/all-MiniLM-L6-v2')

Popular models:

  • sentence-transformers/all-MiniLM-L6-v2 - Fast, good quality
  • sentence-transformers/all-mpnet-base-v2 - Best quality
  • BAAI/bge-small-en-v1.5 - BGE small

Example:

result = await sentence_similarity(
    source_sentence="The cat sits on the mat",
    sentences=[
        "A cat is sitting on a mat",
        "The dog runs in the park",
        "Cats are great pets"
    ],
    model_id="sentence-transformers/all-MiniLM-L6-v2"
)
# Returns: [0.95, 0.23, 0.65]
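
Scores come back in the same order as the input sentences, so pairing and sorting them gives a simple ranking (a small sketch reusing the result above):

sentences = [
    "A cat is sitting on a mat",
    "The dog runs in the park",
    "Cats are great pets",
]
ranked = sorted(zip(sentences, result), key=lambda pair: pair[1], reverse=True)
for sentence, score in ranked:
    print(f"{score:.2f}  {sentence}")
# 0.95  A cat is sitting on a mat
# 0.65  Cats are great pets
# 0.23  The dog runs in the park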

feature_extraction

Get embeddings (feature vectors) for text.

Parameters:

  • text (string, required): Input text
  • model_id (string, optional): Model ID (default: 'sentence-transformers/all-MiniLM-L6-v2')

Popular models:

  • sentence-transformers/all-MiniLM-L6-v2 - 384 dimensions
  • sentence-transformers/all-mpnet-base-v2 - 768 dimensions
  • BAAI/bge-small-en-v1.5 - 384 dimensions

Example:

result = await feature_extraction(
    text="This is a sample sentence.",
    model_id="sentence-transformers/all-MiniLM-L6-v2"
)
# Returns: [[0.012, -0.034, 0.056, ...]] (384-dimensional vector)
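
Embeddings are usually compared with cosine similarity. A minimal sketch using only the standard library, assuming the nested-list return shape shown above:

import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_a = (await feature_extraction(text="This is a sample sentence."))[0]
vec_b = (await feature_extraction(text="Here is a different example."))[0]
print(cosine_similarity(vec_a, vec_b))  # ~1.0 = very similar, ~0.0 = unrelated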

fill_mask

Fill in masked words in text.

Parameters:

  • text (string, required): Text containing the model's mask token ([MASK] for BERT-style models, <mask> for RoBERTa-style models)
  • model_id (string, optional): Model ID (default: 'bert-base-uncased')

Popular models:

  • bert-base-uncased - BERT base
  • roberta-base - RoBERTa base
  • distilbert-base-uncased - DistilBERT

Example:

result = await fill_mask(
    text="Paris is the [MASK] of France.",
    model_id="bert-base-uncased"
)
# Returns: [{'token_str': 'capital', 'score': 0.95}, ...]

Model Loading & Cold Starts

Important: Models may take 20-60 seconds to load on first request (cold start). Subsequent requests are faster.

Tips:

  • Use popular models for faster loading
  • Implement retry logic for timeouts
  • Consider caching model responses (see the sketch after this list)
  • Use smaller models for faster inference
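
One simple caching approach is an in-memory map keyed on the request inputs. A minimal sketch reusing the text_generation tool from above (cached_text_generation is a hypothetical helper, not part of the server):

_cache = {}

async def cached_text_generation(prompt, model_id="mistralai/Mistral-7B-Instruct-v0.3"):
    # Reuse a previous result when the same prompt/model pair comes in again
    key = (prompt, model_id)
    if key not in _cache:
        _cache[key] = await text_generation(prompt=prompt, model_id=model_id)
    return _cache[key]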

Rate Limits

Free Tier

  • Rate limited to prevent abuse
  • Suitable for testing and small projects
  • May experience queuing during high load

Pro Subscription ($9/month)

  • Significantly higher rate limits
  • Priority access to models
  • Faster inference
  • Little to no queuing

Visit huggingface.co/pricing for details.

Base64 Encoding

For images and audio, you need to provide base64 encoded data:

Python example:

import base64

# Encode image
with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode('utf-8')

# Encode audio
with open("audio.wav", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')

# Decode image response
image_bytes = base64.b64decode(response['image'])
with open("generated.jpg", "wb") as f:
    f.write(image_bytes)
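
The same pattern applies to audio returned by text_to_speech, which (per the example above) comes back under an 'audio' key:

# Decode audio response (adjust the file extension to the actual audio format)
audio_bytes = base64.b64decode(response['audio'])
with open("speech.wav", "wb") as f:
    f.write(audio_bytes)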

Parameter Tuning

Text Generation

  • temperature (0-2): Higher = more creative/random, Lower = more focused/deterministic
  • top_p (0-1): Nucleus sampling, typically 0.9-0.95
  • top_k: Number of highest probability tokens to keep
  • repetition_penalty: Penalize repeated tokens (>1.0 reduces repetition)
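
For example, the same prompt with focused versus creative settings (parameter values are illustrative, not prescriptive):

# Focused, near-deterministic output (good for code or factual answers)
focused = await text_generation(
    prompt="Explain HTTP status code 503 in one sentence:",
    temperature=0.2,
    top_p=0.9,
    repetition_penalty=1.1
)

# Varied, creative output (good for brainstorming or fiction)
creative = await text_generation(
    prompt="Explain HTTP status code 503 in one sentence:",
    temperature=1.2,
    top_p=0.95
)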

Image Generation

  • guidance_scale (1-20): Higher = follows prompt more strictly (typical: 7-7.5)
  • num_inference_steps: More steps = higher quality but slower (typical: 20-50)
  • negative_prompt: Describe what you don't want in the image
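
A draft-versus-final workflow illustrates the speed/quality trade-off (step counts are illustrative):

# Fast draft: fewer denoising steps, quicker feedback
draft = await text_to_image(
    prompt="A watercolor fox in a snowy forest",
    num_inference_steps=20,
    guidance_scale=7.0
)

# Final render: more steps for finer detail, but slower
final = await text_to_image(
    prompt="A watercolor fox in a snowy forest",
    num_inference_steps=50,
    guidance_scale=7.5
)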

Error Handling

Common errors:

  • 503 Service Unavailable: Model is loading (cold start), retry after 20-60 seconds
  • 401 Unauthorized: Invalid or missing API token
  • 429 Too Many Requests: Rate limit exceeded (upgrade to Pro)
  • 400 Bad Request: Invalid parameters or model ID
  • 504 Gateway Timeout: Model took too long to respond

Retry logic example:

import asyncio
import httpx

max_retries = 3
for attempt in range(max_retries):
    try:
        result = await text_generation(prompt="Hello")
        break
    except httpx.HTTPStatusError as e:
        # 503 means the model is still loading (cold start); wait and retry
        if e.response.status_code == 503 and attempt < max_retries - 1:
            await asyncio.sleep(20)  # non-blocking wait while the model loads
            continue
        raise

Finding Models

Browse models:

  • Visit huggingface.co/models
  • Filter by task (Text Generation, Image Generation, etc.)
  • Sort by downloads, likes, or trending
  • Check model card for usage examples

Popular categories:

  • Text Generation: 50,000+ models
  • Text Classification: 30,000+ models
  • Image Generation: 10,000+ models
  • Translation: 5,000+ models
  • Embeddings: 3,000+ models

Best Practices

  1. Use popular models: Faster loading and better maintained
  2. Implement timeouts: Set appropriate timeouts (60-120 seconds; see the sketch after this list)
  3. Cache responses: Store results to reduce API calls
  4. Handle cold starts: Implement retry logic for 503 errors
  5. Monitor usage: Track API calls and costs
  6. Test locally: Use Hugging Face Transformers library for testing
  7. Read model cards: Understand model capabilities and limitations
  8. Optimize parameters: Tune settings for your use case
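
For timeouts, one option is to wrap tool calls in asyncio.wait_for (a minimal sketch; it assumes the tool call is awaitable, as in the examples above):

import asyncio

try:
    result = await asyncio.wait_for(
        text_generation(prompt="Hello"),
        timeout=120  # seconds; generous enough to ride out a cold start
    )
except asyncio.TimeoutError:
    print("Model did not respond within 120 seconds")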

Use Cases

  • Chatbots: LLM-powered conversational AI
  • Content Generation: Blog posts, articles, creative writing
  • Image Creation: Art, illustrations, product images
  • Sentiment Analysis: Customer feedback analysis
  • Translation: Multi-language support
  • Transcription: Meeting notes, podcast transcripts
  • Semantic Search: Embedding-based search
  • Data Extraction: NER for document processing
  • Content Moderation: Text and image classification
