Advanced TTS MCP Server

Provides high-quality text-to-speech synthesis with 10 natural voices, emotion control, and dynamic pacing for professional applications requiring expressive speech output.

A high-quality, feature-rich Text-to-Speech MCP server with native TypeScript implementation. Designed for professional applications requiring natural, expressive speech synthesis with advanced controls and zero external dependencies.

✨ Features

🎯 Advanced Voice Control

  • 10 High-Quality Voices - Male and female voices with distinct personalities
  • Emotion Control - Neutral, happy, excited, calm, serious, casual, confident
  • Dynamic Pacing - Natural, conversational, presentation, tutorial, narrative, fast, and slow modes
  • Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume

🚀 Professional Capabilities

  • Streaming Audio - Real-time synthesis and playback
  • Batch Processing - Handle multiple text segments efficiently
  • Multiple Formats - WAV, MP3, FLAC, OGG output support
  • Natural Speech Enhancement - Automatic pause insertion and emotion markers
  • Queue Management - Handle multiple concurrent requests

🔧 MCP Integration

  • 6 Powerful Tools - Complete synthesis, batch processing, voice management
  • 2 Rich Resources - Voice capabilities and usage examples
  • Real-time Status - Track processing progress and manage requests
  • File Management - Save, list, and organize audio outputs

🚀 Quick Start

Option 1: Deploy to Smithery.ai (Recommended)

🎯 One-Click Deployment to Smithery Platform

  1. Deploy Now: Visit Smithery.ai and import this repository
  2. Configure: Set your preferred voice and speech settings
  3. Use Instantly: Access via Claude Desktop or any MCP-compatible client

Benefits:

  • ✅ Zero setup required
  • ✅ Automatic scaling and updates
  • ✅ No model downloads needed
  • ✅ Enterprise-grade hosting

📋 Full Smithery Deployment Guide →

Option 2: Local Installation

Prerequisites:

  • Node.js 18+

Installation:

  1. Clone the repository
git clone https://github.com/samihalawa/advanced-tts-mcp.git
cd advanced-tts-mcp
  2. Install dependencies
npm install
  3. Configure Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "advanced-tts": {
      "command": "node",
      "args": ["dist/index.js"],
      "cwd": "/path/to/advanced-tts-mcp"
    }
  }
}
  4. Build and start the server
# Build TypeScript (produces the dist/index.js referenced in the config above)
npm run build

# Start server
npm start

Restart Claude Desktop and start synthesizing with natural, expressive voices.
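
If you already have other MCP servers configured, editing the JSON by hand risks overwriting them. The following is a minimal sketch of merging the entry shown above into an existing config file. The helper name `add_tts_server` is hypothetical, and the location of claude_desktop_config.json is passed in by the caller because it differs between systems.

```python
import json
from pathlib import Path


def add_tts_server(config_path: Path, repo_dir: str) -> dict:
    """Merge the advanced-tts entry into a Claude Desktop config file.

    Hypothetical helper: existing entries under "mcpServers" are kept.
    """
    # Load the current config if the file exists and is non-empty.
    text = config_path.read_text().strip() if config_path.exists() else ""
    config = json.loads(text) if text else {}

    # Add (or overwrite) only the advanced-tts entry.
    servers = config.setdefault("mcpServers", {})
    servers["advanced-tts"] = {
        "command": "node",
        "args": ["dist/index.js"],
        "cwd": repo_dir,
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path to your own config file and the directory where you cloned the repository, then restart Claude Desktop.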

🎙️ Available Voices

| Voice ID    | Name     | Gender | Description                   |
|-------------|----------|--------|-------------------------------|
| af_heart    | Heart    | Female | Warm, friendly voice (default) |
| af_sky      | Sky      | Female | Clear, bright voice           |
| af_bella    | Bella    | Female | Elegant, sophisticated voice  |
| af_sarah    | Sarah    | Female | Professional, confident voice |
| af_nicole   | Nicole   | Female | Gentle, soothing voice        |
| am_adam     | Adam     | Male   | Strong, authoritative voice   |
| am_michael  | Michael  | Male   | Friendly, approachable voice  |
| bf_emma     | Emma     | Female | Young, energetic voice        |
| bf_isabella | Isabella | Female | Mature, expressive voice      |
| bm_lewis    | Lewis    | Male   | Deep, resonant voice          |

📚 Usage Examples

Basic Synthesis

# Simple text-to-speech
await synthesize_speech(
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart"
)

Emotional Expression

# Excited announcement
await synthesize_speech(
    text="This is amazing news! You're going to love this new feature!",
    voice_id="af_heart",
    emotion="excited",
    pacing="conversational",
    speed=1.1
)

Professional Presentation

# Tutorial narration
await synthesize_speech(
    text="Step one: Open your browser. Step two: Navigate to the website.",
    voice_id="am_adam", 
    emotion="calm",
    pacing="tutorial",
    speed=0.9
)

Batch Processing

# Multiple segments with pauses
await batch_synthesize(
    segments=[
        "Welcome to our presentation.",
        "Today we'll cover three main topics.", 
        "Let's begin with the first topic."
    ],
    voice_id="af_sarah",
    emotion="confident",
    pacing="presentation",
    merge_output=True,
    segment_pause=1.0,
    save_file=True
)
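
The examples above show tool inputs as Python-style keyword arguments. On the wire, an MCP client sends a tool invocation as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request. The helper `build_tool_call` is hypothetical and only for illustration; clients such as Claude Desktop construct these messages for you.

```python
import json
from itertools import count

# Simple incrementing request ids for this sketch.
_request_ids = count(1)


def build_tool_call(name: str, **arguments) -> str:
    """Serialize an MCP tools/call request for the named tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Equivalent of the "Basic Synthesis" example.
payload = build_tool_call(
    "synthesize_speech",
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart",
)
```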

🛠️ Available Tools

synthesize_speech

Convert text to natural speech with full control over voice characteristics.

Parameters:

  • text - Text to synthesize (max 10,000 chars)
  • voice_id - Voice selection (see table above)
  • speed - Speech rate (0.25-3.0)
  • emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
  • pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
  • volume - Audio volume (0.1-2.0)
  • output_format - File format (wav, mp3, flac, ogg)
  • save_file - Save to file (boolean)
  • filename - Custom filename
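
A client can check arguments against the documented limits before sending a request. This is a minimal sketch, not the server's own validation: the class name `SynthesisArgs` is hypothetical, and every default except `voice_id="af_heart"` is an assumption, since the parameter list does not state them.

```python
from dataclasses import dataclass
from typing import Optional

EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACING_STYLES = {"natural", "conversational", "presentation", "tutorial",
                 "narrative", "fast", "slow"}
OUTPUT_FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


@dataclass
class SynthesisArgs:
    text: str
    voice_id: str = "af_heart"       # documented default voice
    speed: float = 1.0               # assumed default
    emotion: str = "neutral"         # assumed default
    pacing: str = "natural"          # assumed default
    volume: float = 1.0              # assumed default
    output_format: str = "wav"       # assumed default
    save_file: bool = False          # assumed default
    filename: Optional[str] = None

    def __post_init__(self) -> None:
        # Each check mirrors a limit from the parameter list above.
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        if not 0.25 <= self.speed <= 3.0:
            raise ValueError("speed must be between 0.25 and 3.0")
        if not 0.1 <= self.volume <= 2.0:
            raise ValueError("volume must be between 0.1 and 2.0")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if self.pacing not in PACING_STYLES:
            raise ValueError(f"unknown pacing: {self.pacing!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")
```

Constructing `SynthesisArgs(text="Hello", speed=5.0)` raises `ValueError`, so an out-of-range value is caught before it reaches the server.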

batch_synthesize

Process multiple text segments efficiently with optional merging.

Parameters:

  • segments - List of text segments
  • merge_output - Combine into single file
  • segment_pause - Pause between segments (0.0-5.0s)
  • All synthesis parameters from above
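
To make `merge_output` and `segment_pause` concrete, here is one way merging could work on raw audio. This is an illustrative sketch, not the server's implementation: it assumes each segment is 16-bit signed mono PCM, and the 24000 Hz sample rate and the function name `merge_segments` are assumptions as well.

```python
from typing import Sequence


def merge_segments(
    segments: Sequence[bytes],
    segment_pause: float,
    sample_rate: int = 24000,  # assumed sample rate
    sample_width: int = 2,     # bytes per sample (16-bit PCM)
) -> bytes:
    """Join PCM segments with silence of segment_pause seconds between them."""
    if not 0.0 <= segment_pause <= 5.0:
        raise ValueError("segment_pause must be between 0.0 and 5.0 seconds")

    # Zero-valued samples are silence in signed PCM.
    silence = b"\x00" * (int(sample_rate * segment_pause) * sample_width)

    # join() places the pause between segments only, not at either end.
    return silence.join(segments)
```

With three segments and `segment_pause=1.0`, the merged audio contains exactly two seconds of added silence.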

get_voices

Retrieve complete voice information and capabilities.

get_status

Check processing status for synthesis requests.

cancel_request

Cancel active synthesis operations.

list_output_files

Browse saved audio files with metadata.
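
The metadata returned by `list_output_files` is not specified here. As a rough sketch of what such a listing might involve, the snippet below scans an output directory for the four supported formats. The function name `list_audio_files` and the returned field names are assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def list_audio_files(output_dir: str = "./audio_output") -> List[Dict[str, object]]:
    """Return basic metadata for audio files in output_dir, sorted by name."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return []

    files = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            stat = path.stat()
            files.append({
                "name": path.name,
                "format": path.suffix.lower().lstrip("."),
                "size_bytes": stat.st_size,
                # Modification time as an ISO 8601 UTC timestamp.
                "modified": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return files
```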

🎛️ Voice Controls

Emotions

  • Neutral - Standard, professional tone
  • Happy - Upbeat, cheerful expression
  • Excited - Enthusiastic, energetic delivery
  • Calm - Relaxed, soothing tone
  • Serious - Formal, authoritative delivery
  • Casual - Relaxed, conversational style
  • Confident - Assured, professional tone

Pacing Styles

  • Natural - Balanced, human-like rhythm
  • Conversational - Casual discussion pace
  • Presentation - Professional speaking rhythm
  • Tutorial - Educational, clear delivery
  • Narrative - Storytelling pace
  • Fast - Quick delivery (1.2x base speed)
  • Slow - Deliberate delivery (0.8x base speed)
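
The Fast and Slow styles are described as multiples of the base speed. How they interact with an explicit `speed` value is not stated, so the sketch below makes two assumptions: the pacing multiplier is applied on top of `speed`, and the result is clamped to the documented 0.25-3.0 range. The function name `effective_speed` is hypothetical.

```python
# Multipliers taken from the pacing list; other styles are assumed to be 1.0.
PACING_MULTIPLIERS = {"fast": 1.2, "slow": 0.8}

MIN_SPEED, MAX_SPEED = 0.25, 3.0


def effective_speed(speed: float, pacing: str) -> float:
    """Combine speed with the pacing multiplier, clamped to the valid range."""
    multiplier = PACING_MULTIPLIERS.get(pacing, 1.0)
    return min(MAX_SPEED, max(MIN_SPEED, speed * multiplier))
```

Under these assumptions, `speed=1.0` with `pacing="slow"` plays at 0.8x, while `speed=3.0` with `pacing="fast"` stays capped at 3.0x.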

🎵 Audio Formats

| Format | Quality      | Use Case                       |
|--------|--------------|--------------------------------|
| WAV    | Uncompressed | Highest quality, editing       |
| MP3    | Compressed   | Web, streaming, sharing        |
| FLAC   | Lossless     | Archival, high-quality storage |
| OGG    | Compressed   | Open source alternative        |

🔧 Configuration

Environment Variables

# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin

# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100

# Audio settings  
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true

Server Configuration

config = ServerConfig(
    model_path="./kokoro-v1.0.onnx",
    voices_path="./voices-v1.0.bin", 
    output_dir="./audio_output",
    max_queue_size=100,
    enable_streaming=True,
    default_voice="af_heart"
)
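
The environment variables and the `ServerConfig` fields above map onto each other one to one. The sketch below shows how such a config could be populated from the environment, using the values listed above as fallbacks. `EnvServerConfig` is a hypothetical stand-in, not the project's `ServerConfig` class, and the accepted spellings for boolean values are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvServerConfig:
    model_path: str
    voices_path: str
    output_dir: str
    max_queue_size: int
    enable_streaming: bool
    default_voice: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "EnvServerConfig":
        """Build a config from environment variables with documented fallbacks."""
        env = os.environ if env is None else env
        streaming = env.get("TTS_ENABLE_STREAMING", "true").strip().lower()
        return cls(
            model_path=env.get("KOKORO_MODEL_PATH", "./kokoro-v1.0.onnx"),
            voices_path=env.get("KOKORO_VOICES_PATH", "./voices-v1.0.bin"),
            output_dir=env.get("TTS_OUTPUT_DIR", "./audio_output"),
            max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", "100")),
            # Assumed truthy spellings; anything else disables streaming.
            enable_streaming=streaming in {"1", "true", "yes", "on"},
            default_voice=env.get("TTS_DEFAULT_VOICE", "af_heart"),
        )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to test without touching the real environment.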

🏗️ Architecture

├── src/advanced_tts/
│   ├── __init__.py          # Package initialization
│   ├── server.py            # MCP server implementation  
│   ├── engine.py            # Kokoro TTS engine wrapper
│   ├── models.py            # Data models and validation
│   └── utils.py             # Utility functions
├── pyproject.toml           # Project configuration
├── README.md               # Documentation
└── LICENSE                 # MIT License

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Additional voice models
  • Real-time streaming synthesis
  • Advanced audio effects
  • Multi-language support
  • Performance optimizations

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • Kokoro TTS - High-quality neural voice synthesis
  • MCP Protocol - Seamless AI model integration
  • FastMCP - Efficient server framework

Developed by Sami Halawa

Transform your text into natural, expressive speech with Advanced TTS MCP Server.
