Bailian Voice Clone MCP

Enables voice cloning and speech synthesis through Alibaba Cloud's Bailian and DashScope platforms. It provides tools to create and manage custom cloned voice profiles and to synthesize audio with them.

A stdio MCP server that can be deployed to Alibaba Cloud Function AI. It is used to:

  • Create a voice clone
  • Poll a voice profile's status
  • Query a single voice profile
  • List voice profiles
  • Delete a voice profile
  • Synthesize speech with a cloned voice

Local Startup

  1. Install dependencies with pip install -r requirements.txt
Path B: Deploy to Alibaba Cloud Function AI

1. Prepare the code repository

Push this directory to GitHub or Alibaba Cloud Codeup:

  • server.py
  • requirements.txt
  • .env.example
  • README.md

2. Create an MCP service in Function AI

  1. Log in to the Function AI console
  2. Create a blank project
  3. Create a new service and choose MCP Service
  4. Choose SSE as the transport type
  5. Enable authentication
  6. Choose Python as the runtime
  7. Bind your code repository

3. Configure build and startup

Suggested values:

  • Build command: pip install -t . -r requirements.txt
  • Start command: python server.py

Suggested resources:

  • vCPU: 1
  • Memory: 2 GB
  • Scaling policy: instant mode
  • Provisioned snapshots: 1
  • Instance limit: 1

4. Configure environment variables

Add the following in Function AI's variable management:

  • DASHSCOPE_API_KEY
  • DASHSCOPE_REGION=cn-beijing
  • BAILIAN_TTS_MODEL=cosyvoice-v3.5-plus
  • INLINE_AUDIO_BASE64_LIMIT=300000
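server.py is expected to pick these variables up at startup. A minimal sketch of that configuration read, where the defaults mirror the suggested values above (the exact names and defaults used inside server.py may differ):

```python
import os

# Sketch of reading the variables above at startup. DASHSCOPE_API_KEY has
# no meaningful default and must be set in the environment.
DASHSCOPE_API_KEY = os.environ.get("DASHSCOPE_API_KEY", "")
DASHSCOPE_REGION = os.environ.get("DASHSCOPE_REGION", "cn-beijing")
BAILIAN_TTS_MODEL = os.environ.get("BAILIAN_TTS_MODEL", "cosyvoice-v3.5-plus")
# Above this many base64 characters, returning audio as a file path is
# preferable to inlining it in the tool result.
INLINE_AUDIO_BASE64_LIMIT = int(os.environ.get("INLINE_AUDIO_BASE64_LIMIT", "300000"))
```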

5. Deploy and test

After a successful deployment, Function AI gives you a public SSE endpoint, typically:

https://xxxx.cn-beijing.fcapp.run/sse

First test directly in the Function AI console that the tools work.

Register with Bailian MCP Management

  1. Open the Bailian console -> MCP Management -> Custom Services
  2. Click +Create MCP Service
  3. Choose Deploy with a script
  4. Choose http as the installation method
  5. Enter your SSE URL

Example configuration:

{
  "mcpServers": {
    "voice-clone-mcp": {
      "url": "https://xxxx.cn-beijing.fcapp.run/sse"
    }
  }
}

Suggested Call Order

  1. create_voice_clone
  2. wait_for_voice_ready
  3. Once the status becomes OK, call synthesize_with_cloned_voice
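In a client script, the wait step amounts to a polling loop around the query tool. A hypothetical sketch (the query_voice wrapper and the FAILED terminal status are assumptions, not part of the documented API):

```python
import time

def wait_until_ok(query_voice, voice_id, timeout_s=600, interval_s=5):
    """Poll until a cloned voice reports status OK.

    query_voice is assumed to be a callable wrapping the query_voice MCP
    tool and returning a dict with a "status" key; treating "FAILED" as a
    terminal error is likewise an assumption.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = query_voice(voice_id).get("status")
        if status == "OK":
            return True
        if status == "FAILED":
            raise RuntimeError(f"voice {voice_id} failed enrollment")
        time.sleep(interval_s)
    raise TimeoutError(f"voice {voice_id} not ready within {timeout_s}s")
```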

Example Parameters

Create a voice clone

{
  "audio_url": "https://your-public-audio-url/sample.wav",
  "prefix": "myvoice01",
  "language_hint": "zh",
  "target_model": "cosyvoice-v3.5-plus",
  "region": "cn-beijing"
}

Synthesize speech

{
  "text": "你好,这是一段使用复刻音色生成的演示语音。",
  "voice_id": "cosyvoice-v3.5-plus-myvoice01-xxxxxxxx",
  "target_model": "cosyvoice-v3.5-plus",
  "region": "cn-beijing",
  "inline_base64": true
}

Notes

  • The target_model used for cloning and for synthesis must be identical, otherwise synthesis will fail
  • audio_url must be publicly accessible
  • Use only lowercase letters, digits, and underscores in prefix, with a length of at most 10
  • synthesize_with_cloned_voice writes audio to a temporary directory by default; to keep it long-term in the cloud, the suggested next step is to wire up OSS

Local Recording Support

The MCP now supports these additional tools for local recordings and video sources:

  • create_qwen_voice_clone_from_audio_base64
  • create_qwen_voice_clone_from_local_file
  • create_qwen_voice_clone_from_video_url_segment
  • create_qwen_voice_clone_from_local_video_segment

How to choose:

  • If you deploy the MCP to Function AI / Bailian, use create_qwen_voice_clone_from_audio_base64. This is the remote-friendly path because you can pass audio as base64 or a full Data URL.
  • If you run the MCP locally with stdio, use create_qwen_voice_clone_from_local_file.
  • If the voice is inside a video, use one of the video segment tools and specify the exact start/end time of the speaker you want to clone.

Important:

  • CosyVoice clone tools still require a public audio_url.
  • Direct local-file clone support is implemented with Qwen3 TTS VC, because the official Qwen voice enrollment API supports audio.data while the CosyVoice clone API is documented around public URL input.

Example for remote base64 mode:

{
  "audio_base64_or_data_url": "data:audio/wav;base64,AAA...",
  "preferred_name": "demo_voice_01",
  "audio_mime_type": "audio/wav",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
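Producing the audio_base64_or_data_url value from a local recording takes only the standard library. A small helper (the function name is ours, not part of the MCP):

```python
import base64

def to_data_url(path: str, mime_type: str = "audio/wav") -> str:
    """Encode a local recording as a Data URL suitable for the
    audio_base64_or_data_url parameter shown above."""
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{payload}"
```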

Example for local file mode:

{
  "local_file_path": "C:\\Users\\29932\\Desktop\\sample.wav",
  "preferred_name": "demo_voice_01",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}

Example for video URL mode:

{
  "video_url": "https://your-public-video-url/demo.mp4",
  "preferred_name": "demo_voice_01",
  "start_time": "00:01:15",
  "end_time": "00:01:42",
  "speech_enhancement": false,
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}

Example for local video mode:

{
  "local_video_path": "C:\\Users\\29932\\Desktop\\demo.mp4",
  "preferred_name": "demo_voice_01",
  "start_time": "75",
  "end_time": "102",
  "speech_enhancement": false,
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}

Video notes:

  • start_time and end_time support seconds or HH:MM:SS[.ms]
  • Video extraction now keeps 24000 Hz mono WAV by default to preserve more timbre detail for cloning
  • speech_enhancement=false is now the safer default when similarity matters most
  • Turn speech_enhancement=true on only when the source clip is noisy enough that intelligibility matters more than timbre fidelity
  • For best cloning quality, choose a 10-20s segment where the target speaker is clear, continuous, and background music is as weak as possible
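The accepted start_time/end_time formats can be normalized to seconds with a small parser like the following (a sketch of the documented formats; the server's own parsing may differ in edge cases):

```python
def parse_timestamp(value: str) -> float:
    """Parse a start_time/end_time value: plain seconds ("75") or
    HH:MM:SS[.ms] ("00:01:15.5")."""
    parts = value.split(":")
    if len(parts) == 1:
        return float(parts[0])
    if len(parts) == 3:
        hours, minutes, seconds = parts
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    raise ValueError(f"unsupported time format: {value!r}")
```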

Workflow By Clone Type

Use different follow-up steps for the two API families in this MCP:

  • create_voice_clone: This is the CosyVoice voice-enrollment flow. It is asynchronous. After creation, call wait_for_voice_ready or query_voice, then call synthesize_with_cloned_voice.

  • create_qwen_voice_clone_from_audio_base64
  • create_qwen_voice_clone_from_local_file
  • create_qwen_voice_clone_from_video_url_segment
  • create_qwen_voice_clone_from_local_video_segment

These are Qwen voice clone flows. They are ready for synthesis immediately after the create call returns success. Do not call query_voice, wait_for_voice_ready, list_voices, or delete_voice with a Qwen voice id such as qwen-tts-vc-.... Call synthesize_with_cloned_voice directly with the returned voice_id and target_model.
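Since the two families diverge only on the polling step, a client can branch on the voice id prefix. A minimal sketch based on the id formats shown in this README:

```python
def needs_ready_polling(voice_id: str) -> bool:
    """Return True when the id belongs to the asynchronous CosyVoice
    enrollment flow and must be polled before synthesis; Qwen clone ids
    (prefix qwen-tts-vc-) are usable immediately."""
    return not voice_id.startswith("qwen-tts-vc-")
```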

Qwen follow-up example:

{
  "text": "时光如白驹过隙,转瞬即逝。",
  "voice_id": "qwen-tts-vc-demo_voice_01-voice-20260323xxxx",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing",
  "inline_base64": true
}

LobeHub HTTP Mode

LobeHub expects Streamable HTTP, not SSE.

This project now supports both transports:

  • MCP_TRANSPORT=stdio: for local stdio use or the Function AI MCP proxy mode.
  • MCP_TRANSPORT=streamable-http: for direct LobeHub integration.

Recommended environment variables for direct LobeHub deployment:

MCP_TRANSPORT=streamable-http
MCP_HOST=0.0.0.0
MCP_PORT=8080

Startup command for HTTP mode:

python server.py
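The same start command can serve both transports by branching on MCP_TRANSPORT inside server.py. A sketch of that selection (the resolve_transport helper is illustrative; how the result is passed to the MCP SDK depends on the framework used):

```python
import os

def resolve_transport() -> dict:
    """Pick run settings from the MCP_TRANSPORT / MCP_HOST / MCP_PORT
    variables listed above, defaulting to stdio."""
    transport = os.environ.get("MCP_TRANSPORT", "stdio")
    if transport == "stdio":
        return {"transport": "stdio"}
    if transport == "streamable-http":
        return {
            "transport": "streamable-http",
            "host": os.environ.get("MCP_HOST", "0.0.0.0"),
            "port": int(os.environ.get("MCP_PORT", "8080")),
        }
    raise ValueError(f"unknown MCP_TRANSPORT: {transport!r}")
```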

LobeHub example config:

{
  "mcpServers": {
    "voice-clone-mcp": {
      "url": "https://your-domain.example.com/mcp",
      "type": "streamable-http",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN"
      }
    }
  }
}

Important:

  • For LobeHub, use the /mcp HTTP URL of the deployed service, not the old /sse URL.
  • If you deploy this mode to Function AI, use a normal HTTP/Web service style deployment that exposes port 8080.
