# Bailian Voice Clone MCP

Enables voice cloning and speech synthesis through Alibaba Cloud's Bailian and DashScope platforms. It provides tools to create, manage, and synthesize audio using custom cloned voice profiles.
A stdio MCP server deployable to Alibaba Cloud Function AI. It can:

- create voice clones
- poll voice status
- query a single voice
- list voices
- delete voices
- synthesize speech with a cloned voice
## Local Startup

- Install dependencies
## Path B: Deploy to Alibaba Cloud Function AI

### 1. Prepare the code repository

Push this directory to GitHub or Alibaba Cloud Codeup:

- `server.py`
- `requirements.txt`
- `.env.example`
- `README.md`
### 2. Create an MCP service in Function AI

- Log in to the Function AI console
- Create a blank project
- Create a new service and choose `MCP Service`
- Transport type: `SSE`
- Enable authentication
- Runtime: `Python`
- Bind your code repository
### 3. Configure build and startup

Recommended values:

- Build command: `pip install -t . -r requirements.txt`
- Start command: `python server.py`

Resource recommendations:

- vCPU: 1
- Memory: 2 GB
- Scaling policy: turbo mode
- Provisioned snapshots: 1
- Instance limit: 1
### 4. Configure environment variables

Add the following under variable management in Function AI:

- `DASHSCOPE_API_KEY`
- `DASHSCOPE_REGION=cn-beijing`
- `BAILIAN_TTS_MODEL=cosyvoice-v3.5-plus`
- `INLINE_AUDIO_BASE64_LIMIT=300000`
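The server presumably reads these variables at startup. A minimal sketch of that loading step, assuming the defaults implied by the values above (the exact behavior of `server.py` may differ):

```python
import os

def load_settings(env=os.environ):
    """Read the variables listed above, applying the README's values as defaults."""
    api_key = env.get("DASHSCOPE_API_KEY")
    if not api_key:
        raise RuntimeError("DASHSCOPE_API_KEY is required")
    return {
        "api_key": api_key,
        "region": env.get("DASHSCOPE_REGION", "cn-beijing"),
        "tts_model": env.get("BAILIAN_TTS_MODEL", "cosyvoice-v3.5-plus"),
        # Max size of base64 audio returned inline before falling back to a file
        "inline_limit": int(env.get("INLINE_AUDIO_BASE64_LIMIT", "300000")),
    }
```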
### 5. Deploy and test

After a successful deployment, Function AI gives you a public SSE endpoint, typically:

`https://xxxx.cn-beijing.fcapp.run/sse`

First verify in the Function AI console that the tools work.
## Register with Bailian MCP Management

- Open the Bailian console -> MCP Management -> Custom Services
- Click `+ Create MCP Service`
- Choose `Deploy with script`
- Installation method: `http`
- Fill in your SSE URL

Example configuration:
```json
{
  "mcpServers": {
    "voice-clone-mcp": {
      "url": "https://xxxx.cn-beijing.fcapp.run/sse"
    }
  }
}
```
## Recommended Call Order

- Call `create_voice_clone`
- Call `wait_for_voice_ready`
- Once the status becomes `OK`, call `synthesize_with_cloned_voice`
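The waiting step amounts to polling until the voice status reaches `OK`. A minimal illustration of that loop, where `query_voice` stands in for any callable returning a status dict (the terminal statuses other than `OK` are assumptions, not documented here):

```python
import time

def wait_for_voice_ready(query_voice, voice_id, timeout_s=120.0,
                         interval_s=2.0, sleep=time.sleep):
    """Poll query_voice(voice_id) until its status becomes OK.

    query_voice is any callable returning a dict with a "status" key.
    This is an illustrative sketch, not the server's actual implementation.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = query_voice(voice_id).get("status")
        if status == "OK":
            return status
        if status in ("FAILED", "DELETED"):  # assumed failure states
            raise RuntimeError(f"voice {voice_id} ended in status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"voice {voice_id} not ready after {timeout_s}s")
        sleep(interval_s)
```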
## Example Parameters

### Create a voice clone
```json
{
  "audio_url": "https://your-public-audio-url/sample.wav",
  "prefix": "myvoice01",
  "language_hint": "zh",
  "target_model": "cosyvoice-v3.5-plus",
  "region": "cn-beijing"
}
```
### Synthesize speech
```json
{
  "text": "你好,这是一段使用复刻音色生成的演示语音。",
  "voice_id": "cosyvoice-v3.5-plus-myvoice01-xxxxxxxx",
  "target_model": "cosyvoice-v3.5-plus",
  "region": "cn-beijing",
  "inline_base64": true
}
```
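The `inline_base64` flag interacts with `INLINE_AUDIO_BASE64_LIMIT`: small results can be returned inline, large ones fall back to a temp file. A sketch of that decision, assuming the limit applies to the encoded length (field names here are illustrative, not the tool's documented response shape):

```python
import base64
import os
import tempfile

def package_audio(audio_bytes, inline_base64=True, limit=300000):
    """Return audio inline as base64 when small enough, else write a temp file.

    Mirrors the behavior described in this README; the exact response
    fields of synthesize_with_cloned_voice are an assumption.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    if inline_base64 and len(encoded) <= limit:
        return {"audio_base64": encoded}
    fd, path = tempfile.mkstemp(suffix=".mp3")
    with os.fdopen(fd, "wb") as f:
        f.write(audio_bytes)
    return {"audio_path": path}
```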
## Notes

- Voice cloning and synthesis must use the same `target_model`; otherwise synthesis fails
- `audio_url` must be publicly reachable
- `prefix` should use only lowercase letters, digits, and underscores, and stay within 10 characters
- `synthesize_with_cloned_voice` writes audio to a temporary directory by default; to keep results long-term in the cloud, the recommended next step is to integrate OSS
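The `prefix` constraint above is easy to check before calling the tool. A small validator encoding exactly that rule (the regex is my reading of the constraint, not taken from `server.py`):

```python
import re

# Constraint from the notes above: lowercase letters, digits,
# underscores only, at most 10 characters.
PREFIX_RE = re.compile(r"^[a-z0-9_]{1,10}$")

def is_valid_prefix(prefix: str) -> bool:
    """Return True when prefix satisfies the README's recommendation."""
    return bool(PREFIX_RE.fullmatch(prefix))
```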
## Local Recording Support

The MCP now supports additional tools for local recordings:

- `create_qwen_voice_clone_from_audio_base64`
- `create_qwen_voice_clone_from_local_file`
- `create_qwen_voice_clone_from_video_url_segment`
- `create_qwen_voice_clone_from_local_video_segment`
How to choose:

- If you deploy the MCP to Function AI / Bailian, use `create_qwen_voice_clone_from_audio_base64`. This is the remote-friendly path because you can pass audio as base64 or a full Data URL.
- If you run the MCP locally with stdio, use `create_qwen_voice_clone_from_local_file`.
- If the voice is inside a video, use one of the video segment tools and specify the exact start/end time of the speaker you want to clone.
Important:

- The CosyVoice clone tools still require a public `audio_url`.
- Direct local-file clone support is implemented with Qwen3 TTS VC, because the official Qwen voice enrollment API supports `audio.data`, while the CosyVoice clone API is documented around public URL input.

Example for remote base64 mode:
```json
{
  "audio_base64_or_data_url": "data:audio/wav;base64,AAA...",
  "preferred_name": "demo_voice_01",
  "audio_mime_type": "audio/wav",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
```
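The `audio_base64_or_data_url` value above can be produced from raw recording bytes. A small helper showing the Data URL form, with the MIME type supplied explicitly to match the `audio_mime_type` parameter (illustrative, not part of the MCP itself):

```python
import base64

def to_data_url(audio_bytes: bytes, mime_type: str = "audio/wav") -> str:
    """Build a Data URL of the form accepted by audio_base64_or_data_url."""
    payload = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{payload}"
```

Reading the bytes from a local recording first (`open(path, "rb").read()`) makes this usable from any MCP client that can only pass strings.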
Example for local file mode:
```json
{
  "local_file_path": "C:\\Users\\29932\\Desktop\\sample.wav",
  "preferred_name": "demo_voice_01",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
```
Example for video URL mode:
```json
{
  "video_url": "https://your-public-video-url/demo.mp4",
  "preferred_name": "demo_voice_01",
  "start_time": "00:01:15",
  "end_time": "00:01:42",
  "speech_enhancement": false,
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
```
Example for local video mode:
```json
{
  "local_video_path": "C:\\Users\\29932\\Desktop\\demo.mp4",
  "preferred_name": "demo_voice_01",
  "start_time": "75",
  "end_time": "102",
  "speech_enhancement": false,
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing"
}
```
Video notes:

- `start_time` and `end_time` accept seconds or `HH:MM:SS[.ms]`
- Video extraction now keeps `24000 Hz` mono WAV by default, to preserve more timbre detail for cloning
- `speech_enhancement=false` is now the safer default when similarity matters most
- Turn on `speech_enhancement=true` only when the source clip is noisy enough that intelligibility matters more than timbre fidelity
- For best cloning quality, choose a 10-20 s segment where the target speaker is clear and continuous and any background music is as quiet as possible
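The two accepted time formats can be normalized to seconds before extraction. A sketch of that parsing, plus a hypothetical ffmpeg invocation matching the stated `24000 Hz` mono WAV default (the README does not name the actual extraction tool, so the command is an assumption):

```python
def parse_clip_time(value):
    """Parse start_time / end_time as described above: plain seconds
    ("75") or HH:MM:SS[.ms] ("00:01:15.5"). Returns seconds as a float."""
    text = str(value).strip()
    if ":" not in text:
        return float(text)
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def extraction_args(video_path, start, end, out_wav):
    """Hypothetical ffmpeg command implementing the defaults above."""
    return [
        "ffmpeg", "-y",
        "-ss", str(parse_clip_time(start)),
        "-to", str(parse_clip_time(end)),
        "-i", video_path,
        "-vn", "-ac", "1", "-ar", "24000",  # drop video, mono, 24 kHz
        out_wav,
    ]
```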
## Workflow by Clone Type

Use different follow-up steps for the two API families in this MCP:

- `create_voice_clone`: the CosyVoice `voice-enrollment` flow. It is asynchronous. After creation, call `wait_for_voice_ready` or `query_voice`, then call `synthesize_with_cloned_voice`.
- `create_qwen_voice_clone_from_audio_base64`, `create_qwen_voice_clone_from_local_file`, `create_qwen_voice_clone_from_video_url_segment`, `create_qwen_voice_clone_from_local_video_segment`: the Qwen voice clone flows. They are ready for synthesis immediately after the create call returns success. Do not call `query_voice`, `wait_for_voice_ready`, `list_voices`, or `delete_voice` with a Qwen voice id such as `qwen-tts-vc-...`. Call `synthesize_with_cloned_voice` directly with the returned `voice_id` and `target_model`.
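A client can pick the right follow-up mechanically from the returned `voice_id`, since Qwen ids carry the `qwen-tts-vc-` prefix shown above (this routing helper is an illustration; the real ids may have other shapes):

```python
def needs_ready_polling(voice_id: str) -> bool:
    """True for CosyVoice enrollment ids, which must reach OK before
    synthesis; False for Qwen ids, which are usable immediately."""
    return not voice_id.startswith("qwen-tts-vc-")
```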
Qwen follow-up example:
```json
{
  "text": "时光如白驹过隙,转瞬即逝。",
  "voice_id": "qwen-tts-vc-demo_voice_01-voice-20260323xxxx",
  "target_model": "qwen3-tts-vc-2026-01-22",
  "region": "cn-beijing",
  "inline_base64": true
}
```
## LobeHub HTTP Mode

LobeHub expects Streamable HTTP, not SSE.

This project supports both transports:

- `MCP_TRANSPORT=stdio`: for local stdio use or Function AI MCP proxy mode.
- `MCP_TRANSPORT=streamable-http`: for direct LobeHub integration.
Recommended environment variables for direct LobeHub deployment:

```
MCP_TRANSPORT=streamable-http
MCP_HOST=0.0.0.0
MCP_PORT=8080
```
Startup command for HTTP mode:

```
python server.py
```
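Transport selection presumably happens inside `server.py` based on the variables above. A sketch of that dispatch, with the defaults assumed (`stdio` when `MCP_TRANSPORT` is unset; the returned dict is illustrative, not the server's actual API):

```python
import os

def choose_transport(env=os.environ):
    """Map the MCP_TRANSPORT / MCP_HOST / MCP_PORT variables to run options."""
    transport = env.get("MCP_TRANSPORT", "stdio")
    if transport == "stdio":
        return {"transport": "stdio"}
    if transport == "streamable-http":
        return {
            "transport": "streamable-http",
            "host": env.get("MCP_HOST", "0.0.0.0"),
            "port": int(env.get("MCP_PORT", "8080")),
        }
    raise ValueError(f"unsupported MCP_TRANSPORT: {transport}")
```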
LobeHub example config:
{
"mcpServers": {
"voice-clone-mcp": {
"url": "https://your-domain.example.com/mcp",
"type": "streamable-http",
"headers": {
"Authorization": "Bearer YOUR_TOKEN"
}
}
}
}
Important:

- For LobeHub, use the `/mcp` HTTP URL of the deployed service, not the old `/sse` URL.
- If you deploy this mode to Function AI, use a normal HTTP/Web service style deployment that exposes port `8080`.