embodied_arm_mcp
MCP server for voice-controlled 6DOF robotic arm, exposing tools like move_to, gripper_open, gripper_close, record_motion, and replay_motion.
README
embodied_arm_mcp
基于 ROS2 Jazzy + faster-whisper + MiniMax M3 + MCP 协议构建的语音大模型控制机械臂项目(Day 5 v0.1)。
Day 5 完成语音+LLM 全链路;Day 6 接入 MCP Server + 6DOF 机械臂仿真;Day 7 录视频 + 简历 v2.0。
🎯 项目目标
搭建语音大模型控制机械臂的端到端链路——说话 → Whisper 语音识别 → MiniMax M3 语义理解 → MCP 协议调用 → ROS2 机械臂执行。这是 2026 具身智能(Embodied AI)招聘风口的核心技术链路。
为什么是主推:
- 5 大 2026 风口词全打:具身智能 / Embodied AI / VLA 模型 / MCP 协议 / 数字孪生
- 大模型 + 机器人跨界,技术稀缺性极强
- 真实场景:人 → Agent → 机器人,端到端全栈
- 与 Day 3-4 避障项目互补(移动底盘感知决策 vs 机械臂 + LLM 上层智能)
🏗️ 架构(Day 5 阶段)
┌────────────────────────────────────────────────────┐
│ 语音输入层 │
│ ┌────────────┐ ┌────────────┐ ┌──────────────┐ │
│ │ 录音 (pyaudio)│─►│ Whisper STT │─►│ MiniMax M3 │ │
│ │ .wav 16kHz │ │ (faster- │ │ (OpenAI 兼容)│ │
│ └────────────┘ │ whisper) │ │ → tool_call │ │
│ └────────────┘ └──────────────┘ │
│ │ │
│ ┌────────────┐ │ │
│ │ Pyttsx3 / │◄────────────────────────────┘ │
│ │ EdgeTTS │ │
│ └────────────┘ │
└────────────────────────────────────────────────────┘
Day 6 + 7 扩展(计划中):
- MCP Server 暴露 5 个工具:
move_to/gripper_open/gripper_close/record_motion/replay_motion - ROS2 Jazzy + MoveIt2 + 6DOF 机械臂(panda 仿真)
- 端到端:「移动到 (0.3, 0, 0.2)」→ 机械臂动起来
🛠️ 环境
- OS:WSL2 Ubuntu 24.04
- ROS2:Jazzy Jalisco
- Python:3.12(系统)+ venv 隔离
- LLM:MiniMax M3(OpenAI 兼容协议)
- STT:faster-whisper base 模型(CPU 实时)
🚀 运行步骤
1. 克隆 + 装依赖
git clone https://github.com/JakLiao/embodied_arm_mcp.git
cd embodied_arm_mcp
# 装 ROS2 Jazzy + 必要系统包(Day 5 不用 sudo,但 Day 6 需要)
# sudo apt install ros-jazzy-moveit ros-jazzy-moveit-py portaudio19-dev
# 建 venv + 装 Python 依赖
cd ..
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd src/embodied_arm_mcp
2. 配置 MiniMax API Key
# ~/.bashrc 第 9 行 export MINIMAX_API_KEY=...
source ~/.bashrc
echo $MINIMAX_API_KEY # 应有值
3. 编译 ROS2 包
cd ~/embodied_arm_mcp_ws
source /opt/ros/jazzy/setup.bash
source install/setup.bash
colcon build --packages-select embodied_arm_mcp --merge-install
4. 运行各模块
# 4.1 Whisper STT(用 JFK 测试音频)
wget https://github.com/openai/whisper/raw/main/tests/jfk.flac -O /tmp/jfk.flac
python -c "import soundfile; d,sr=soundfile.read('/tmp/jfk.flac'); soundfile.write('/tmp/test.wav',d,sr,subtype='PCM_16')"
whisper_stt --audio /tmp/test.wav --model base --language en
# 期望:And so my fellow Americans, ask not what your country can do for you...
# 4.2 TTS(Pyttsx3 / EdgeTTS)
tts --text "你好 MiniMax M3" --engine edge --output /tmp/test.mp3
# 4.3 LLM Agent(MiniMax M3)
llm_agent --text "用一句话介绍你自己"
# 4.4 端到端语音 → LLM 管线
voice_pipeline --audio /tmp/test.wav --no-speak --language en
📊 性能基准(Day 5 实测)
| 链路 | 延迟 | 备注 |
|---|---|---|
| Whisper base STT(11s 音频) | 0.5s | CPU 实时 |
| MiniMax M3 单轮对话 | 2.8-6.5s | 含网络 + 思考 |
| TTS(EdgeTTS 中文) | 0.5s | 仅生成 mp3,不含播放 |
| 端到端(不含录音) | ~3-7s | STT + LLM + TTS |
📁 文件结构
embodied_arm_mcp_ws/
├── README.md
├── .gitignore
├── venv/ # Python 虚拟环境
├── src/embodied_arm_mcp/
│ ├── package.xml
│ ├── setup.py
│ ├── setup.cfg
│ ├── resource/embodied_arm_mcp
│ └── embodied_arm_mcp/
│ ├── __init__.py
│ ├── audio_recorder.py # pyaudio 录音
│ ├── whisper_stt.py # faster-whisper STT
│ ├── tts.py # Pyttsx3 + EdgeTTS
│ ├── llm_agent.py # MiniMax M3 (OpenAI 兼容)
│ └── voice_llm_pipeline.py # STT + LLM 端到端
└── install/ # colcon build 输出
⚠️ 踩坑汇总
- WSL2 无音频设备:
pyaudio找不到设备,arecord -l看不到。Day 5 走方案 C(用现成 .wav)。Day 7 录视频时用 Windows 端录音 + 共享文件。 - pyaudio 安装失败:需要
sudo apt install portaudio19-dev(Day 5 跳过,Day 6 可能需要)。 - Pyttsx3 缺 eSpeak:WSL2 默认无 eSpeak,需
sudo apt install espeak-ng(Day 5 跳过,用 EdgeTTS 替代)。 - MiniMax API Key:必须
source ~/.bashrc才能拿到环境变量(非交互 shell 不读 .bashrc)。 - pip 装包慢:默认 pypi 在国内极慢(23kB/s),用清华镜像
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple。 - ros2 run 找不到 venv 包:
entry_points装在系统 Python,运行前需export PYTHONPATH=$PWD/venv/lib/python3.12/site-packages:$PYTHONPATH。
📚 参考资料
| 资源 | 链接 |
|---|---|
| 黑马 BV131ZuBdEMZ P56-P63 | https://www.bilibili.com/video/BV131ZuBdEMZ |
| faster-whisper | https://github.com/SYSTRAN/faster-whisper |
| MiniMax API | https://api.minimaxi.com |
| MCP 协议 | https://modelcontextprotocol.io/ |
| MoveIt2 文档 | https://moveit.picknik.ai/main/index.html |
📝 License
MIT
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.