embodied_arm_mcp

MCP server for a voice-controlled 6DOF robotic arm, exposing tools such as move_to, gripper_open, gripper_close, record_motion, and replay_motion.

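A project for controlling a robotic arm by voice through a large language model, built on ROS2 Jazzy + faster-whisper + MiniMax M3 + the MCP protocol (Day 5, v0.1).

Day 5 completes the full voice + LLM chain; Day 6 integrates the MCP Server + 6DOF arm simulation; Day 7 records the demo video + resume v2.0.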


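🎯 Project Goal

Build an end-to-end chain for controlling a robotic arm with voice and an LLM: speech → Whisper speech recognition → MiniMax M3 semantic understanding → MCP protocol call → ROS2 arm execution. This is the core technical chain behind the 2026 Embodied AI hiring boom.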
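In the planned Day 6 chain, the "MCP protocol call" step is a JSON-RPC 2.0 message. The MCP specification names the tool-invocation method `tools/call` and places the tool name and its arguments under `params`. Below is a minimal stdlib sketch of the request a client might send for the "move to (0.3, 0, 0.2)" example. The argument names `x`, `y`, `z` are assumptions, because the repository does not yet publish its tool schemas.

```python
import json
from itertools import count

# MCP messages are JSON-RPC 2.0; every request needs a unique id.
_request_ids = count(1)


def make_tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request for the tool `name`."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits a single line, which suits newline-delimited transports.
    return json.dumps(request, ensure_ascii=False)


# Hypothetical arguments: x, y, z (metres) are assumed, not taken from the repo.
msg = make_tools_call("move_to", {"x": 0.3, "y": 0.0, "z": 0.2})
print(msg)
```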

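Why this is the flagship project

  • Covers all five 2026 buzzwords: embodied intelligence (具身智能) / Embodied AI / VLA models / MCP protocol / digital twin
  • Combines LLMs and robotics, a cross-disciplinary skill set that is still rare
  • Realistic scenario: human → Agent → robot, full stack end to end
  • Complements the Day 3-4 obstacle-avoidance project (perception and decision-making on a mobile base vs. an arm with LLM-level intelligence on top)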

🏗️ Architecture (Day 5 stage)

┌──────────────────────────────────────────────────────────┐
│                    Voice input layer                     │
│  ┌───────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ Recording     │─►│ Whisper STT  │─►│ MiniMax M3    │  │
│  │ (pyaudio)     │  │ (faster-     │  │ (OpenAI-      │  │
│  │ .wav 16kHz    │  │  whisper)    │  │  compatible)  │  │
│  └───────────────┘  └──────────────┘  │ → tool_call   │  │
│                                       └───────┬───────┘  │
│  ┌───────────────┐                            │          │
│  │ Pyttsx3 /     │◄───────────────────────────┘          │
│  │ EdgeTTS       │                                       │
│  └───────────────┘                                       │
└──────────────────────────────────────────────────────────┘
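The data flow in the diagram can be expressed as three pluggable stages. Below is a minimal sketch with offline stand-ins. `LLMReply`, `run_pipeline` and the `fake_*` functions are hypothetical and do not reflect the actual signatures in whisper_stt.py, llm_agent.py or tts.py. The `speak` flag is an assumed analogue of the `--no-speak` option of voice_pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LLMReply:
    """LLM stage output: either a plain-text answer or a tool call."""
    text: str = ""
    tool_name: Optional[str] = None
    tool_args: dict = field(default_factory=dict)


def run_pipeline(
    audio_path: str,
    stt: Callable[[str], str],
    llm: Callable[[str], LLMReply],
    tts: Callable[[str], None],
    speak: bool = True,
) -> LLMReply:
    """Audio file -> transcript -> LLM reply -> optional spoken answer."""
    transcript = stt(audio_path)
    reply = llm(transcript)
    # Only plain-text answers go to TTS; tool calls are left for the MCP layer.
    if speak and reply.tool_name is None and reply.text:
        tts(reply.text)
    return reply


# Hypothetical offline stand-ins for whisper_stt, llm_agent and tts.
def fake_stt(path: str) -> str:
    return "move to 0.3 0 0.2"


def fake_llm(text: str) -> LLMReply:
    words = text.split()
    if words[:2] == ["move", "to"]:
        x, y, z = map(float, words[2:5])
        return LLMReply(tool_name="move_to", tool_args={"x": x, "y": y, "z": z})
    return LLMReply(text="OK")


spoken: list[str] = []
reply = run_pipeline("/tmp/test.wav", fake_stt, fake_llm, spoken.append)
print(reply)
```

Passing the stages in as callables keeps the wiring testable without a microphone. That matters on WSL2, where no audio device is available.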

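Day 6 + 7 extensions (planned):

  • The MCP Server exposes 5 tools: move_to / gripper_open / gripper_close / record_motion / replay_motion
  • ROS2 Jazzy + MoveIt2 + arm simulation (Panda; note that the Franka Panda is itself a 7-DOF arm)
  • End to end: say "move to (0.3, 0, 0.2)" → the arm moves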
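With an OpenAI-compatible endpoint such as MiniMax M3, a tool call comes back as a function name plus its arguments encoded as a JSON string. The sketch below shows how the five planned tools could be registered and dispatched, with recording and replay kept in memory. It rests on several assumptions: the argument names, the `stop` flag on `record_motion` and the `ArmTools` class are all hypothetical. A real server would forward each command to MoveIt2 instead of appending to `log`.

```python
import json
from typing import Optional

TOOL_NAMES = ("move_to", "gripper_open", "gripper_close",
              "record_motion", "replay_motion")

Step = tuple[str, dict]


class ArmTools:
    """Hypothetical in-memory stand-in for the five planned tools (no ROS2)."""

    def __init__(self) -> None:
        self.log: list[Step] = []                 # every command "sent" to the arm
        self.recording: Optional[list[Step]] = None
        self.motions: dict[str, list[Step]] = {}  # saved motions by name

    def _execute(self, name: str, args: dict) -> None:
        self.log.append((name, args))
        if self.recording is not None:
            self.recording.append((name, args))

    def move_to(self, x: float, y: float, z: float) -> str:
        self._execute("move_to", {"x": x, "y": y, "z": z})
        return f"moved to ({x}, {y}, {z})"

    def gripper_open(self) -> str:
        self._execute("gripper_open", {})
        return "gripper opened"

    def gripper_close(self) -> str:
        self._execute("gripper_close", {})
        return "gripper closed"

    def record_motion(self, name: str, stop: bool = False) -> str:
        if not stop:                              # start a new recording
            self.recording = []
            return f"recording {name}"
        if self.recording is None:
            raise RuntimeError("stop requested but no recording in progress")
        self.motions[name], self.recording = self.recording, None
        return f"saved {name} ({len(self.motions[name])} steps)"

    def replay_motion(self, name: str) -> str:
        steps = self.motions[name]
        for step_name, step_args in steps:
            getattr(self, step_name)(**step_args)
        return f"replayed {len(steps)} steps"

    def dispatch(self, tool_name: str, arguments_json: str) -> str:
        """Route an LLM tool_call (name + JSON-string arguments) to a handler."""
        if tool_name not in TOOL_NAMES:
            raise ValueError(f"unknown tool: {tool_name}")
        args = json.loads(arguments_json or "{}")
        return getattr(self, tool_name)(**args)


arm = ArmTools()
arm.dispatch("record_motion", '{"name": "demo"}')
arm.dispatch("move_to", '{"x": 0.3, "y": 0.0, "z": 0.2}')
arm.dispatch("gripper_close", "")
print(arm.dispatch("record_motion", '{"name": "demo", "stop": true}'))  # prints "saved demo (2 steps)"
print(arm.dispatch("replay_motion", '{"name": "demo"}'))  # prints "replayed 2 steps"
```

The tool name comes from model output, so `dispatch` checks it against an explicit allow-list. Calling `getattr` on an arbitrary name would otherwise expose internal methods such as `_execute`.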

🛠️ Environment

  • OS: WSL2 Ubuntu 24.04
  • ROS2: Jazzy Jalisco
  • Python: 3.12 (system) + isolated venv
  • LLM: MiniMax M3 (OpenAI-compatible API)
  • STT: faster-whisper base model (real-time on CPU)

🚀 Getting Started

1. Clone + install dependencies

git clone https://github.com/JakLiao/embodied_arm_mcp.git
cd embodied_arm_mcp

# Install ROS2 Jazzy + required system packages (Day 5 runs without sudo; Day 6 needs these)
# sudo apt install ros-jazzy-moveit ros-jazzy-moveit-py portaudio19-dev

# Create the venv + install Python dependencies
cd ..
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd src/embodied_arm_mcp

2. Configure the MiniMax API key

# Add export MINIMAX_API_KEY=... to ~/.bashrc (line 9 in the author's setup)
source ~/.bashrc
echo $MINIMAX_API_KEY  # should print a non-empty value
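Non-interactive shells skip ~/.bashrc, so a missing key can surface only as an authentication error deep inside the LLM call. Below is a stdlib sketch that fails fast with a readable hint instead. `require_env` is a hypothetical helper, not part of the package. It would be called as `require_env("MINIMAX_API_KEY")` before the client is created.

```python
import os
import sys


def require_env(name: str) -> str:
    """Return the whitespace-stripped value of `name`, or exit with a hint."""
    value = os.environ.get(name, "").strip()
    if not value:
        sys.exit(
            f"{name} is not set: add `export {name}=...` to ~/.bashrc, then run "
            "`source ~/.bashrc` (non-interactive shells do not read .bashrc)."
        )
    return value
```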

3. Build the ROS2 package

cd ~/embodied_arm_mcp_ws
source /opt/ros/jazzy/setup.bash
colcon build --packages-select embodied_arm_mcp --merge-install
source install/setup.bash  # overlay the freshly built package (install/ exists only after the build)

4. Run each module

# 4.1 Whisper STT (JFK test audio, converted from FLAC to 16-bit PCM .wav)
wget https://github.com/openai/whisper/raw/main/tests/jfk.flac -O /tmp/jfk.flac
python -c "import soundfile; d,sr=soundfile.read('/tmp/jfk.flac'); soundfile.write('/tmp/test.wav',d,sr,subtype='PCM_16')"
whisper_stt --audio /tmp/test.wav --model base --language en
# Expected: And so my fellow Americans, ask not what your country can do for you...

# 4.2 TTS (Pyttsx3 / EdgeTTS); the Chinese sample text 你好 means "hello"
tts --text "你好 MiniMax M3" --engine edge --output /tmp/test.mp3

# 4.3 LLM agent (MiniMax M3); the Chinese prompt asks it to "introduce yourself in one sentence"
llm_agent --text "用一句话介绍你自己"

# 4.4 End-to-end voice → LLM pipeline
voice_pipeline --audio /tmp/test.wav --no-speak --language en
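The architecture diagram lists 16 kHz .wav as the recording format, and step 4.1 converts the sample to 16-bit PCM. Below is a stdlib `wave` sketch that checks a file such as /tmp/test.wav against those expectations before it enters the pipeline. `check_wav` is hypothetical, and requiring mono audio is an added assumption.

```python
import wave


def check_wav(path: str, expected_rate: int = 16_000) -> dict:
    """Read a PCM .wav header and report whether it matches the expected format."""
    with wave.open(path, "rb") as wf:
        info = {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": wf.getframerate(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }
    info["ok"] = (
        info["channels"] == 1                  # assumption: mono input
        and info["sample_width_bytes"] == 2    # 16-bit PCM (subtype='PCM_16')
        and info["sample_rate"] == expected_rate
    )
    return info
```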

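📊 Performance Benchmarks (measured on Day 5)

| Stage | Latency | Notes |
|---|---|---|
| Whisper base STT (11 s audio) | 0.5 s | real-time on CPU |
| MiniMax M3, single-turn chat | 2.8-6.5 s | includes network + model thinking time |
| TTS (EdgeTTS, Chinese) | 0.5 s | mp3 generation only, playback excluded |
| End-to-end (excluding recording) | ~3-7 s | STT + LLM + TTS |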
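The table does not say how the figures were collected. Below is a stdlib sketch for gathering comparable per-stage numbers. `timed` is a hypothetical helper, and the `time.sleep` calls stand in for the real Whisper and MiniMax calls. As in the table's note, the end-to-end figure here is the sum of the stages.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(stage: str, results: dict[str, float]) -> Iterator[None]:
    """Store the wall-clock seconds spent inside the `with` block in results[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[stage] = time.perf_counter() - start


timings: dict[str, float] = {}
with timed("stt", timings):
    time.sleep(0.05)  # stand-in for the faster-whisper transcription
with timed("llm", timings):
    time.sleep(0.10)  # stand-in for the MiniMax M3 request
timings["end_to_end"] = timings["stt"] + timings["llm"]
print({stage: round(sec, 3) for stage, sec in timings.items()})
```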

📁 File Structure

embodied_arm_mcp_ws/
├── README.md
├── .gitignore
├── venv/                                    # Python virtual environment
├── src/embodied_arm_mcp/
│   ├── package.xml
│   ├── setup.py
│   ├── setup.cfg
│   ├── resource/embodied_arm_mcp
│   └── embodied_arm_mcp/
│       ├── __init__.py
│       ├── audio_recorder.py                # pyaudio recording
│       ├── whisper_stt.py                   # faster-whisper STT
│       ├── tts.py                           # Pyttsx3 + EdgeTTS
│       ├── llm_agent.py                     # MiniMax M3 (OpenAI-compatible)
│       └── voice_llm_pipeline.py            # STT + LLM end-to-end
└── install/                                 # colcon build output

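⚠️ Pitfalls

  1. WSL2 has no audio device: pyaudio cannot find one, and arecord -l lists nothing. Day 5 uses option C (a ready-made .wav file); for the Day 7 video, record on the Windows side and share the file.
  2. pyaudio fails to install: it needs sudo apt install portaudio19-dev (skipped on Day 5, probably needed on Day 6).
  3. Pyttsx3 is missing eSpeak: WSL2 ships without eSpeak; install it with sudo apt install espeak-ng (skipped on Day 5, EdgeTTS used instead).
  4. MiniMax API key: the environment variable is only available after source ~/.bashrc (non-interactive shells do not read .bashrc).
  5. Slow pip installs: the default PyPI index is extremely slow from within China (23 kB/s); use the Tsinghua mirror: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple
  6. ros2 run cannot find venv packages: the entry_points are installed against the system Python, so before running, export PYTHONPATH=$PWD/venv/lib/python3.12/site-packages:$PYTHONPATH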
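Pitfall 6 hard-codes venv/lib/python3.12/site-packages. Below is a stdlib sketch, meant to be run with the venv's own python, that derives that path from the active interpreter. The export line then stays correct if the Python version changes. The helper names are hypothetical.

```python
import sys
import sysconfig


def venv_site_packages() -> str:
    """Pure-Python site-packages directory of the interpreter running this code."""
    return sysconfig.get_paths()["purelib"]


def in_venv() -> bool:
    """True when running inside a venv (its prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def pythonpath_export_line() -> str:
    """Build the `export PYTHONPATH=...` line from pitfall 6 for this interpreter."""
    return f"export PYTHONPATH={venv_site_packages()}:$PYTHONPATH"


if __name__ == "__main__":
    if not in_venv():
        print("warning: not inside a venv; activate it first", file=sys.stderr)
    print(pythonpath_export_line())
```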

📚 References

| Resource | Link |
|---|---|
| Heima (黑马) BV131ZuBdEMZ, P56-P63 | https://www.bilibili.com/video/BV131ZuBdEMZ |
| faster-whisper | https://github.com/SYSTRAN/faster-whisper |
| MiniMax API | https://api.minimaxi.com |
| MCP protocol | https://modelcontextprotocol.io/ |
| MoveIt2 docs | https://moveit.picknik.ai/main/index.html |

📝 License

MIT
