Voice Assistant MCP Demo
A cloud backend for a voice assistant that processes natural language commands, manages tasks, and provides speech responses. Integrates with MCP protocol and Gemini API for Vietnamese language support.
Voice Assistant MCP Demo — Phase 1: Cloud Backend
Three Python services that run independently and find each other through URL environment variables.
api_server → mcp_server → gateway ← Android app / curl
Run locally (Phase 1 test)
pip install -r requirements.txt
cp .env.example .env # fill in GEMINI_API_KEY and GATEWAY_SHARED_TOKEN
# Terminal 1
python api_server.py # :8000
# Terminal 2
python mcp_server.py # :8001 (API_BASE_URL=http://localhost:8000 is already set in .env)
# Terminal 3
python gateway.py # :8002
Test with curl:
# No auth token (leave GATEWAY_SHARED_TOKEN empty in .env for local testing)
curl -X POST http://localhost:8002/command \
-H "Content-Type: application/json" \
-d '{"text": "việc hôm nay có gì", "session_id": "test-1"}'
# Expected result:
# {"speech": "Hôm nay bạn có 4 việc, trong đó 3 việc chưa xong...", "session_id": "test-1"}
# Multi-turn: a follow-up in the same session
curl -X POST http://localhost:8002/command \
-H "Content-Type: application/json" \
-d '{"text": "đánh dấu xong việc gọi khách hàng", "session_id": "test-1"}'
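The same call can be scripted from Python. The sketch below is not part of the repository: the helper names `build_command_request` and `send_command` are hypothetical, and it uses only the standard library. It assumes the gateway listens on http://localhost:8002 as in the local setup above.

```python
import json
import urllib.request
from typing import Optional


def build_command_request(base_url: str, text: str, session_id: str,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /command request for the gateway."""
    body = json.dumps({"text": text, "session_id": session_id}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:  # only needed when GATEWAY_SHARED_TOKEN is set on the gateway
        headers["X-Auth-Token"] = token
    return urllib.request.Request(
        base_url.rstrip("/") + "/command", data=body, headers=headers, method="POST"
    )


def send_command(req: urllib.request.Request, timeout: float = 90.0) -> dict:
    """Send the request and decode the JSON reply (needs a running gateway)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the three services running, `send_command(build_command_request("http://localhost:8002", "việc hôm nay có gì", "test-1"))` should return a dict with `speech` and `session_id` keys.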
Deploy to Render (3 separate services)
Step 1: Push the repo to GitHub
git init && git add . && git commit -m "init phase 1"
git remote add origin <your-github-repo-url>
git push -u origin main
Step 2: Create Service 1 (API Server)
- Render Dashboard → New → Web Service → select the repo
- Name: task-api
- Build Command: pip install -r requirements.txt
- Start Command: python api_server.py
- Environment Variables:
  - PORT → leave it for Render to fill in
After the deploy finishes, copy the URL, which looks like https://task-api-xxxx.onrender.com.
Step 3: Create Service 2 (MCP Server)
- Name: task-mcp
- Start Command: python mcp_server.py
- Environment Variables:
  - API_BASE_URL → URL of Service 1 (step 2)
After the deploy finishes, copy the URL: https://task-mcp-xxxx.onrender.com.
Step 4: Create Service 3 (Gateway)
- Name: task-gateway
- Start Command: python gateway.py
- Environment Variables:
  - MCP_BASE_URL → URL of Service 2 (step 3)
  - GEMINI_API_KEY → key from Google AI Studio
  - GATEWAY_SHARED_TOKEN → a random string (generate one with openssl rand -hex 16)
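If openssl is not available, a token of the same shape can be produced with Python's standard library. This is a minimal sketch; the function name `generate_shared_token` is hypothetical and not part of the repository.

```python
import secrets


def generate_shared_token(num_bytes: int = 16) -> str:
    """Return a random hex string, two hex characters per byte.

    With num_bytes=16 the result has 32 characters, the same shape as
    the output of `openssl rand -hex 16`.
    """
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(generate_shared_token())
```

Paste the printed value into GATEWAY_SHARED_TOKEN on Render and into the client that calls the gateway.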
Step 5: Test end-to-end on the cloud
GATEWAY_URL=https://task-gateway-xxxx.onrender.com
TOKEN=your_shared_token
curl -X POST $GATEWAY_URL/command \
-H "Content-Type: application/json" \
-H "X-Auth-Token: $TOKEN" \
-d '{"text": "việc hôm nay có gì", "session_id": "s1"}'
Cold-start note: Render's free tier puts a service to sleep after 15 minutes idle. The first request afterwards takes ~30s, which is acceptable for a demo.
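A scripted client can absorb the cold start by retrying a failed call a few times. The sketch below is one possible approach, not something the repository provides: `call_with_retry` and its parameters are hypothetical names, and `send` stands for any zero-argument function that performs the HTTP call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(send: Callable[[], T], attempts: int = 3, delay_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send() until it succeeds or attempts run out, then re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            if attempt == attempts:
                raise
            sleep(delay_s)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the wait can be replaced in tests. In normal use the default `time.sleep` applies.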
Environment variables summary
| Service | Variable | Required | Notes |
|---|---|---|---|
| api_server | PORT | auto | Set by Render |
| mcp_server | PORT | auto | |
| mcp_server | API_BASE_URL | ✓ | URL of api_server |
| gateway | PORT | auto | |
| gateway | MCP_BASE_URL | ✓ | URL of mcp_server |
| gateway | GEMINI_API_KEY | ✓ | Google AI Studio |
| gateway | GATEWAY_SHARED_TOKEN | recommended | Protects the public endpoint |
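The gateway rows of the table could be read at startup along the following lines. This is an illustrative sketch, not the code in gateway.py: `GatewayConfig` and `load_gateway_config` are hypothetical names, and the fallback port 8002 is an assumption taken from the local run instructions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewayConfig:
    port: int
    mcp_base_url: str
    gemini_api_key: str
    shared_token: Optional[str]  # None means the endpoint is unprotected


def load_gateway_config(env: Mapping[str, str] = os.environ) -> GatewayConfig:
    """Read the gateway variables and fail early if a required one is missing."""
    missing = [name for name in ("MCP_BASE_URL", "GEMINI_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return GatewayConfig(
        port=int(env.get("PORT", "8002")),  # assumed default for local runs
        mcp_base_url=env["MCP_BASE_URL"].rstrip("/"),
        gemini_api_key=env["GEMINI_API_KEY"],
        shared_token=env.get("GATEWAY_SHARED_TOKEN") or None,
    )
```

Failing at startup on a missing required variable surfaces a misconfigured Render service in the deploy log instead of on the first request.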
Gateway request/response structure
POST /command
Headers: X-Auth-Token: <token> (only when GATEWAY_SHARED_TOKEN is set)
Body:
{
"text": "the voice command, already transcribed to text",
"session_id": "uuid-from-app"
}
Response:
{
"speech": "A natural Vietnamese reply, ready to be read aloud",
"session_id": "uuid-from-app"
}
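The contract above implies two checks on the server side: the optional token header and the shape of the body. The sketch below shows one way to write them. It is not the gateway's actual code: `is_authorized` and `parse_command` are hypothetical names, and `headers` is assumed to be a plain mapping with exact-case keys.

```python
import hmac
import json
from typing import Mapping, Optional, Tuple


def is_authorized(headers: Mapping[str, str], shared_token: Optional[str]) -> bool:
    """Accept every request when no token is configured, else compare tokens."""
    if not shared_token:
        return True
    supplied = headers.get("X-Auth-Token", "")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), shared_token.encode("utf-8"))


def parse_command(raw_body: bytes) -> Tuple[str, str]:
    """Return (text, session_id) or raise ValueError for a malformed body."""
    payload = json.loads(raw_body.decode("utf-8"))  # JSONDecodeError is a ValueError
    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")
    text = payload.get("text")
    session_id = payload.get("session_id")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' must be a non-empty string")
    if not isinstance(session_id, str) or not session_id:
        raise ValueError("'session_id' must be a non-empty string")
    return text, session_id
```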
A session resets automatically after 30 minutes without a request.
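The 30-minute reset can be modeled as an in-memory store that discards a session's history once it has been idle too long. This is a sketch of the idea only: `SessionStore` is a hypothetical name, and the source does not say how the gateway actually stores sessions.

```python
import time
from typing import Callable, Dict, List, Tuple

SESSION_TTL_S = 30 * 60  # 30 minutes, as stated above


class SessionStore:
    """Keeps per-session turn history and drops it after ttl_s of inactivity."""

    def __init__(self, ttl_s: float = SESSION_TTL_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._sessions: Dict[str, Tuple[float, List[str]]] = {}

    def history(self, session_id: str) -> List[str]:
        """Return the live history list for a session, resetting it if expired."""
        now = self._clock()
        entry = self._sessions.get(session_id)
        if entry is None or now - entry[0] > self._ttl_s:
            turns: List[str] = []  # new or expired session starts empty
        else:
            turns = entry[1]
        self._sessions[session_id] = (now, turns)  # every access refreshes the timer
        return turns
```

Each request would call `store.history(session_id)` and append the new turn to the returned list, so a follow-up within 30 minutes sees the earlier turns.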
Phase 1 milestone ✓
curl POST /command {"text": "việc hôm nay có gì"}
→ {"speech": "Hôm nay bạn có ..."}
Phase 2 — Android App (Push-to-Talk)
Structure
android/
├── app/src/main/java/com/demo/voiceassistant/
│ ├── Config.kt ← gateway URL + auth token
│ ├── MainActivity.kt ← single "Nói" (Speak) button, manages the flow
│ ├── SpeechController.kt ← STT for vi-VN (SpeechRecognizer)
│ ├── ApiClient.kt ← POST /command via OkHttp
│ └── SpeakController.kt ← TTS reads the result aloud
└── app/src/main/res/
└── layout/activity_main.xml
Open in Android Studio
- File → Open → select the android/ folder
- Android Studio downloads Gradle and syncs dependencies automatically
- Plug in a physical device (or start an AVD) → Run
If you use an emulator (AVD): choose an image with Google Play so that SpeechRecognizer works with vi-VN.
Phase 2 flow
Press the "Nói" (Speak) button
→ SpeechRecognizer (vi-VN) listens for ~5 seconds
→ text → POST https://task-gateway-0le7.onrender.com/command
header: X-Auth-Token: <token>
→ {"speech": "..."} → TextToSpeech reads it aloud
Cold-start note: the first request after 15 minutes idle takes ~30s (Render free tier). The ApiClient timeout is set to 90s.
Changing the gateway / token
Edit android/app/src/main/java/com/demo/voiceassistant/Config.kt:
object Config {
const val GATEWAY_URL = "https://task-gateway-xxxx.onrender.com"
const val AUTH_TOKEN = "your_token_here"
}
Phase 2 milestone ✓
Press the button → say "việc hôm nay có gì" → the app reads the task list back in a Vietnamese voice.