Voice Assistant MCP Demo

A cloud backend for a voice assistant that processes natural language commands, manages tasks, and returns spoken responses. It integrates with the Model Context Protocol (MCP) and the Gemini API to support Vietnamese.

Voice Assistant MCP Demo — Phase 1: Cloud Backend

Three Python services run independently and reach one another through URLs set in environment variables.

api_server  →  mcp_server  →  gateway  ←  Android app / curl

Run locally (Phase 1 test)

pip install -r requirements.txt
cp .env.example .env        # fill in GEMINI_API_KEY and GATEWAY_SHARED_TOKEN

# Terminal 1
python api_server.py        # :8000

# Terminal 2
python mcp_server.py        # :8001  (API_BASE_URL=http://localhost:8000 is already set in .env)

# Terminal 3
python gateway.py           # :8002

Test with curl:

# No auth token (leave GATEWAY_SHARED_TOKEN empty in .env when testing locally)
# The command text means "what are today's tasks?"
curl -X POST http://localhost:8002/command \
  -H "Content-Type: application/json" \
  -d '{"text": "việc hôm nay có gì", "session_id": "test-1"}'

# Expected result:
# {"speech": "Hôm nay bạn có 4 việc, trong đó 3 việc chưa xong...", "session_id": "test-1"}
# (in English: "Today you have 4 tasks, 3 of which are not done yet...")

# Multi-turn: a follow-up in the same session ("mark the call-the-customer task as done")
curl -X POST http://localhost:8002/command \
  -H "Content-Type: application/json" \
  -d '{"text": "đánh dấu xong việc gọi khách hàng", "session_id": "test-1"}'
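
The same two calls can be made from Python. This is a minimal sketch using only the standard library; the helper names `build_command_request` and `send_command` and the 30-second timeout are assumptions for illustration and are not part of the repo.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8002"  # local gateway port from the steps above


def build_command_request(text, session_id, token=None, base_url=GATEWAY_URL):
    """Build (but do not send) a POST /command request."""
    headers = {"Content-Type": "application/json"}
    if token:  # only needed when GATEWAY_SHARED_TOKEN is set
        headers["X-Auth-Token"] = token
    body = json.dumps({"text": text, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/command", data=body, headers=headers, method="POST"
    )


def send_command(text, session_id, token=None, timeout=30):
    """Send the command and return the decoded JSON reply."""
    req = build_command_request(text, session_id, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Two turns in the same session, mirroring the curl calls above.
    for text in ("việc hôm nay có gì", "đánh dấu xong việc gọi khách hàng"):
        print(send_command(text, "test-1")["speech"])
```

Reusing the same `session_id` for both turns is what lets the gateway treat the second command as a follow-up.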

Deploy to Render (3 separate services)

Step 1: Push the repo to GitHub

git init && git add . && git commit -m "init phase 1"
git remote add origin <your-github-repo-url>
git push -u origin main

Step 2: Create Service 1 (API Server)

  • Render Dashboard → New → Web Service → select the repo
  • Name: task-api
  • Build Command: pip install -r requirements.txt
  • Start Command: python api_server.py
  • Environment Variables:
    • PORT → let Render fill this in automatically

Once the deploy finishes, copy the service URL, which looks like https://task-api-xxxx.onrender.com.

Step 3: Create Service 2 (MCP Server)

  • Name: task-mcp
  • Start Command: python mcp_server.py
  • Environment Variables:
    • API_BASE_URL → URL of Service 1 (step 2)

After the deploy, copy the URL: https://task-mcp-xxxx.onrender.com.

Step 4: Create Service 3 (Gateway)

  • Name: task-gateway
  • Start Command: python gateway.py
  • Environment Variables:
    • MCP_BASE_URL → URL of Service 2 (step 3)
    • GEMINI_API_KEY → key from Google AI Studio
    • GATEWAY_SHARED_TOKEN → a random string (generate one with openssl rand -hex 16)
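
If openssl is not available, a token of the same shape can be generated with Python's standard library. This is a small sketch; the function name `generate_shared_token` is hypothetical.

```python
import secrets


def generate_shared_token(num_bytes=16):
    """Return a random hex string, two hex characters per byte."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # 16 bytes gives 32 hex characters, like `openssl rand -hex 16`.
    print(generate_shared_token())
```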

Step 5: Test end-to-end in the cloud

GATEWAY_URL=https://task-gateway-xxxx.onrender.com
TOKEN=your_shared_token

curl -X POST $GATEWAY_URL/command \
  -H "Content-Type: application/json" \
  -H "X-Auth-Token: $TOKEN" \
  -d '{"text": "việc hôm nay có gì", "session_id": "s1"}'

Cold-start note: Render's free tier puts a service to sleep after 15 minutes of inactivity. The first request after that takes about 30 s, which is acceptable for a demo.
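
A client that should tolerate the cold start can retry the first request a few times. The sketch below is one possible approach, not something the repo is known to do; `call_with_retry`, the retry count, the delay, and the set of retryable status codes are all assumptions.

```python
import socket
import time
import urllib.error

# Assumed: gateway-style errors a waking service might return.
RETRYABLE_STATUS = {502, 503, 504}


def call_with_retry(send, attempts=3, delay_s=5.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    Network errors, timeouts, and the status codes in RETRYABLE_STATUS
    are retried; any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as exc:
            if exc.code not in RETRYABLE_STATUS:
                raise  # e.g. 401 from a wrong token: retrying will not help
            last_error = exc
        except (urllib.error.URLError, TimeoutError, socket.timeout) as exc:
            last_error = exc
        if attempt < attempts:
            sleep(delay_s)  # give the sleeping service time to wake up
    raise last_error
```

Here `send` is any zero-argument callable that performs the request, for example a lambda wrapping `urllib.request.urlopen` with a generous timeout.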


Environment variables summary

Service     Variable              Required     Notes
api_server  PORT                  auto         Set by Render
mcp_server  PORT                  auto
mcp_server  API_BASE_URL                       URL of api_server
gateway     PORT                  auto
gateway     MCP_BASE_URL                       URL of mcp_server
gateway     GEMINI_API_KEY                     From Google AI Studio
gateway     GATEWAY_SHARED_TOKEN  recommended  Protects the public endpoint
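
As an illustration of how a service might consume these variables, here is a sketch for the gateway. It is not taken from gateway.py: the function name `load_gateway_config`, the choice to treat MCP_BASE_URL and GEMINI_API_KEY as required, and the fallback port 8002 (the local port used earlier) are assumptions.

```python
import os


def load_gateway_config(env=os.environ):
    """Read the gateway's settings from a mapping of environment variables."""
    # Assumed required; the summary above does not say so explicitly.
    required = ("MCP_BASE_URL", "GEMINI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "port": int(env.get("PORT", "8002")),  # Render sets PORT itself
        "mcp_base_url": env["MCP_BASE_URL"].rstrip("/"),
        "gemini_api_key": env["GEMINI_API_KEY"],
        # Empty or unset means the endpoint is left unprotected.
        "shared_token": env.get("GATEWAY_SHARED_TOKEN") or None,
    }
```

Failing at startup when a variable is missing tends to be easier to diagnose on Render than a failure on the first request.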

Gateway request/response structure

POST /command

Headers: X-Auth-Token: <token> (only when GATEWAY_SHARED_TOKEN is set)

Body:

{
  "text": "voice command already converted to text",
  "session_id": "uuid-from-app"
}

Response:

{
  "speech": "Natural Vietnamese reply to be read aloud",
  "session_id": "uuid-from-app"
}
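
The header check and body shape described above could be handled along these lines. This is a framework-independent sketch, not the gateway's actual code; `is_authorized` and `parse_command` are hypothetical helpers, and the header lookup assumes the caller passes a plain dict with the exact key `X-Auth-Token`.

```python
import hmac


def is_authorized(headers, shared_token):
    """Return True when the request may proceed."""
    if not shared_token:  # no token configured, e.g. local testing
        return True
    supplied = headers.get("X-Auth-Token", "")
    # Constant-time comparison on bytes avoids leaking how much matched.
    return hmac.compare_digest(
        supplied.encode("utf-8"), shared_token.encode("utf-8")
    )


def parse_command(body):
    """Validate a decoded JSON body and return (text, session_id)."""
    text = body.get("text")
    session_id = body.get("session_id")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' must be a non-empty string")
    if not isinstance(session_id, str) or not session_id:
        raise ValueError("'session_id' must be a non-empty string")
    return text.strip(), session_id
```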

A session resets automatically after 30 minutes without a request.
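
One way to get this behaviour is an in-memory store that discards a session's history once it has been idle longer than the limit. The sketch below assumes that design; the `SessionStore` class and its method names are hypothetical, and the real gateway may store sessions differently.

```python
import time

SESSION_TTL_S = 30 * 60  # 30 minutes, as stated above


class SessionStore:
    """Keeps per-session conversation history with an idle timeout."""

    def __init__(self, ttl_s=SESSION_TTL_S, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._sessions = {}  # session_id -> (last_seen, history list)

    def history(self, session_id):
        """Return the session's history, starting fresh if it expired."""
        now = self._clock()
        last_seen, history = self._sessions.get(session_id, (now, []))
        if now - last_seen > self._ttl_s:
            history = []  # idle too long: forget earlier turns
        self._sessions[session_id] = (now, history)  # refresh last_seen
        return history
```

Each call to `history` counts as activity, so a session stays alive for as long as requests keep arriving within the limit.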


Phase 1 milestone ✓

curl POST /command {"text": "việc hôm nay có gì"}
→ {"speech": "Hôm nay bạn có ..."}

Phase 2 — Android App (Push-to-Talk)

Structure

android/
├── app/src/main/java/com/demo/voiceassistant/
│   ├── Config.kt           ← gateway URL + auth token
│   ├── MainActivity.kt     ← single "Nói" (Speak) button, manages the flow
│   ├── SpeechController.kt ← STT vi-VN (SpeechRecognizer)
│   ├── ApiClient.kt        ← POST /command via OkHttp
│   └── SpeakController.kt  ← TTS reads the result aloud
└── app/src/main/res/
    └── layout/activity_main.xml

Open in Android Studio

  1. File → Open → select the android/ directory
  2. Android Studio downloads Gradle and syncs the dependencies automatically
  3. Plug in a physical device (or start an AVD) → Run

If you use an emulator (AVD), pick an image that includes Google Play so that SpeechRecognizer works with vi-VN.

Phase 2 flow

Press the "Nói" (Speak) button
  → SpeechRecognizer (vi-VN) listens for ~5 seconds
  → text → POST https://task-gateway-0le7.onrender.com/command
             header: X-Auth-Token: <token>
  → {"speech": "..."} → TextToSpeech reads it aloud

Cold-start note: the first request after 15 minutes of inactivity takes about 30 s (Render free tier). The ApiClient timeout is set to 90 s.

Changing the gateway / token

Edit android/app/src/main/java/com/demo/voiceassistant/Config.kt:

object Config {
    const val GATEWAY_URL = "https://task-gateway-xxxx.onrender.com"
    const val AUTH_TOKEN  = "your_token_here"
}

Phase 2 milestone ✓

Press the button → say "việc hôm nay có gì" ("what are today's tasks?") → the app reads the list back in a Vietnamese voice.
