V2W MCP Server
Enables agents to convert videos from URLs, Bilibili, and netdisk shares into Word documents, with tools for transcription, job management, and account configuration.
V2W - Video to Word
V2W is a self-hosted workspace for turning videos into Word documents. It supports batch transcription from public media URLs, video pages, Baidu Netdisk shares, and Quark Netdisk shares, then generates .docx outputs for transcripts and prompt-based documents such as outlines, Q&A notes, summaries, mind maps, or rewritten drafts.
The project is designed for small teams that need repeatable video-to-document workflows on their own server, with account-based model settings, reusable prompt templates, usage tracking, retryable jobs, and a native MCP endpoint for agent integrations such as OpenClaw.
Current version: 0.1.9
Features
- Batch submission from multiple links.
- Public HTTP/HTTPS media transcription.
- Bilibili and generic video-page parsing through `yt-dlp`.
- Baidu Netdisk share processing through `BaiduPCS-Go`.
- Baidu Netdisk QR-code login and manual credential authorization.
- Quark Netdisk share processing through user-provided cookies.
- Original transcript `.docx` output.
- Extra `.docx` files generated from reusable prompts.
- Built-in templates for `提炼版` (condensed version) and `思维导图` (mind map).
- Per-account model configuration and prompt templates.
- Retry failed jobs or only failed extra document generation.
- Batch download for generated Word files.
- Account login, admin user management, and usage records.
- Usage tracking for ASR duration, AI tokens, and estimated cost.
- SQLite persistence for single-server deployments.
- Native HTTP MCP endpoint for agent workflows.
Tech Stack
- Frontend: Vite + React
- Backend: Node.js + Express
- Database: SQLite with `better-sqlite3`
- Word generation: `docx`
- ZIP packaging: `archiver`
- Media tools: `ffmpeg`, `ffprobe`
- Video page downloader: `yt-dlp`
- Baidu Netdisk downloader: `BaiduPCS-Go`
- Default ASR provider: Alibaba Cloud Model Studio Paraformer
- Extra document generation: OpenAI-compatible Chat Completions API
Requirements
- Node.js 20+
- npm
- `ffmpeg` and `ffprobe`
- `yt-dlp`
- `BaiduPCS-Go` for Baidu Netdisk links
- Chrome or Chromium for Baidu QR-code login
Public direct links can work without BaiduPCS-Go. Netdisk links require the corresponding netdisk authorization.
Quick Start
git clone https://github.com/joyrayai/v2w.git
cd v2w
npm run setup
npm run dev
Open the web app and create the first administrator account when prompted. After initialization, log in and configure your model provider before submitting tasks.
Default local URLs:
- Web: `http://localhost:5173`
- API: `http://localhost:5174`
If you want the setup script to try installing system tools:
npm run setup -- --install-system
To only check the environment:
npm run doctor
Agent / OpenClaw Quick Test
After starting the API server, the MCP endpoint is available at:
http://localhost:5174/mcp
For OpenClaw running in Docker on the same machine, register V2W with:
openclaw mcp add v2w-local \
--transport streamable-http \
--url http://host.docker.internal:5174/mcp
Then verify tool discovery:
openclaw mcp probe v2w-local --json
V2W should expose 33 MCP tools in version 0.1.9.
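The same discovery check can be run without the OpenClaw CLI by sending a `tools/list` request to the endpoint. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and it assumes the server answers with a single JSON object rather than an event stream and accepts `tools/list` without a prior `initialize` call; neither is confirmed by this README.

```python
import json
import urllib.request

MCP_URL = "http://localhost:5174/mcp"  # local development endpoint from this README


def build_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body for the MCP endpoint."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return body


def post_json(body, url=MCP_URL):
    """POST a JSON-RPC body and decode the reply.

    Assumption: the server replies with one JSON object, not an event stream.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def count_tools(tools_list_response):
    """Count the tool entries in a tools/list response."""
    return len(tools_list_response.get("result", {}).get("tools", []))
```

With the API server running, `count_tools(post_json(build_request("tools/list")))` should match the tool count stated above if those assumptions hold.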
Manual Setup
npm install
cp .env.example .env
npm run dev
Build for production:
npm run build
npm start
Configuration
Copy .env.example to .env before running the app.
cp .env.example .env
Common environment variables:
| Variable | Default | Description |
|---|---|---|
| `PORT` | `5174` | Backend server port |
| `PUBLIC_BASE_URL` | `http://localhost:5174` | Public base URL used for temporary media URLs |
| `SESSION_SECRET` | development fallback | Secret for signed login tokens |
| `MAX_CONCURRENCY` | `5` | Global running task limit |
| `MAX_USER_RUNNING` | `2` | Running task limit per user |
| `MAX_USER_QUEUED` | `50` | Queued task limit per user |
| `MIN_FREE_DISK_GB` | `6` | Stop starting new tasks when free disk is below this value |
| `CHROME_PATH` | empty | Optional Chrome path for QR-code login |
| `CHROMIUM_PATH` | empty | Optional Chromium path for QR-code login |
Do not commit real .env files, API keys, cookies, SQLite databases, or generated documents.
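Before starting the server, it can help to preview which values a given `.env` file would produce. The sketch below is not part of V2W and does not reproduce how the backend loads its configuration; it only merges the documented defaults with simple `KEY=VALUE` lines. The function names and the two checks are illustrative assumptions.

```python
# Defaults copied from the table above. SESSION_SECRET, CHROME_PATH and
# CHROMIUM_PATH are left out because the README gives no literal default.
DEFAULTS = {
    "PORT": "5174",
    "PUBLIC_BASE_URL": "http://localhost:5174",
    "MAX_CONCURRENCY": "5",
    "MAX_USER_RUNNING": "2",
    "MAX_USER_QUEUED": "50",
    "MIN_FREE_DISK_GB": "6",
}


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def effective_config(env_text):
    """Overlay .env values on the documented defaults."""
    config = dict(DEFAULTS)
    config.update(parse_env_text(env_text))
    return config


def warnings_for(config):
    """Flag settings that are likely unintended (illustrative checks only)."""
    problems = []
    if not config.get("SESSION_SECRET"):
        problems.append("SESSION_SECRET is unset; the development fallback would be used")
    if int(config["MAX_USER_RUNNING"]) > int(config["MAX_CONCURRENCY"]):
        problems.append("MAX_USER_RUNNING is higher than MAX_CONCURRENCY")
    return problems
```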
Model Settings
Model API keys and model names are configured in the web app after login.
The default provider preset uses Alibaba Cloud Model Studio:
- ASR model: `paraformer-v2`
- AI model: configurable OpenAI-compatible chat model
Other OpenAI-compatible providers can be used for extra document generation by setting the base URL, API key, and model name in the model configuration page.
Netdisk Authorization
Baidu Netdisk
Baidu Netdisk support depends on BaiduPCS-Go.
You can authorize Baidu Netdisk in the web app by:
- QR-code login, if Chrome or Chromium is available on the server.
- Manual credential login, by providing cookies or BDUSS/STOKEN values.
Each app account keeps an independent netdisk authorization state.
Quark Netdisk
Quark Netdisk support uses cookies copied from a logged-in Quark web session. Paste the cookies in the netdisk authorization card before submitting Quark share links.
MCP Integration
V2W exposes a native MCP-compatible HTTP endpoint after deployment:
POST /mcp
For a local development server:
http://localhost:5174/mcp
Implemented MCP methods:
- `initialize`
- `tools/list`
- `tools/call`
Available tools:
| Tool | Description |
|---|---|
| `v2w.setup.status` | Check initialization state and local tool availability |
| `v2w.setup.create_admin` | Create the first administrator account before any account exists |
| `v2w.account.register` | Create a password account and return an authToken |
| `v2w.service_info` | Read service status, runtime limits and queue status |
| `v2w.mcp.capabilities` | Read grouped MCP capabilities for agent planning |
| `v2w.mcp.self_check` | Run an authenticated MCP integration self-check |
| `v2w.login` | Log in with a V2W account and return an authToken |
| `v2w.config.get` | Read the current account model configuration with secrets redacted |
| `v2w.config.save` | Save model and optional OSS configuration for the account |
| `v2w.config.test` | Test saved or supplied OpenAI-compatible model configuration |
| `v2w.usage.pricing` | Read the local ASR and AI pricing table used for estimates |
| `v2w.usage.summary` | Read current-account usage summary |
| `v2w.usage.records` | List current-account usage records |
| `v2w.admin.users` | Admin only: list users with job counts and usage summary |
| `v2w.admin.usage.summary` | Admin only: read global usage summary |
| `v2w.admin.usage.records` | Admin only: list global usage records |
| `v2w.netdisk.status` | Read Baidu or Quark authorization status |
| `v2w.netdisk.login` | Authorize Baidu or Quark with copied browser cookies; Baidu also supports BDUSS |
| `v2w.baidu_qr.start` | Start Baidu Netdisk QR authorization |
| `v2w.baidu_qr.status` | Poll Baidu Netdisk QR authorization status |
| `v2w.baidu_qr.cancel` | Cancel a Baidu Netdisk QR authorization session |
| `v2w.templates.list` | List extra document templates, including default templates |
| `v2w.templates.get` | Read one extra document template |
| `v2w.templates.create` | Create an extra document template |
| `v2w.templates.update` | Update an extra document template |
| `v2w.templates.delete` | Delete an extra document template |
| `v2w.jobs.submit` | Submit direct, page, Baidu Netdisk or Quark Netdisk links as jobs |
| `v2w.jobs.list` | List jobs for the current account |
| `v2w.jobs.get` | Read one job and its current progress |
| `v2w.jobs.retry` | Retry a failed job, or retry only failed extra documents when possible |
| `v2w.jobs.retry_extra` | Retry only failed extra documents from cached transcript text |
| `v2w.jobs.delete` | Delete a non-running job and its files |
| `v2w.jobs.downloads` | Return generated document download URLs and a batch ZIP URL |
Authentication flow:
- Call `v2w.setup.status` after deployment.
- Call `v2w.mcp.capabilities` if the agent needs a grouped capability map.
- If `needsAdmin` is `true`, call `v2w.setup.create_admin`.
- Otherwise call `v2w.login` with `username` and `password`, or create a user with `v2w.account.register`.
- Pass the returned `authToken` in later tool arguments.
- Call `v2w.mcp.self_check` to verify account model configuration, netdisk authorization and job state.
- Alternatively, pass the token as `Authorization: Bearer <token>`.
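The branching in this flow can be sketched as a few small helpers. This is an illustration under assumptions, not V2W client code: the function names are hypothetical, `v2w.setup.create_admin` is assumed to accept the same `username` and `password` arguments as `v2w.login`, and the token is assumed to travel under the argument key `authToken`.

```python
def tool_call(name, arguments, request_id=1):
    """Wrap a tool invocation in a JSON-RPC tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def first_auth_call(setup_status, username, password):
    """Pick the account step that follows v2w.setup.status.

    Assumption: create_admin takes the same credentials as login.
    """
    credentials = {"username": username, "password": password}
    if setup_status.get("needsAdmin") is True:
        return tool_call("v2w.setup.create_admin", credentials)
    return tool_call("v2w.login", credentials)


def with_token(arguments, auth_token):
    """Return a copy of the tool arguments that carries the token.

    Assumption: the argument key is named authToken.
    """
    return {**arguments, "authToken": auth_token}


def bearer_headers(auth_token):
    """Alternative: send the token as an HTTP header instead."""
    return {"Authorization": f"Bearer {auth_token}"}
```

Passing the token in the header keeps it out of tool arguments, which may matter for clients that log each tool call.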
Example JSON-RPC call:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "v2w.login",
    "arguments": {
      "username": "admin",
      "password": "your-password"
    }
  }
}
Baidu QR authorization returns qrImageDataUrl when the QR image is ready. Agents can render that data URL directly for users to scan with the Baidu Netdisk app. qrImageUrl is also returned for clients that can call the protected V2W HTTP API with authentication.
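For agents that cannot render a data URL directly, the image bytes can be extracted and saved to a file. This is a minimal sketch that assumes `qrImageDataUrl` is a base64-encoded data URL; the helper name is hypothetical.

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, separator, payload = data_url.partition(",")
    if not header.startswith("data:") or not separator:
        raise ValueError("not a data URL")
    meta = header[len("data:"):].split(";")
    if "base64" not in meta[1:]:
        raise ValueError("only base64 data URLs are handled in this sketch")
    mime_type = meta[0] or "text/plain"
    return mime_type, base64.b64decode(payload)
```

The returned bytes can then be written with `open("qr.png", "wb")` and shown to the user for scanning.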
Task workflow over MCP:
- Call `v2w.login`.
- Call `v2w.config.get`; if no config exists, call `v2w.config.save`.
- Call `v2w.config.test` to verify the AI processing model before submitting work.
- For Baidu Netdisk links, call `v2w.netdisk.status`; if needed, use `v2w.baidu_qr.start` and poll `v2w.baidu_qr.status`. Use `v2w.baidu_qr.cancel` if the user abandons the QR login.
- Call `v2w.jobs.submit` with `links` and optional `extraPrompts`.
- Poll `v2w.jobs.list` or `v2w.jobs.get`.
- Call `v2w.jobs.downloads` after completion.
v2w.jobs.submit always uses the model configuration saved on the V2W account. Agents may pass runtime-only options such as concurrency, directUrlMode, or publicBaseUrl, but should not pass model secrets in job calls.
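The polling step of this workflow can be sketched as a bounded loop. The transport is left to a caller-supplied `call_tool(name, arguments)` function, so the sketch stays independent of any HTTP client. The argument key `jobId`, the result key `status`, and the status names `queued` and `running` are placeholders chosen for illustration; check a real `v2w.jobs.get` response for the actual field names.

```python
import time


def wait_for_job(call_tool, job_id, auth_token, interval_seconds=5.0,
                 max_polls=120, sleep=time.sleep):
    """Poll v2w.jobs.get until the job leaves its in-progress states.

    call_tool(name, arguments) must send a tools/call request and return
    the decoded tool result as a dict.
    Assumptions: `jobId`, `status`, and the in-progress status names are
    placeholders, not confirmed V2W fields.
    """
    in_progress = {"queued", "running"}
    for _ in range(max_polls):
        job = call_tool("v2w.jobs.get", {"authToken": auth_token, "jobId": job_id})
        if job.get("status") not in in_progress:
            return job
        sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} still in progress after {max_polls} polls")
```

Bounding the number of polls keeps an agent from waiting forever on a job that was interrupted by a server restart and needs a retry.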
Template workflow:
- Call `v2w.templates.list` to ensure the built-in `提炼版` and `思维导图` templates exist for the account.
- Call `v2w.templates.create` or `v2w.templates.update` when an agent needs to save reusable prompts for extra Word files.
- Pass selected template titles and prompts as `extraPrompts` when calling `v2w.jobs.submit`.
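Turning listed templates into `extraPrompts` might look like the sketch below. It assumes each template record and each `extraPrompts` entry is an object with `title` and `prompt` keys, and that the token is passed as `authToken`; those shapes and the helper names are assumptions, not documented V2W schemas.

```python
def select_extra_prompts(templates, wanted_titles):
    """Build an extraPrompts list from template records, in the order requested.

    Assumption: templates and extraPrompts entries use `title` and `prompt` keys.
    """
    by_title = {template["title"]: template for template in templates}
    missing = [title for title in wanted_titles if title not in by_title]
    if missing:
        raise KeyError(f"templates not found: {missing}")
    return [
        {"title": title, "prompt": by_title[title]["prompt"]}
        for title in wanted_titles
    ]


def submit_arguments(links, extra_prompts, auth_token):
    """Assemble arguments for v2w.jobs.submit (no model secrets included)."""
    return {"authToken": auth_token, "links": links, "extraPrompts": extra_prompts}
```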
Usage and admin workflow:
- Call `v2w.usage.summary` after job completion to report ASR seconds, AI tokens, and estimated cost for the current account.
- Call `v2w.usage.records` when an agent needs itemized records for a report.
- Call `v2w.usage.pricing` to explain how local cost estimates are calculated.
- Admin accounts can call `v2w.admin.users`, `v2w.admin.usage.summary`, and `v2w.admin.usage.records` for organization-level reporting.
Manual netdisk authorization:
- Baidu: call `v2w.netdisk.login` with `{ "provider": "baidu", "mode": "cookies", "cookies": "BDUSS=...; STOKEN=..." }`, or with `{ "provider": "baidu", "mode": "bduss", "bduss": "...", "stoken": "..." }`.
- Quark: call `v2w.netdisk.login` with `{ "provider": "quark", "mode": "cookies", "cookies": "__pus=...; __puus=..." }`.
MCP responses redact known credential fields from command output. Clients should still avoid logging raw cookies or tokens.
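On the client side, the three argument shapes above and a log-safe copy of them can be sketched as follows. The builders mirror the payloads listed in this section. The `redact_for_log` helper and its key list are hypothetical additions for client logging and are unrelated to the server's own redaction.

```python
def baidu_cookie_login(cookies):
    """Arguments for v2w.netdisk.login using copied Baidu cookies."""
    return {"provider": "baidu", "mode": "cookies", "cookies": cookies}


def baidu_bduss_login(bduss, stoken):
    """Arguments for v2w.netdisk.login using BDUSS/STOKEN values."""
    return {"provider": "baidu", "mode": "bduss", "bduss": bduss, "stoken": stoken}


def quark_cookie_login(cookies):
    """Arguments for v2w.netdisk.login using copied Quark cookies."""
    return {"provider": "quark", "mode": "cookies", "cookies": cookies}


# Hypothetical list of argument keys a client should never write to logs.
SECRET_KEYS = {"cookies", "bduss", "stoken", "authToken", "password"}


def redact_for_log(arguments):
    """Return a copy that is safe to log: secret values become a marker."""
    return {
        key: ("[redacted]" if key in SECRET_KEYS else value)
        for key, value in arguments.items()
    }
```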
Runtime Data
Runtime files are stored under data/:
data/
├── app.sqlite
├── downloads/
├── audio/
├── outputs/
└── netdisk-users/
data/ is ignored by Git. Back it up separately if you need to preserve users, tasks, templates, usage records, or generated documents.
Supported Link Types
- Public direct media links, such as `.mp4`, `.mov`, `.m4a`, `.mp3`.
- Bilibili video page links.
- Other video pages supported by `yt-dlp`.
- Baidu Netdisk share links.
- Quark Netdisk share links.
Unsupported netdisk providers will be rejected with a clear error message.
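An agent may want to sort links by type before submitting them, for example to check netdisk authorization only when it is needed. The following sketch is not V2W's own detection logic. The hostnames (`pan.baidu.com`, `pan.quark.cn`, `bilibili.com`, `b23.tv`) and the category labels are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Extensions taken from the examples in the list above.
DIRECT_MEDIA_EXTENSIONS = (".mp4", ".mov", ".m4a", ".mp3")


def host_matches(host, domain):
    """True when host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_link(url):
    """Guess which link category a URL belongs to (illustrative only)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "unsupported"
    host = (parsed.hostname or "").lower()
    if host_matches(host, "pan.baidu.com"):
        return "baidu-netdisk"
    if host_matches(host, "pan.quark.cn"):
        return "quark-netdisk"
    if parsed.path.lower().endswith(DIRECT_MEDIA_EXTENSIONS):
        return "direct-media"
    if host_matches(host, "bilibili.com") or host_matches(host, "b23.tv"):
        return "bilibili"
    # Everything else is left for yt-dlp to accept or reject.
    return "video-page"
```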
Usage Notes
- The app is built for single-server deployment.
- Running tasks are processed by the Node.js process and stored in SQLite.
- If the process restarts, queued tasks can continue, while interrupted running tasks may need retry.
- Large files require enough local disk space for temporary download and audio extraction.
- Netdisk cookies can expire and may need re-authorization.
- Estimated cost is calculated from local pricing config and may differ from the final provider bill.
Useful Commands
npm run dev # Start frontend and backend in development mode
npm run build # Build frontend
npm start # Start backend in production mode
npm run setup # Install dependencies and prepare local environment
npm run doctor # Check environment
License
MIT