V2W MCP Server

Enables agents to convert videos from URLs, Bilibili, and netdisk shares into Word documents, with tools for transcription, job management, and account configuration.

V2W - Video to Word

V2W is a self-hosted workspace for turning videos into Word documents. It supports batch transcription from public media URLs, video pages, Baidu Netdisk shares, and Quark Netdisk shares, then generates .docx outputs for transcripts and prompt-based documents such as outlines, Q&A notes, summaries, mind maps, or rewritten drafts.

The project is designed for small teams that need repeatable video-to-document workflows on their own server, with account-based model settings, reusable prompt templates, usage tracking, retryable jobs, and a native MCP endpoint for agent integrations such as OpenClaw.

Current version: 0.1.9

Features

  • Batch submission from multiple links.
  • Public HTTP/HTTPS media transcription.
  • Bilibili and generic video-page parsing through yt-dlp.
  • Baidu Netdisk share processing through BaiduPCS-Go.
  • Baidu Netdisk QR-code login and manual credential authorization.
  • Quark Netdisk share processing through user-provided cookies.
  • Original transcript .docx output.
  • Extra .docx files generated from reusable prompts.
  • Built-in templates for 提炼版 (distilled version) and 思维导图 (mind map).
  • Per-account model configuration and prompt templates.
  • Retry failed jobs or only failed extra document generation.
  • Batch download for generated Word files.
  • Account login, admin user management, and usage records.
  • Usage tracking for ASR duration, AI tokens, and estimated cost.
  • SQLite persistence for single-server deployments.
  • Native HTTP MCP endpoint for agent workflows.

Tech Stack

  • Frontend: Vite + React
  • Backend: Node.js + Express
  • Database: SQLite with better-sqlite3
  • Word generation: docx
  • ZIP packaging: archiver
  • Media tools: ffmpeg, ffprobe
  • Video page downloader: yt-dlp
  • Baidu Netdisk downloader: BaiduPCS-Go
  • Default ASR provider: Alibaba Cloud Model Studio Paraformer
  • Extra document generation: OpenAI-compatible Chat Completions API

Requirements

  • Node.js 20+
  • npm
  • ffmpeg and ffprobe
  • yt-dlp
  • BaiduPCS-Go for Baidu Netdisk links
  • Chrome or Chromium for Baidu QR-code login

Public direct links can work without BaiduPCS-Go. Netdisk links require the corresponding netdisk authorization.

Quick Start

git clone https://github.com/joyrayai/v2w.git
cd v2w
npm run setup
npm run dev

Open the web app and create the first administrator account when prompted. After initialization, log in and configure your model provider before submitting tasks.

Default local URLs:

  • Web: http://localhost:5173
  • API: http://localhost:5174

If you want the setup script to try installing system tools:

npm run setup -- --install-system

To only check the environment:

npm run doctor

Agent / OpenClaw Quick Test

After starting the API server, the MCP endpoint is available at:

http://localhost:5174/mcp

For OpenClaw running in Docker on the same machine, register V2W with:

openclaw mcp add v2w-local \
  --transport streamable-http \
  --url http://host.docker.internal:5174/mcp

Then verify tool discovery:

openclaw mcp probe v2w-local --json

V2W should expose 33 MCP tools in version 0.1.9.
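
As a rough illustration of what tool discovery involves at the protocol level, the Python sketch below builds a JSON-RPC tools/list request and counts the tools in a response. The helper names are hypothetical, and the result.tools response shape is assumed from the MCP specification rather than taken from V2W's code.

```python
import json


def build_tools_list_request(request_id=1):
    """Return a JSON-RPC 2.0 request body for the MCP tools/list method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}


def count_tools(response):
    """Count tools in a response shaped like {"result": {"tools": [...]}} (assumed shape)."""
    return len(response.get("result", {}).get("tools", []))


# Hand-written sample response; a real one would come from POST /mcp.
sample_response = {"jsonrpc": "2.0", "id": 1, "result": {"tools": [{"name": "v2w.login"}]}}

if __name__ == "__main__":
    print(json.dumps(build_tools_list_request()))
    print(count_tools(sample_response))
```

Against a running 0.1.9 server, the count would be expected to match the 33 tools mentioned above.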

Manual Setup

npm install
cp .env.example .env
npm run dev

Build for production:

npm run build
npm start

Configuration

Copy .env.example to .env before running the app.

cp .env.example .env

Common environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| PORT | 5174 | Backend server port |
| PUBLIC_BASE_URL | http://localhost:5174 | Public base URL used for temporary media URLs |
| SESSION_SECRET | development fallback | Secret for signed login tokens |
| MAX_CONCURRENCY | 5 | Global running task limit |
| MAX_USER_RUNNING | 2 | Running task limit per user |
| MAX_USER_QUEUED | 50 | Queued task limit per user |
| MIN_FREE_DISK_GB | 6 | Stop starting new tasks when free disk is below this value |
| CHROME_PATH | empty | Optional Chrome path for QR-code login |
| CHROMIUM_PATH | empty | Optional Chromium path for QR-code login |

Do not commit real .env files, API keys, cookies, SQLite databases, or generated documents.
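
For scripts that run next to V2W and need the same settings, the documented defaults can be mirrored as below. This is only a sketch: V2W itself reads these variables in Node.js, the load_settings helper is hypothetical, and SESSION_SECRET is left out because its development fallback value is not documented here.

```python
import os

# Defaults copied from the configuration table; the two browser paths default to empty.
DEFAULTS = {
    "PORT": "5174",
    "PUBLIC_BASE_URL": "http://localhost:5174",
    "MAX_CONCURRENCY": "5",
    "MAX_USER_RUNNING": "2",
    "MAX_USER_QUEUED": "50",
    "MIN_FREE_DISK_GB": "6",
    "CHROME_PATH": "",
    "CHROMIUM_PATH": "",
}
INT_KEYS = {"PORT", "MAX_CONCURRENCY", "MAX_USER_RUNNING", "MAX_USER_QUEUED"}
FLOAT_KEYS = {"MIN_FREE_DISK_GB"}


def load_settings(env=None):
    """Merge an environment mapping with the documented defaults and coerce numbers."""
    source = os.environ if env is None else env
    settings = {}
    for key, default in DEFAULTS.items():
        raw = source.get(key, default)
        if key in INT_KEYS:
            settings[key] = int(raw)
        elif key in FLOAT_KEYS:
            settings[key] = float(raw)
        else:
            settings[key] = raw
    return settings
```

Calling load_settings() with no argument reads the real process environment; passing a dict makes the function easy to exercise in isolation.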

Model Settings

Model API keys and model names are configured in the web app after login.

The default provider preset uses Alibaba Cloud Model Studio:

  • ASR model: paraformer-v2
  • AI model: configurable OpenAI-compatible chat model

Other OpenAI-compatible providers can be used for extra document generation by setting the base URL, API key, and model name in the model configuration page.

Netdisk Authorization

Baidu Netdisk

Baidu Netdisk support depends on BaiduPCS-Go.

You can authorize Baidu Netdisk in the web app by:

  • QR-code login, if Chrome or Chromium is available on the server.
  • Manual credential login, by providing cookies or BDUSS/STOKEN values.

Each app account keeps an independent netdisk authorization state.

Quark Netdisk

Quark Netdisk support uses cookies copied from a logged-in Quark web session. Paste the cookies in the netdisk authorization card before submitting Quark share links.

MCP Integration

V2W exposes a native MCP-compatible HTTP endpoint after deployment:

POST /mcp

For a local development server:

http://localhost:5174/mcp

Implemented MCP methods:

  • initialize
  • tools/list
  • tools/call

Available tools:

| Tool | Description |
| --- | --- |
| v2w.setup.status | Check initialization state and local tool availability |
| v2w.setup.create_admin | Create the first administrator account before any account exists |
| v2w.account.register | Create a password account and return an authToken |
| v2w.service_info | Read service status, runtime limits and queue status |
| v2w.mcp.capabilities | Read grouped MCP capabilities for agent planning |
| v2w.mcp.self_check | Run an authenticated MCP integration self-check |
| v2w.login | Log in with a V2W account and return an authToken |
| v2w.config.get | Read the current account model configuration with secrets redacted |
| v2w.config.save | Save model and optional OSS configuration for the account |
| v2w.config.test | Test saved or supplied OpenAI-compatible model configuration |
| v2w.usage.pricing | Read the local ASR and AI pricing table used for estimates |
| v2w.usage.summary | Read current-account usage summary |
| v2w.usage.records | List current-account usage records |
| v2w.admin.users | Admin only: list users with job counts and usage summary |
| v2w.admin.usage.summary | Admin only: read global usage summary |
| v2w.admin.usage.records | Admin only: list global usage records |
| v2w.netdisk.status | Read Baidu or Quark authorization status |
| v2w.netdisk.login | Authorize Baidu or Quark with copied browser cookies; Baidu also supports BDUSS |
| v2w.baidu_qr.start | Start Baidu Netdisk QR authorization |
| v2w.baidu_qr.status | Poll Baidu Netdisk QR authorization status |
| v2w.baidu_qr.cancel | Cancel a Baidu Netdisk QR authorization session |
| v2w.templates.list | List extra document templates, including default templates |
| v2w.templates.get | Read one extra document template |
| v2w.templates.create | Create an extra document template |
| v2w.templates.update | Update an extra document template |
| v2w.templates.delete | Delete an extra document template |
| v2w.jobs.submit | Submit direct, page, Baidu Netdisk or Quark Netdisk links as jobs |
| v2w.jobs.list | List jobs for the current account |
| v2w.jobs.get | Read one job and its current progress |
| v2w.jobs.retry | Retry a failed job, or retry only failed extra documents when possible |
| v2w.jobs.retry_extra | Retry only failed extra documents from cached transcript text |
| v2w.jobs.delete | Delete a non-running job and its files |
| v2w.jobs.downloads | Return generated document download URLs and a batch ZIP URL |

Authentication flow:

  1. Call v2w.setup.status after deployment.
  2. Call v2w.mcp.capabilities if the agent needs a grouped capability map.
  3. If needsAdmin is true, call v2w.setup.create_admin.
  4. Otherwise call v2w.login with username and password, or create a user with v2w.account.register.
  5. Pass the returned authToken in later tool arguments, or send it as an Authorization: Bearer <token> header.
  6. Call v2w.mcp.self_check to verify account model configuration, netdisk authorization, and job state.
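
The branching in the flow above can be sketched in a few lines of Python. The function names are hypothetical, and two details are assumptions rather than documented behavior: that needsAdmin appears as a top-level field of the v2w.setup.status result, and that the token is passed under the argument key authToken.

```python
def next_auth_tool(setup_status, have_account=True):
    """Pick the next tool to call based on a v2w.setup.status result (assumed shape)."""
    if setup_status.get("needsAdmin"):
        return "v2w.setup.create_admin"
    return "v2w.login" if have_account else "v2w.account.register"


def with_auth(arguments, auth_token):
    """Return a copy of the tool arguments with the token added (assumed key: authToken)."""
    merged = dict(arguments)
    merged["authToken"] = auth_token
    return merged


def bearer_header(auth_token):
    """Build the alternative Authorization header for the same token."""
    return {"Authorization": f"Bearer {auth_token}"}
```

Copying the arguments in with_auth keeps the caller's dict free of the token, which makes it less likely to end up in logs by accident.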

Example JSON-RPC call:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "v2w.login",
    "arguments": {
      "username": "admin",
      "password": "your-password"
    }
  }
}
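
The same call can be assembled programmatically. This is a minimal sketch using only the Python standard library; the helper names are illustrative, the Accept header follows the streamable HTTP transport convention, and the code assumes the server replies with a plain JSON body, which may not hold for every MCP deployment.

```python
import json
import urllib.request


def build_tool_call(name, arguments, request_id=1):
    """Return a JSON-RPC 2.0 body for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def build_http_request(url, payload, token=None):
    """Wrap a JSON-RPC payload in a POST request, optionally with a Bearer token."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def call_tool(url, name, arguments, token=None):
    """Send the call and decode the reply (assumes a plain JSON response body)."""
    request = build_http_request(url, build_tool_call(name, arguments), token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a local server running, call_tool("http://localhost:5174/mcp", "v2w.login", {"username": "admin", "password": "your-password"}) would send the request shown above.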

Baidu QR authorization returns qrImageDataUrl when the QR image is ready. Agents can render that data URL directly for users to scan with the Baidu Netdisk app. qrImageUrl is also returned for clients that can call the protected V2W HTTP API with authentication.
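
Clients that cannot render a data URL directly may need the raw image bytes instead. The sketch below decodes a base64 data URL in Python; it assumes qrImageDataUrl is base64-encoded, which is typical for image data URLs but not confirmed here, and decode_data_url is a hypothetical helper name.

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

The returned bytes can be written to a file or passed to an image widget, depending on how the agent presents the QR code to the user.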

Task workflow over MCP:

  1. Call v2w.login.
  2. Call v2w.config.get; if no config exists, call v2w.config.save.
  3. Call v2w.config.test to verify the AI processing model before submitting work.
  4. For Baidu Netdisk links, call v2w.netdisk.status; if needed, use v2w.baidu_qr.start and poll v2w.baidu_qr.status. Use v2w.baidu_qr.cancel if the user abandons the QR login.
  5. Call v2w.jobs.submit with links and optional extraPrompts.
  6. Poll v2w.jobs.list or v2w.jobs.get.
  7. Call v2w.jobs.downloads after completion.

v2w.jobs.submit always uses the model configuration saved on the V2W account. Agents may pass runtime-only options such as concurrency, directUrlMode, or publicBaseUrl, but should not pass model secrets in job calls.
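
The polling step of the workflow might look like the following Python sketch. Several names are assumptions made for illustration and should be checked against real tool output: the jobId argument, the status field, and the terminal values "completed" and "failed". The call_tool parameter is any function that takes a tool name and arguments and returns the decoded job object.

```python
import time

# Assumed terminal states; the real values may differ.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(call_tool, job_id, auth_token, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll v2w.jobs.get until the job reaches an assumed terminal state."""
    for _ in range(max_polls):
        # "jobId" and "authToken" are assumed argument names.
        job = call_tool("v2w.jobs.get", {"jobId": job_id, "authToken": auth_token})
        if job.get("status") in TERMINAL_STATES:
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting call_tool and sleep keeps the loop independent of the transport and lets it be exercised without a running server.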

Template workflow:

  • Call v2w.templates.list to ensure the built-in 提炼版 and 思维导图 templates exist for the account.
  • Call v2w.templates.create or v2w.templates.update when an agent needs to save reusable prompts for extra Word files.
  • Pass selected template titles and prompts as extraPrompts when calling v2w.jobs.submit.
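
One way to turn selected templates into job arguments is sketched below. The links and extraPrompts argument names come from the workflow above; the shape of each entry (a title and a prompt key), the use of a plain list of URL strings, and the authToken key are assumptions for illustration.

```python
def build_submit_arguments(links, templates, auth_token):
    """Build v2w.jobs.submit arguments from links and selected templates (assumed shapes)."""
    # Keep only the title and prompt of each template; other fields are dropped.
    extra_prompts = [{"title": t["title"], "prompt": t["prompt"]} for t in templates]
    return {"links": list(links), "extraPrompts": extra_prompts, "authToken": auth_token}
```

Model secrets are deliberately absent from these arguments, in line with the note that jobs use the configuration saved on the account.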

Usage and admin workflow:

  • Call v2w.usage.summary after job completion to report ASR seconds, AI tokens, and estimated cost for the current account.
  • Call v2w.usage.records when an agent needs itemized records for a report.
  • Call v2w.usage.pricing to explain how local cost estimates are calculated.
  • Admin accounts can call v2w.admin.users, v2w.admin.usage.summary, and v2w.admin.usage.records for organization-level reporting.

Manual netdisk authorization:

  • Baidu: call v2w.netdisk.login with { "provider": "baidu", "mode": "cookies", "cookies": "BDUSS=...; STOKEN=..." }, or with { "provider": "baidu", "mode": "bduss", "bduss": "...", "stoken": "..." }.
  • Quark: call v2w.netdisk.login with { "provider": "quark", "mode": "cookies", "cookies": "__pus=...; __puus=..." }.

MCP responses redact known credential fields from command output. Clients should still avoid logging raw cookies or tokens.
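
The three argument shapes above, together with a client-side redaction step before logging, can be sketched as follows. The builder functions mirror the documented payloads; the SECRET_KEYS set and redact_for_log helper are hypothetical additions on the client side and do not describe V2W's own redaction.

```python
def baidu_cookie_login(cookies):
    """Arguments for v2w.netdisk.login using Baidu browser cookies."""
    return {"provider": "baidu", "mode": "cookies", "cookies": cookies}


def baidu_bduss_login(bduss, stoken):
    """Arguments for v2w.netdisk.login using BDUSS and STOKEN values."""
    return {"provider": "baidu", "mode": "bduss", "bduss": bduss, "stoken": stoken}


def quark_cookie_login(cookies):
    """Arguments for v2w.netdisk.login using Quark browser cookies."""
    return {"provider": "quark", "mode": "cookies", "cookies": cookies}


# Keys treated as secrets by this client-side helper (an illustrative choice).
SECRET_KEYS = {"cookies", "bduss", "stoken", "authToken", "password"}


def redact_for_log(arguments):
    """Return a copy of the arguments that is safe to write to a log."""
    return {key: ("***" if key in SECRET_KEYS else value) for key, value in arguments.items()}
```

Logging redact_for_log(arguments) instead of the raw dict keeps provider and mode visible for debugging while hiding the credentials.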

Runtime Data

Runtime files are stored under data/:

data/
├── app.sqlite
├── downloads/
├── audio/
├── outputs/
└── netdisk-users/

data/ is ignored by Git. Back it up separately if you need to preserve users, tasks, templates, usage records, or generated documents.

Supported Link Types

  • Public direct media links, such as .mp4, .mov, .m4a, .mp3.
  • Bilibili video page links.
  • Other video pages supported by yt-dlp.
  • Baidu Netdisk share links.
  • Quark Netdisk share links.

Unsupported netdisk providers will be rejected with a clear error message.
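
An agent may want to sort links into these categories before submitting them. The Python sketch below is one way to do that on the client side. The hostnames (pan.baidu.com, pan.quark.cn, bilibili.com, b23.tv) are assumptions about typical share and video URLs, the extension list covers only the examples given above, and nothing here reflects how V2W itself classifies links or detects unsupported netdisk providers.

```python
import os
from urllib.parse import urlparse

# Extensions taken from the examples in the list above.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".m4a", ".mp3"}


def classify_link(url):
    """Guess a link category: baidu, quark, bilibili, direct, or page."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "pan.baidu.com":
        return "baidu"
    if host == "pan.quark.cn":
        return "quark"
    if host == "b23.tv" or host == "bilibili.com" or host.endswith(".bilibili.com"):
        return "bilibili"
    extension = os.path.splitext(parsed.path)[1].lower()
    if parsed.scheme in ("http", "https") and extension in MEDIA_EXTENSIONS:
        return "direct"
    # Anything else is treated as a generic video page for yt-dlp to try.
    return "page"
```

A pre-check like this lets an agent prompt for netdisk authorization before calling v2w.jobs.submit rather than after a job fails.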

Usage Notes

  • The app is built for single-server deployment.
  • Tasks are executed by the Node.js process and persisted in SQLite.
  • If the process restarts, queued tasks can continue, while interrupted running tasks may need to be retried.
  • Large files require enough local disk space for temporary download and audio extraction.
  • Netdisk cookies can expire and may need re-authorization.
  • Estimated cost is calculated from local pricing config and may differ from the final provider bill.

Useful Commands

npm run dev       # Start frontend and backend in development mode
npm run build     # Build frontend
npm start         # Start backend in production mode
npm run setup     # Install dependencies and prepare local environment
npm run doctor    # Check environment

License

MIT
