pharos-ai-doc-genie
Enables AI agents to generate real Office documents (.pptx, .docx, .xlsx) and source code from natural language via MCP protocol.
Pharos AI Doc Genie: Document Generation Skill
Built for Pharos Skill-to-Agent Dual Cascade Hackathon, Phase 1
A reusable, standardized Skill module that enables any AI Agent in the Pharos ecosystem to generate real Office documents (.pptx, .docx, .xlsx) and source code from natural language, powered by the DashScope LLM API.
Problem Statement
AI Agents in the Pharos economy need to produce tangible outputs, not just text responses. When an agent helps a user prepare a business proposal, it should deliver a real .docx file. When it analyzes data, it should produce an actual .xlsx spreadsheet. When it creates a presentation, it should output a .pptx that opens in PowerPoint.
Existing solutions either:
- Generate plain text/Markdown that requires manual formatting
- Depend on proprietary cloud APIs with unpredictable availability
- Lack standardized interfaces for agent-to-skill communication
Pharos AI Doc Genie fills this gap with a production-ready, standardized Skill that generates real Office files and code from natural language.
Skill Architecture
┌──────────────────────────────────────────────┐
│              AI Agent (Pharos)               │
│          (Any MCP-compatible Agent)          │
└──────────────────────┬───────────────────────┘
                       │ MCP Protocol (JSON-RPC 2.0)
                       │ stdio transport
┌──────────────────────▼───────────────────────┐
│          Pharos AI Doc Genie Skill           │
│                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │ generate │  │ generate │  │ generate │    │
│  │  _word   │  │  _ppt    │  │  _excel  │ ...│
│  └────┬─────┘  └────┬─────┘  └────┬─────┘    │
│       │             │             │          │
│  ┌────▼─────────────▼─────────────▼─────┐    │
│  │         LLM (DashScope qwen)         │    │
│  │       Content Generation Layer       │    │
│  └───────────────────┬──────────────────┘    │
│                      │                       │
│  ┌───────────────────▼──────────────────┐    │
│  │      Python (python-pptx, etc.)      │    │
│  │        File Conversion Layer         │    │
│  │     Markdown → real .pptx/.docx      │    │
│  └───────────────────┬──────────────────┘    │
└──────────────────────▼───────────────────────┘
                       │
            ┌──────────▼──────────┐
            │    Output Files     │
            │     .pptx .docx     │
            │  .xlsx .py/.js/...  │
            └─────────────────────┘
Key design principles:
- Stateless: Each tool call is independent, so no session state is needed
- Idempotent: Same input produces consistent output structure
- Self-contained: Zero external service dependencies beyond the LLM API
- Standardized: MCP protocol ensures any compatible Agent can call it
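The hand-off between the two layers in the architecture (the LLM writes Markdown, Python converts it to a file) can be illustrated with a small parsing step. This is a minimal sketch and not the project's `convert.py`: the `Slide` type, the `parse_slides` name, and the convention that `## ` starts a slide and `- ` or `* ` marks a bullet are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slide:
    """One slide: a title plus its bullet points (hypothetical structure)."""
    title: str
    bullets: List[str] = field(default_factory=list)


def parse_slides(markdown: str) -> List[Slide]:
    """Split Markdown into slides.

    Assumed convention: a line starting with '## ' opens a new slide and
    lines starting with '- ' or '* ' are bullets of the current slide.
    """
    slides: List[Slide] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            slides.append(Slide(title=line[3:].strip()))
        elif line.startswith(("- ", "* ")) and slides:
            # Bullets that appear before any heading are ignored.
            slides[-1].bullets.append(line[2:].strip())
    return slides
```

A conversion layer could then pass each `Slide` to a library such as python-pptx to build the actual file; that rendering step is left out here.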
Tools (4 Skills)
| Tool | Output | Use Case | Model |
|---|---|---|---|
| generate_ppt | .pptx presentation | Pitch decks, training, reports | qwen3.7-plus |
| generate_word | .docx document | Proposals, manuals, reports | qwen3.7-plus |
| generate_excel | .xlsx spreadsheet | Data tables, financials, inventory | qwen3.7-plus |
| generate_code | Source code (.py/.js/.go/...) | Rapid prototyping, boilerplate | qwen-long-latest |
Tool Schema Examples
generate_ppt: Create a professional presentation
{
"name": "generate_ppt",
"arguments": {
"topic": "AI in Enterprise: 2026 Trends",
"requirements": "Executive summary for CTO audience, 12 slides, focus on ROI and adoption metrics",
"slide_count": 12
}
}
generate_excel: Generate structured data
{
"name": "generate_excel",
"arguments": {
"description": "Q2 2026 sales data: Region, Product Category, Revenue, Units Sold, Growth%, Top Salesperson",
"rows": 30
}
}
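The name/arguments objects above are the `params` of an MCP `tools/call` request. As a rough sketch, a client could wrap them in the JSON-RPC 2.0 envelope like this; the helper name `build_tool_call` is hypothetical and not part of this project.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # Emit one compact line so it can be written straight to the server's stdin.
    return json.dumps(request, ensure_ascii=False, separators=(",", ":"))


if __name__ == "__main__":
    print(build_tool_call(1, "generate_excel", {
        "description": "Q2 2026 sales data: Region, Product Category, Revenue",
        "rows": 30,
    }))
```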
Quick Start
Prerequisites
- Node.js >= 18
- Python 3.8+ with python-pptx, python-docx, and openpyxl
- DashScope API key (Alibaba BaiLian)
Install Python dependencies
pip install python-pptx python-docx openpyxl
Run the MCP Server
node src/mcp-server.js
The server listens on stdin/stdout using the MCP stdio transport. Configure your Agent's MCP client to launch this process.
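For clients written from scratch, the MCP stdio transport exchanges one JSON-RPC message per line. The sketch below shows that framing plus a way to start the server as a child process. The helper names are hypothetical, and `launch_server` assumes Node.js is on the `PATH` and that the script lives at `src/mcp-server.js` as in the command above.

```python
import json
import subprocess
from typing import Any, Dict, Iterable, List


def encode_message(message: Dict[str, Any]) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the frame stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_messages(lines: Iterable[bytes]) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON-RPC messages, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def launch_server(script: str = "src/mcp-server.js") -> subprocess.Popen:
    """Start the server with piped stdin/stdout (requires Node.js)."""
    return subprocess.Popen(
        ["node", script],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

A client would write `encode_message(...)` to `proc.stdin`, flush, and read replies line by line from `proc.stdout`.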
Test with MCP Inspector
npx @modelcontextprotocol/inspector node src/mcp-server.js
Manual Test (JSON-RPC via pipe)
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}' | node src/mcp-server.js
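The one-liner above sends only `initialize`. In the MCP lifecycle a client normally follows the server's reply with a `notifications/initialized` notification before issuing other requests. Here is a sketch that prints such a sequence for piping into the server; whether this particular server insists on the notification is not stated here, so treat the ordering as an assumption based on the MCP specification.

```python
import json
from typing import Any, Dict, List


def handshake_messages() -> List[Dict[str, Any]]:
    """Messages a client sends to open a session and then list the tools."""
    return [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "test", "version": "1.0"},
            },
        },
        # Notifications carry no "id" and expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}},
    ]


def as_stdin_payload(messages: List[Dict[str, Any]]) -> str:
    """Join messages into newline-delimited JSON for the stdio transport."""
    return "".join(json.dumps(m) + "\n" for m in messages)


if __name__ == "__main__":
    print(as_stdin_payload(handshake_messages()), end="")
```

Saved under a name of your choice (for example `handshake.py`, a hypothetical filename), the output can be piped with `python handshake.py | node src/mcp-server.js`.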
Project Structure
pharos-ai-doc-genie/
├── src/
│   └── mcp-server.js    # MCP stdio server (self-contained)
├── convert.py           # Python: Markdown → .pptx/.docx/.xlsx
├── output/              # Generated Office files
├── package.json         # Node.js project config
├── README.md            # This file
├── LICENSE              # MIT License
└── .gitignore
Integration
With Claude Desktop
{
"mcpServers": {
"pharos-ai-doc-genie": {
"command": "node",
"args": ["/absolute/path/to/pharos-ai-doc-genie/src/mcp-server.js"]
}
}
}
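Because the config needs an absolute path, it can be convenient to generate the entry instead of typing it. A small sketch, assuming it is run against the repository root; the function name `claude_desktop_entry` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def claude_desktop_entry(repo_dir: str) -> Dict[str, Any]:
    """Build the mcpServers entry with an absolute path to the server script."""
    script = Path(repo_dir).expanduser().resolve() / "src" / "mcp-server.js"
    return {
        "mcpServers": {
            "pharos-ai-doc-genie": {
                "command": "node",
                "args": [str(script)],
            }
        }
    }


if __name__ == "__main__":
    # Merge the printed JSON into claude_desktop_config.json by hand.
    print(json.dumps(claude_desktop_entry("."), indent=2))
```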
With OpenAI Agents
The server exposes standard MCP tool schemas. Each tool's input schema is JSON Schema, which maps directly onto the OpenAI function-calling format. Configure your Agent to launch the server as an MCP subprocess.
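The mapping to OpenAI function calling can be sketched as follows. It assumes each tool arrives from `tools/list` with `name`, `description`, and `inputSchema` fields, as the MCP specification describes; the sample schema in the usage block is illustrative and not copied from this server.

```python
from typing import Any, Dict


def mcp_tool_to_openai(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one MCP tool definition into an OpenAI function-tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is already JSON Schema, so it is passed through.
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


if __name__ == "__main__":
    # Illustrative tool definition; the real schema comes from tools/list.
    sample = {
        "name": "generate_ppt",
        "description": "Create a presentation",
        "inputSchema": {
            "type": "object",
            "properties": {"topic": {"type": "string"}},
            "required": ["topic"],
        },
    }
    print(mcp_tool_to_openai(sample))
```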
With Pharos Agents
Pharos Agents can call this Skill via the MCP protocol. Once the Skill is registered, Agents discover it through tools/list and call it through tools/call.
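The discover-then-call flow can be sketched on the client side as below. An Agent might look up a tool in the `tools/list` result and check its required arguments before sending `tools/call`. The helper names are hypothetical, and the response shape (`result.tools[].inputSchema.required`) follows the MCP specification, not anything verified against this server.

```python
from typing import Any, Dict, List, Optional


def find_tool(list_response: Dict[str, Any], name: str) -> Optional[Dict[str, Any]]:
    """Return the tool definition with the given name from a tools/list response."""
    for tool in list_response.get("result", {}).get("tools", []):
        if tool.get("name") == name:
            return tool
    return None


def missing_required(tool: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """List required argument names that the caller has not supplied."""
    required = tool.get("inputSchema", {}).get("required", [])
    return [key for key in required if key not in arguments]
```

If `missing_required` returns a non-empty list, the Agent can ask for the missing fields instead of spending an LLM call on a request that will fail validation.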
Testing
# List available tools
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | node src/mcp-server.js
# Generate a Word document
echo '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"generate_word","arguments":{"topic":"Project Proposal: AI Chatbot","requirements":"A formal proposal for building an enterprise AI chatbot. Include: executive summary, technical approach, timeline, budget estimate.","length":"medium"}}}' | node src/mcp-server.js
# Generate code
echo '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"generate_code","arguments":{"requirement":"A Python async function that fetches data from a REST API with exponential backoff retry logic","language":"python","comments":"en"}}}' | node src/mcp-server.js
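To check the replies from these commands in a script, the response can be parsed line by line. This sketch assumes the standard MCP `tools/call` result shape (a `content` list of typed items and an optional `isError` flag); the exact text this server returns, such as a file path or generated code, is not specified here.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Return the concatenated text content of a tools/call response.

    Raises RuntimeError for JSON-RPC errors and for tool-level errors.
    """
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = response.get("result", {})
    text = "\n".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        raise RuntimeError(text or "tool reported an error")
    return text
```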
Performance
| Tool | Avg. Response Time | Max Tokens | File Size |
|---|---|---|---|
| generate_word | ~20-40s | 16384 | 30-50 KB (.docx) |
| generate_ppt | ~30-60s | 16384 | 25-40 KB (.pptx) |
| generate_excel | ~15-25s | 16384 | 5-15 KB (.xlsx) |
| generate_code | ~15-30s | 16384 | N/A (text) |
Security
- No API key exposure: The DashScope API key is server-side only and never sent to Agents
- Input validation: All Agent inputs are validated before processing
- Output isolation: Generated files are written to a dedicated output directory
- No persistent state: Each tool call is isolated with no cross-call data leakage
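Input validation and output isolation could look roughly like the sketch below. It is not the server's actual code: the function names and the 1 to 50 bound on `slide_count` are assumptions chosen for illustration.

```python
from pathlib import Path


def validate_slide_count(value: object, minimum: int = 1, maximum: int = 50) -> int:
    """Accept only integers inside an (assumed) allowed range."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("slide_count must be an integer")
    if not minimum <= value <= maximum:
        raise ValueError(f"slide_count must be between {minimum} and {maximum}")
    return value


def safe_output_path(output_dir: str, filename: str) -> Path:
    """Resolve filename inside output_dir and refuse anything that escapes it."""
    base = Path(output_dir).resolve()
    candidate = (base / filename).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"{filename!r} resolves outside the output directory") from None
    if candidate == base:
        raise ValueError("filename must name a file inside the output directory")
    return candidate
```

Resolving the path before comparing it is what blocks `../` traversal and absolute paths: both end up outside `base` and are rejected.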
Roadmap
Phase 2 (Agent Arena)
- [ ] Deploy as a persistent Skill on Pharos chain
- [ ] On-chain billing per document generation
- [ ] NFT-based document ownership and verification
- [ ] Multi-agent collaborative document editing
Beyond
- [ ] PDF generation and manipulation
- [ ] Image-to-document conversion (OCR → formatted docx)
- [ ] Multi-language document templates
- [ ] Real-time collaborative editing via WebSocket
Author
huimingchen081-beep (GitHub)
Built for the Pharos Skill-to-Agent Dual Cascade Hackathon, Phase 1 (Skill Hackathon).
License
MIT License. See LICENSE for details.
Acknowledgments
- Pharos Network: for building the AI Agent economy infrastructure
- DashScope (Alibaba BaiLian): for the LLM API powering content generation
- Model Context Protocol (Anthropic): for the standardized agent-skill communication protocol
- python-pptx / python-docx / openpyxl: for Office file generation