pharos-ai-doc-genie

Enables AI agents to generate real Office documents (.pptx, .docx, .xlsx) and source code from natural language via the MCP protocol.


Pharos AI Doc Genie: Document Generation Skill

Built for the Pharos Skill-to-Agent Dual Cascade Hackathon, Phase 1

A reusable, standardized Skill module that enables any AI Agent in the Pharos ecosystem to generate real Office documents (.pptx, .docx, .xlsx) and source code from natural language, powered by the DashScope LLM API.

License: MIT Node.js MCP


🎯 Problem Statement

AI Agents in the Pharos economy need to produce tangible outputs β€” not just text responses. When an agent helps a user prepare a business proposal, it should deliver a real .docx file. When it analyzes data, it should produce an actual .xlsx spreadsheet. When it creates a presentation, it should output a .pptx that opens in PowerPoint.

Existing solutions either:

  • Generate plain text/Markdown that requires manual formatting
  • Depend on proprietary cloud APIs with unpredictable availability
  • Lack standardized interfaces for agent-to-skill communication

Pharos AI Doc Genie fills this gap with a production-ready, standardized Skill that generates real Office files and code from natural language.


🧩 Skill Architecture

┌───────────────────────────────────────────────┐
│                AI Agent (Pharos)              │
│         (Any MCP-compatible Agent)            │
└─────────────────┬─────────────────────────────┘
                  │ MCP Protocol (JSON-RPC 2.0)
                  │ stdio transport
┌─────────────────▼─────────────────────────────┐
│          Pharos AI Doc Genie Skill            │
│                                               │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐       │
│  │ generate │ │ generate │ │ generate │       │
│  │ _word    │ │  _ppt    │ │ _excel   │ ...   │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘       │
│       │            │            │             │
│  ┌────▼────────────▼────────────▼──────┐      │
│  │        LLM (DashScope qwen)         │      │
│  │    Content Generation Layer         │      │
│  └────────────────┬────────────────────┘      │
│                   │                           │
│  ┌────────────────▼────────────────────┐      │
│  │     Python (python-pptx, etc.)      │      │
│  │     File Conversion Layer           │      │
│  │     Markdown → real .pptx/.docx     │      │
│  └────────────────┬────────────────────┘      │
└───────────────────┼───────────────────────────┘
                    │
         ┌──────────▼──────────┐
         │   Output Files      │
         │  .pptx  .docx       │
         │  .xlsx  .py/.js/... │
         └─────────────────────┘

Key design principles:

  • Stateless: Each tool call is independent, so no session state is needed
  • Idempotent: Same input produces consistent output structure
  • Self-contained: Zero external service dependencies beyond the LLM API
  • Standardized: MCP protocol ensures any compatible Agent can call it
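
The two layers in the diagram and the stateless principle can be sketched roughly as follows. This is a minimal illustration, not the project's code: `handle_tool_call`, `fake_generate`, and `fake_convert` are hypothetical names, and the stubs stand in for the DashScope call and for convert.py, whose real interfaces are not shown in this README.

```python
from typing import Callable, Dict, Mapping

# Hypothetical stand-ins for the two layers in the diagram. The real Skill
# calls the DashScope API and convert.py; their interfaces are not shown here.
ContentGenerator = Callable[[str, Mapping[str, object]], str]
Converter = Callable[[str], bytes]


def handle_tool_call(
    tool: str,
    arguments: Mapping[str, object],
    generate: ContentGenerator,
    converters: Dict[str, Converter],
) -> bytes:
    """Run one tool call from its own inputs only; nothing is kept between calls."""
    if tool not in converters:
        raise ValueError(f"unknown tool: {tool}")
    markdown = generate(tool, arguments)   # content generation layer
    return converters[tool](markdown)      # file conversion layer


# Stub layers so the sketch runs without network access or Office libraries.
def fake_generate(tool: str, arguments: Mapping[str, object]) -> str:
    return f"# {arguments['topic']}\n\n- drafted by {tool}"


def fake_convert(markdown: str) -> bytes:
    return markdown.encode("utf-8")


document = handle_tool_call(
    "generate_word",
    {"topic": "Demo"},
    fake_generate,
    {"generate_word": fake_convert},
)
print(len(document), "bytes")
```

Because every dependency is passed in as an argument, two calls never share data, which is one simple way to get the isolation the principles describe.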

🛠️ Tools (4 Skills)

| Tool | Output | Use Case | Model |
|---|---|---|---|
| generate_ppt | .pptx presentation | Pitch decks, training, reports | qwen3.7-plus |
| generate_word | .docx document | Proposals, manuals, reports | qwen3.7-plus |
| generate_excel | .xlsx spreadsheet | Data tables, financials, inventory | qwen3.7-plus |
| generate_code | Source code (.py/.js/.go/...) | Rapid prototyping, boilerplate | qwen-long-latest |

Tool Schema Examples

generate_ppt: Create a professional presentation

{
  "name": "generate_ppt",
  "arguments": {
    "topic": "AI in Enterprise: 2026 Trends",
    "requirements": "Executive summary for CTO audience, 12 slides, focus on ROI and adoption metrics",
    "slide_count": 12
  }
}

generate_excel: Generate structured data

{
  "name": "generate_excel",
  "arguments": {
    "description": "Q2 2026 sales data: Region, Product Category, Revenue, Units Sold, Growth%, Top Salesperson",
    "rows": 30
  }
}
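
The examples above show only the `name` and `arguments` of a call. On the wire they travel inside a JSON-RPC 2.0 `tools/call` request, as in the Testing commands later in this README. A small sketch of building that envelope, where `build_tools_call` is a hypothetical helper name:

```python
import json


def build_tools_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, so the result fits on one line.
    return json.dumps(message, ensure_ascii=False)


line = build_tools_call(
    1,
    "generate_excel",
    {
        "description": "Q2 2026 sales data: Region, Product Category, Revenue",
        "rows": 30,
    },
)
print(line)
```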

🚀 Quick Start

Prerequisites

  • Node.js >= 18
  • Python 3.8+ with python-pptx, python-docx, openpyxl
  • DashScope API Key (Alibaba BaiLian)

Install Python dependencies

pip install python-pptx python-docx openpyxl

Run the MCP Server

node src/mcp-server.js

The server listens on stdin/stdout using the MCP stdio transport. Configure your Agent's MCP client to launch this process.
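
In the MCP stdio transport, each JSON-RPC message is one line of JSON terminated by a newline. The sketch below shows that framing with an in-memory stream in place of the server's pipes. `write_message` and `read_messages` are hypothetical helper names, and the empty `tools` list is placeholder data rather than real server output.

```python
import io
import json
from typing import Iterator, TextIO


def write_message(stream: TextIO, message: dict) -> None:
    """Write one JSON-RPC message followed by the newline that delimits it."""
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_messages(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON-RPC message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stream standing in for the server's stdin/stdout pipes.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}})
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}})
pipe.seek(0)
messages = list(read_messages(pipe))
print(len(messages), "messages read")
```

Against the real server, the same helpers could be pointed at the `stdin` and `stdout` of a `subprocess.Popen(["node", "src/mcp-server.js"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)` process.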

Test with MCP Inspector

npx @modelcontextprotocol/inspector node src/mcp-server.js

Manual Test (JSON-RPC via pipe)

echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}' | node src/mcp-server.js

📁 Project Structure

pharos-ai-doc-genie/
├── src/
│   └── mcp-server.js       # MCP stdio server (self-contained)
├── convert.py               # Python: Markdown → .pptx/.docx/.xlsx
├── output/                  # Generated Office files
├── package.json             # Node.js project config
├── README.md                # This file
├── LICENSE                  # MIT License
└── .gitignore
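
The tree describes convert.py as turning Markdown into Office files. Its actual logic is not shown in this README, so the following is only a guess at the first step for a presentation: splitting Markdown into slide titles and bullets. The `Slide` class, `markdown_to_slides`, and the convention that each `## ` heading starts a slide are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slide:
    title: str
    bullets: List[str] = field(default_factory=list)


def markdown_to_slides(markdown: str) -> List[Slide]:
    """Treat each '## ' heading as a new slide and each list item as a bullet."""
    slides: List[Slide] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            slides.append(Slide(title=line[3:].strip()))
        elif line.startswith(("- ", "* ")) and slides:
            # Bullets before the first heading have no slide to attach to.
            slides[-1].bullets.append(line[2:].strip())
    return slides


sample = """## Market Overview
- Adoption is growing
- Budgets are shifting

## Next Steps
* Run a pilot
"""
slides = markdown_to_slides(sample)
for slide in slides:
    print(slide.title, slide.bullets)
```

Each `Slide` could then be handed to a rendering library such as python-pptx. The intermediate structure keeps parsing separate from file writing.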

🔌 Integration

With Claude Desktop

{
  "mcpServers": {
    "pharos-ai-doc-genie": {
      "command": "node",
      "args": ["/absolute/path/to/pharos-ai-doc-genie/src/mcp-server.js"]
    }
  }
}

With OpenAI Agents

The server exposes standard MCP tool schemas. Each tool's input schema is plain JSON Schema, so it maps directly onto the OpenAI function-calling format. Configure your Agent to launch the server as an MCP subprocess.
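
As a rough illustration of that compatibility, an MCP tool definition (`name`, `description`, `inputSchema`) can be reshaped into an OpenAI function-calling tool entry. `mcp_tool_to_openai` is a hypothetical helper, and the `generate_excel` schema below is an assumption inferred from the example arguments in this README, not the server's actual schema.

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Reshape one MCP tool definition into an OpenAI function-calling tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is JSON Schema, which is what "parameters" expects.
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


# Assumed schema for illustration only; the real one comes from tools/list.
mcp_tool = {
    "name": "generate_excel",
    "description": "Generate an .xlsx spreadsheet from a description",
    "inputSchema": {
        "type": "object",
        "properties": {
            "description": {"type": "string"},
            "rows": {"type": "integer"},
        },
        "required": ["description"],
    },
}

openai_tool = mcp_tool_to_openai(mcp_tool)
print(openai_tool["function"]["name"])
```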

With Pharos Agents

Pharos Agents can call this Skill via the MCP protocol. Once the Skill is registered, Agents discover it through tools/list and call it through tools/call.
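
A sketch of what an Agent might do with the two responses: list tool names from a `tools/list` result, then pull the text out of a `tools/call` result. The helper names and both sample responses are hypothetical. The response shapes (`result.tools`, `result.content`, `result.isError`) follow the MCP specification.

```python
from typing import List


def tool_names(list_response: dict) -> List[str]:
    """Return the tool names advertised in a tools/list response."""
    return [tool["name"] for tool in list_response["result"]["tools"]]


def call_text(call_response: dict) -> str:
    """Return the text content of a tools/call response, raising on errors."""
    if "error" in call_response:
        raise RuntimeError(call_response["error"].get("message", "JSON-RPC error"))
    result = call_response["result"]
    text = "\n".join(
        item["text"] for item in result.get("content", []) if item.get("type") == "text"
    )
    if result.get("isError"):
        raise RuntimeError(text or "tool reported an error")
    return text


# Hypothetical responses; the actual message text depends on the server.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": "generate_word"}, {"name": "generate_ppt"}]},
}
call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "document created"}]},
}

names = tool_names(list_response)
summary = call_text(call_response)
print(names, summary)
```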


🧪 Testing

# List available tools
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | node src/mcp-server.js

# Generate a Word document
echo '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"generate_word","arguments":{"topic":"Project Proposal: AI Chatbot","requirements":"A formal proposal for building an enterprise AI chatbot. Include: executive summary, technical approach, timeline, budget estimate.","length":"medium"}}}' | node src/mcp-server.js

# Generate code
echo '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"generate_code","arguments":{"requirement":"A Python async function that fetches data from a REST API with exponential backoff retry logic","language":"python","comments":"en"}}}' | node src/mcp-server.js

⚡ Performance

| Tool | Avg. Response Time | Max Tokens | File Size |
|---|---|---|---|
| generate_word | ~20-40 s | 16384 | 30-50 KB (.docx) |
| generate_ppt | ~30-60 s | 16384 | 25-40 KB (.pptx) |
| generate_excel | ~15-25 s | 16384 | 5-15 KB (.xlsx) |
| generate_code | ~15-30 s | 16384 | N/A (text) |

🔒 Security

  • No API key exposure: The DashScope API key is server-side only and never sent to Agents
  • Input validation: All Agent inputs are validated before processing
  • Output isolation: Generated files are written to a dedicated output directory
  • No persistent state: Each tool call is isolated with no cross-call data leakage
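
Input validation and output isolation could look something like the sketch below. It is not the server's implementation: the function names, the slide limit, and the default slide count are hypothetical, chosen only to make the idea concrete.

```python
from pathlib import Path

MAX_SLIDES = 50  # hypothetical limit, not taken from the project


def validate_ppt_arguments(arguments: dict) -> dict:
    """Check generate_ppt arguments and return a cleaned copy."""
    topic = arguments.get("topic")
    if not isinstance(topic, str) or not topic.strip():
        raise ValueError("topic must be a non-empty string")
    slide_count = arguments.get("slide_count", 10)  # hypothetical default
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(slide_count, bool) or not isinstance(slide_count, int):
        raise ValueError("slide_count must be an integer")
    if not 1 <= slide_count <= MAX_SLIDES:
        raise ValueError(f"slide_count must be between 1 and {MAX_SLIDES}")
    return {"topic": topic.strip(), "slide_count": slide_count}


def safe_output_path(output_dir: Path, filename: str) -> Path:
    """Resolve filename inside output_dir, refusing paths that escape it."""
    base = output_dir.resolve()
    candidate = (base / filename).resolve()
    if base not in candidate.parents:
        raise ValueError(f"refusing to write outside {base}")
    return candidate


cleaned = validate_ppt_arguments({"topic": " AI in Enterprise ", "slide_count": 12})
target = safe_output_path(Path("output"), "deck.pptx")
print(cleaned, target.name)
```

Resolving both paths before comparing them means `../` segments and absolute filenames are rejected instead of silently landing outside the output directory.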

🗺️ Roadmap

Phase 2 (Agent Arena)

  • [ ] Deploy as a persistent Skill on Pharos chain
  • [ ] On-chain billing per document generation
  • [ ] NFT-based document ownership and verification
  • [ ] Multi-agent collaborative document editing

Beyond

  • [ ] PDF generation and manipulation
  • [ ] Image-to-document conversion (OCR → formatted docx)
  • [ ] Multi-language document templates
  • [ ] Real-time collaborative editing via WebSocket

👤 Author

huimingchen081-beep (GitHub)

Built for the Pharos Skill-to-Agent Dual Cascade Hackathon, Phase 1 (Skill Hackathon).


📄 License

MIT License. See LICENSE for details.


🙏 Acknowledgments

  • Pharos Network: for building the AI Agent economy infrastructure
  • DashScope (Alibaba BaiLian): for the LLM API powering content generation
  • Model Context Protocol (Anthropic): for the standardized agent-skill communication protocol
  • python-pptx / python-docx / openpyxl: for Office file generation
