Puter MCP Server
Provides AI-powered media generation tools including image, speech, video, OCR, and voice conversion via the Model Context Protocol.
MCP (Model Context Protocol) server for Puter AI media generation. Provides 6 AI-powered tools for image generation, text-to-speech, video generation, OCR, speech-to-text, and voice conversion.
Features
- txt2img: Text-to-image generation with multiple providers (OpenAI, Gemini, Together, xAI, Replicate)
- txt2speech: Text-to-speech conversion with multiple voices and engines
- txt2vid: Text-to-video generation (Sora, Veo, TogetherAI)
- img2txt: Image-to-text (OCR) with AWS Textract or Mistral
- speech2txt: Speech-to-text transcription
- speech2speech: Voice conversion using ElevenLabs
Key Features
- Intelligent Default Models: Automatically selects the best model based on task type
  - Text-to-image: gpt-image-2 (OpenAI)
  - Image-to-image: gemini-2.5-flash-image-preview (Gemini)
- Multiple Providers: Support for OpenAI, Google Gemini, xAI (Grok), Replicate, Together AI, ElevenLabs
- Flexible Output: Supports base64 and URL output formats
- Test Mode: Built-in test mode for development without consuming credits
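The default-model behavior listed above could be sketched roughly as follows. The two model names come from this README; the type and function names (`ImageRequest`, `resolveImageModel`) are hypothetical, and the assumption that the presence of `input_image` is what switches the task type is an illustration, not necessarily how the server decides.

```typescript
// Sketch only: names are hypothetical; model defaults are taken from the README.
type ImageTask = "text-to-image" | "image-to-image";

const DEFAULT_IMAGE_MODELS: Record<ImageTask, string> = {
  "text-to-image": "gpt-image-2",
  "image-to-image": "gemini-2.5-flash-image-preview",
};

interface ImageRequest {
  prompt: string;
  model?: string;
  input_image?: string;
}

function resolveImageModel(req: ImageRequest): string {
  // An explicitly requested model always wins over the default.
  if (req.model) return req.model;
  // Assumption: an input image means the request is image-to-image.
  const task: ImageTask = req.input_image ? "image-to-image" : "text-to-image";
  return DEFAULT_IMAGE_MODELS[task];
}

console.log(resolveImageModel({ prompt: "a cat" })); // prints "gpt-image-2"
```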
Quick Start
Prerequisites
- Node.js 18+
- Puter API Key (get from puter.com)
Installation
# Clone the repository
git clone https://github.com/your-username/puter-mcp.git
cd puter-mcp
# Install dependencies
npm install
# Build the project
npm run build
Configuration
- Copy the environment file:
cp .env.example .env
- Edit .env and add your Puter API key:
PUTER_API_KEY=your_puter_api_key_here
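As an illustration of how a server might fail fast when the key is not configured, here is a minimal sketch. The helper name `readApiKey` is hypothetical and the actual server may validate its configuration differently; only the variable name `PUTER_API_KEY` and the placeholder value come from this README.

```typescript
// Sketch only: a hypothetical startup check for the PUTER_API_KEY variable.
type Env = Record<string, string | undefined>;

function readApiKey(env: Env): string {
  const key = (env.PUTER_API_KEY ?? "").trim();
  // Reject both a missing key and the untouched placeholder from .env.example.
  if (key === "" || key === "your_puter_api_key_here") {
    throw new Error("PUTER_API_KEY is missing or still set to the placeholder value");
  }
  return key;
}

// In a Node.js process this would typically be called as readApiKey(process.env).
console.log(readApiKey({ PUTER_API_KEY: "  example-key  " })); // prints "example-key"
```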
Usage
Claude Desktop / Trae
Add the following to your Claude Desktop or Trae configuration file:
- Windows: %APPDATA%\Trae\mcp_settings.json
- macOS: ~/Library/Application Support/Trae/mcp_settings.json
- Linux: ~/.config/Trae/mcp_settings.json
Configuration content:
{
  "mcpServers": {
    "puter-mcp": {
      "command": "node",
      "args": ["path/to/puter-mcp/dist/index.js"],
      "env": {
        "PUTER_API_KEY": "your_api_key"
      }
    }
  }
}
Command Line
# Stdio mode (default)
npm start
# SSE mode
TRANSPORT=sse PORT=3000 npm start
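The two modes above are selected through the `TRANSPORT` and `PORT` environment variables. A sketch of how such settings might be parsed is shown below; the helper name `readTransportConfig` and the fallback port of 3000 are assumptions made for illustration (the README only shows 3000 as an example value), and the server's own parsing may differ.

```typescript
// Sketch only: hypothetical parsing of the TRANSPORT and PORT variables.
type Env = Record<string, string | undefined>;

interface TransportConfig {
  transport: "stdio" | "sse";
  port?: number;
}

function readTransportConfig(env: Env): TransportConfig {
  const raw = (env.TRANSPORT ?? "stdio").toLowerCase();
  if (raw === "stdio") return { transport: "stdio" };
  if (raw !== "sse") throw new Error(`Unsupported TRANSPORT: ${raw}`);

  // Assumption: fall back to port 3000 when PORT is not set.
  const port = Number(env.PORT ?? "3000");
  if (port % 1 !== 0 || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return { transport: "sse", port };
}

console.log(readTransportConfig({}));
console.log(readTransportConfig({ TRANSPORT: "sse", PORT: "3000" }));
```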
Tools Reference
txt2img
Generate images from text prompts. Supports both text-to-image and image-to-image.
| Parameter | Type | Description |
|---|---|---|
| prompt | string | Text description for the image |
| model | string | Model to use (default: gpt-image-2 for text-to-image, gemini-2.5-flash-image-preview for image-to-image) |
| provider | string | AI provider (openai-image-generation, gemini, together, xai, replicate-image-generation) |
| quality | string | Image quality (high, medium, low, hd, standard) |
| ratio | object | Aspect ratio {w, h} |
| input_image | string | Input image for image-to-image (Base64 or URL) |
| test_mode | boolean | Test mode without credits |
| output_format | string | Output format (base64, url) |
Example:
Generate a picture of a cat
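When a client turns a prompt like the one above into a tool invocation, MCP sends a JSON-RPC `tools/call` request whose `params` carry the tool name and its arguments. The sketch below shows what such a request might look like for `txt2img`. The parameter names come from the table above; the interface and function names are hypothetical, and treating `prompt` as the only required field is an assumption.

```typescript
// Sketch only: argument names follow the README table; which ones are required is assumed.
interface Txt2ImgArgs {
  prompt: string;
  model?: string;
  provider?: string;
  quality?: string;
  ratio?: { w: number; h: number };
  input_image?: string;
  test_mode?: boolean;
  output_format?: "base64" | "url";
}

interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

function buildTxt2ImgCall(id: number, args: Txt2ImgArgs): ToolCallRequest<Txt2ImgArgs> {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "txt2img", arguments: args },
  };
}

// test_mode is set so that trying the call would not consume credits.
const request = buildTxt2ImgCall(1, {
  prompt: "A picture of a cat",
  ratio: { w: 1, h: 1 },
  test_mode: true,
  output_format: "url",
});
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library usually builds this envelope for you; the sketch only makes the shape of the arguments visible.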
txt2speech
Convert text to speech.
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to convert |
| provider | string | TTS provider (aws-polly, openai, elevenlabs, gemini, xai) |
| model | string | TTS model |
| voice | string | Voice ID |
| engine | string | Synthesis engine (standard, neural, long-form, generative) |
| language | string | Language code |
| test_mode | boolean | Test mode |
Example:
Convert "Hello world" to speech
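Because `provider` and `engine` only accept the values listed in the table, a caller may want to check arguments before sending them. The following is a hedged sketch of such a check; `checkTxt2SpeechArgs` is a hypothetical helper, and it does not verify whether a given engine is actually available for a given provider, which this README does not specify.

```typescript
// Sketch only: allowed values are copied from the README table.
const TTS_PROVIDERS = ["aws-polly", "openai", "elevenlabs", "gemini", "xai"];
const TTS_ENGINES = ["standard", "neural", "long-form", "generative"];

interface Txt2SpeechArgs {
  text: string;
  provider?: string;
  engine?: string;
}

// Returns a list of human-readable problems; an empty list means the arguments look fine.
function checkTxt2SpeechArgs(args: Txt2SpeechArgs): string[] {
  const problems: string[] = [];
  if (args.text.trim() === "") problems.push("text must not be empty");
  if (args.provider !== undefined && TTS_PROVIDERS.indexOf(args.provider) === -1) {
    problems.push(`unknown provider "${args.provider}"`);
  }
  if (args.engine !== undefined && TTS_ENGINES.indexOf(args.engine) === -1) {
    problems.push(`unknown engine "${args.engine}"`);
  }
  return problems;
}

console.log(checkTxt2SpeechArgs({ text: "Hello world", provider: "aws-polly", engine: "neural" }));
console.log(checkTxt2SpeechArgs({ text: "Hello world", engine: "robotic" }));
```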
txt2vid
Generate videos from text prompts.
| Parameter | Type | Description |
|---|---|---|
| prompt | string | Video description |
| model | string | Video model (sora-2, veo-3.1-generate-preview, etc.) |
| seconds | number | Video duration (4, 8, 12) |
| size | string | Resolution (e.g., 1280x720) |
| test_mode | boolean | Test mode |
Example:
Generate a video of a drone flying over mountains
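The `seconds` and `size` parameters have a constrained shape, so a small pre-flight check can catch mistakes early. The sketch below assumes that 4, 8, and 12 are the only accepted durations and that `size` always follows the `WIDTHxHEIGHT` pattern shown in the table; individual models may accept other values, and the helper names are hypothetical.

```typescript
// Sketch only: assumes the durations listed in the README are the only valid ones.
const ALLOWED_SECONDS = [4, 8, 12];

function checkSeconds(seconds: number): number {
  if (ALLOWED_SECONDS.indexOf(seconds) === -1) {
    throw new Error(`seconds must be one of ${ALLOWED_SECONDS.join(", ")}, got ${seconds}`);
  }
  return seconds;
}

// Parses a resolution string such as "1280x720" into numeric width and height.
function parseSize(size: string): { width: number; height: number } {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) throw new Error(`size must look like 1280x720, got "${size}"`);
  return { width: Number(match[1]), height: Number(match[2]) };
}

console.log(checkSeconds(8), parseSize("1280x720"));
```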
img2txt
Extract text from images (OCR).
| Parameter | Type | Description |
|---|---|---|
| source | string | Image URL, Base64, or Puter path |
| provider | string | OCR provider (aws-textract, mistral) |
| test_mode | boolean | Test mode |
Example:
Extract text from this image: https://example.com/document.png
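Since `source` accepts three different kinds of input in a single string, it can help to see how they might be told apart. The heuristic below is purely illustrative and is not taken from the server's code: `classifySource` is a hypothetical helper, and the rule for bare Base64 (Base64 alphabet only, length a multiple of 4, at least 16 characters) can misclassify unusual paths.

```typescript
// Sketch only: a heuristic classifier, not the server's actual logic.
type SourceKind = "url" | "base64" | "puter-path";

function classifySource(source: string): SourceKind {
  if (/^https?:\/\//i.test(source)) return "url";
  // Data URLs such as "data:image/png;base64,...." carry Base64 content.
  if (/^data:[^;,]+;base64,/i.test(source)) return "base64";
  // Bare Base64: alphabet check plus length rules to avoid matching short file names.
  const looksLikeBase64 =
    source.length >= 16 &&
    source.length % 4 === 0 &&
    /^[A-Za-z0-9+\/]+={0,2}$/.test(source);
  if (looksLikeBase64) return "base64";
  // Anything else is treated as a path in Puter storage.
  return "puter-path";
}

console.log(classifySource("https://example.com/document.png")); // prints "url"
```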
speech2txt
Convert speech to text.
| Parameter | Type | Description |
|---|---|---|
| audio | string | Audio URL, Base64, or Puter path |
| provider | string | STT provider (openai, xai) |
| model | string | Model name |
| language | string | Language code |
| translate | boolean | Translate to English |
| test_mode | boolean | Test mode |
Example:
Transcribe this audio: https://example.com/speech.mp3
speech2speech
Convert voice to another voice using ElevenLabs.
| Parameter | Type | Description |
|---|---|---|
| audio | string | Input audio URL, Base64, or Puter path |
| voice | string | Target ElevenLabs voice ID |
| model | string | Voice model (default: eleven_multilingual_sts_v2) |
| output_format | string | Output format |
| test_mode | boolean | Test mode |
Example:
Convert this voice to a different voice: https://example.com/speech.mp3
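To make the documented default concrete, the sketch below fills in `model` when the caller omits it. The default value `eleven_multilingual_sts_v2` comes from the table above; the interface, the helper name `withSpeech2SpeechDefaults`, the assumption that `audio` and `voice` are required, and the voice ID `example-voice-id` are all hypothetical.

```typescript
// Sketch only: applies the documented default model; other names are illustrative.
interface Speech2SpeechArgs {
  audio: string;
  voice: string;
  model?: string;
  output_format?: string;
  test_mode?: boolean;
}

const DEFAULT_STS_MODEL = "eleven_multilingual_sts_v2";

function withSpeech2SpeechDefaults(args: Speech2SpeechArgs): Speech2SpeechArgs & { model: string } {
  // Keep everything the caller passed and only fill in a missing model.
  return { ...args, model: args.model ?? DEFAULT_STS_MODEL };
}

const resolved = withSpeech2SpeechDefaults({
  audio: "https://example.com/speech.mp3",
  voice: "example-voice-id", // placeholder, not a real ElevenLabs voice ID
  test_mode: true,
});
console.log(resolved.model); // prints "eleven_multilingual_sts_v2"
```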
Development
Project Structure
puter-mcp/
├── src/
│   ├── index.ts              # Server entry point
│   ├── client.ts             # Puter SDK initialization
│   ├── utils.ts              # Response formatting utilities
│   ├── puter.d.ts            # TypeScript declarations
│   └── tools/
│       ├── index.ts          # Tool registration
│       ├── txt2img.ts
│       ├── txt2speech.ts
│       ├── txt2vid.ts
│       ├── img2txt.ts
│       ├── speech2txt.ts
│       └── speech2speech.ts
├── scripts/
│   └── verify-responses.ts   # SDK response verification
├── dist/                     # Compiled output
├── package.json
└── tsconfig.json
Build
npm run build
Type Check
npm run typecheck
Development Mode
npm run dev
License
MIT License - see LICENSE for details.
Support
- Issue Tracker: https://github.com/your-username/puter-mcp/issues
- Documentation: https://docs.puter.com/AI/