Puter MCP Server

Provides AI-powered media generation tools including image, speech, video, OCR, and voice conversion via the Model Context Protocol.

An MCP (Model Context Protocol) server for Puter AI media generation. It provides six AI-powered tools: image generation, text-to-speech, video generation, OCR, speech-to-text, and voice conversion.

Features

txt2img: Text-to-image generation with multiple providers (OpenAI, Gemini, Together, xAI, Replicate)
txt2speech: Text-to-speech conversion with multiple voices and engines
txt2vid: Text-to-video generation (Sora, Veo, TogetherAI)
img2txt: Image-to-text (OCR) with AWS Textract or Mistral
speech2txt: Speech-to-text transcription
speech2speech: Voice conversion using ElevenLabs

Key Features

Intelligent Default Models: When no model is specified, a default is selected by task type
- Text-to-image: gpt-image-2 (OpenAI)
- Image-to-image: gemini-2.5-flash-image-preview (Gemini)
Multiple Providers: Support for OpenAI, Google Gemini, xAI (Grok), Replicate, Together AI, ElevenLabs
Flexible Output: Supports base64 and URL output formats
Test Mode: Built-in test mode for development without consuming credits
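
The default-model rule listed under Intelligent Default Models can be sketched as a small function. This is an illustrative sketch, not the server's actual code: the function name `pickDefaultModel` and the `ImageArgs` shape are assumptions, and only the two model names and the `model` / `input_image` field names come from this README.

```typescript
// Hypothetical argument shape; only the field names `model` and `input_image`
// are taken from the txt2img parameter table.
interface ImageArgs {
  model?: string;
  input_image?: string;
}

// Model names as listed under "Intelligent Default Models".
const DEFAULT_TEXT_TO_IMAGE = "gpt-image-2";
const DEFAULT_IMAGE_TO_IMAGE = "gemini-2.5-flash-image-preview";

// Return the explicit model if one is given; otherwise pick a default
// based on whether an input image is present (image-to-image) or not.
function pickDefaultModel(args: ImageArgs): string {
  if (args.model) return args.model;
  return args.input_image ? DEFAULT_IMAGE_TO_IMAGE : DEFAULT_TEXT_TO_IMAGE;
}
```

An explicitly passed `model` always wins in this sketch; whether the server applies the same precedence is an assumption.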

Quick Start

Prerequisites

Node.js 18+
Puter API key (available from puter.com)

Installation

# Clone the repository
git clone https://github.com/your-username/puter-mcp.git
cd puter-mcp

# Install dependencies
npm install

# Build the project
npm run build

Configuration

Copy the environment file:

cp .env.example .env

Edit .env and add your Puter API key:

PUTER_API_KEY=your_puter_api_key_here
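
A start-up check for the key might look like the sketch below. It is an assumption about how such a check could be written, not the server's code: `requireApiKey` and the `Env` type are hypothetical names, and how the server actually loads `.env` is not described in this README.

```typescript
// Minimal env-like shape so the function does not depend on Node typings.
type Env = Record<string, string | undefined>;

// Return the trimmed API key, or throw a descriptive error if it is
// missing or still set to the placeholder from .env.example.
function requireApiKey(env: Env): string {
  const raw = env["PUTER_API_KEY"];
  const key = raw ? raw.trim() : "";
  if (key === "" || key === "your_puter_api_key_here") {
    throw new Error(
      "PUTER_API_KEY is not set; copy .env.example to .env and add your key"
    );
  }
  return key;
}
```

In a Node.js process this would typically be called as `requireApiKey(process.env)` after the `.env` file has been loaded.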

Usage

Claude Desktop / Trae

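Add the configuration below to your Claude Desktop or Trae MCP configuration file. The paths listed here are Trae's; Claude Desktop keeps its settings in claude_desktop_config.json instead.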

Windows:

%APPDATA%\Trae\mcp_settings.json

macOS:

~/Library/Application Support/Trae/mcp_settings.json

Linux:

~/.config/Trae/mcp_settings.json

Configuration content:

{
  "mcpServers": {
    "puter-mcp": {
      "command": "node",
      "args": ["path/to/puter-mcp/dist/index.js"],
      "env": {
        "PUTER_API_KEY": "your_api_key"
      }
    }
  }
}
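
MCP clients generally launch the server from their own working directory, so a relative path in `args` may not resolve as expected and an absolute path is usually the safer choice. The sketch below builds the same configuration programmatically and rejects relative paths. The names `buildSettings` and `isAbsolutePath` and the example path `/opt/puter-mcp/dist/index.js` are hypothetical; the absolute-path check is a simple heuristic, not a full path parser.

```typescript
// Shape of the configuration shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpSettings {
  mcpServers: Record<string, McpServerEntry>;
}

// Heuristic: POSIX paths start with "/", Windows paths with a drive letter.
function isAbsolutePath(p: string): boolean {
  return /^(\/|[A-Za-z]:[\\/])/.test(p);
}

// Build the settings object for the puter-mcp server entry.
function buildSettings(entryPath: string, apiKey: string): McpSettings {
  if (!isAbsolutePath(entryPath)) {
    throw new Error(`Expected an absolute path to dist/index.js, got "${entryPath}"`);
  }
  return {
    mcpServers: {
      "puter-mcp": {
        command: "node",
        args: [entryPath],
        env: { PUTER_API_KEY: apiKey },
      },
    },
  };
}

// Hypothetical install location, used only for illustration.
console.log(JSON.stringify(buildSettings("/opt/puter-mcp/dist/index.js", "your_api_key"), null, 2));
```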

Command Line

# Stdio mode (default)
npm start

# SSE mode
TRANSPORT=sse PORT=3000 npm start
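
The two modes above are selected through the `TRANSPORT` and `PORT` environment variables. A minimal sketch of how such variables could be interpreted follows; `resolveTransport` and `TransportConfig` are hypothetical names, and falling back to port 3000 when `PORT` is unset is an assumption rather than documented behaviour.

```typescript
// Either stdio (no extra settings) or SSE on a given port.
type TransportConfig =
  | { kind: "stdio" }
  | { kind: "sse"; port: number };

// Interpret TRANSPORT and PORT; stdio is the default, as stated above.
function resolveTransport(env: Record<string, string | undefined>): TransportConfig {
  const transport = (env["TRANSPORT"] ?? "stdio").toLowerCase();
  if (transport === "stdio") return { kind: "stdio" };
  if (transport === "sse") {
    // Assumption: default to 3000 when PORT is not provided.
    const port = Number(env["PORT"] ?? "3000");
    const isValid = port >= 1 && port <= 65535 && Math.floor(port) === port;
    if (!isValid) throw new Error(`Invalid PORT: ${env["PORT"]}`);
    return { kind: "sse", port };
  }
  throw new Error(`Unsupported TRANSPORT: ${transport}`);
}
```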

Tools Reference

txt2img

Generate images from text prompts. Supports text-to-image and, when input_image is provided, image-to-image.

Parameter	Type	Description
`prompt`	string	Text description for the image
`model`	string	Model to use (default: gpt-image-2 for text-to-image, gemini-2.5-flash-image-preview for image-to-image)
`provider`	string	AI provider (openai-image-generation, gemini, together, xai, replicate-image-generation)
`quality`	string	Image quality (high, medium, low, hd, standard)
`ratio`	object	Aspect ratio {w, h}
`input_image`	string	Input image for image-to-image (Base64 or URL)
`test_mode`	boolean	Run in test mode without consuming credits
`output_format`	string	Output format (base64, url)

Example:

Generate a picture of a cat
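
Behind a natural-language request like the one above, an MCP client sends a JSON-RPC `tools/call` request naming the tool and its arguments. The sketch below shows roughly what that request looks like for txt2img. The helper `buildToolCall` is hypothetical and the argument values are illustrative; only the parameter names come from the table above.

```typescript
// JSON-RPC 2.0 request shape used by MCP for invoking a tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Wrap a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Illustrative arguments: a square image returned as a URL, in test mode.
const request = buildToolCall(1, "txt2img", {
  prompt: "A picture of a cat",
  ratio: { w: 1, h: 1 },
  output_format: "url",
  test_mode: true,
});

console.log(JSON.stringify(request, null, 2));
```

The same request shape applies to the other five tools; only `name` and `arguments` change.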

txt2speech

Convert text to speech.

Parameter	Type	Description
`text`	string	Text to convert
`provider`	string	TTS provider (aws-polly, openai, elevenlabs, gemini, xai)
`model`	string	TTS model
`voice`	string	Voice ID
`engine`	string	Synthesis engine (standard, neural, long-form, generative)
`language`	string	Language code
`test_mode`	boolean	Test mode

Example:

Convert "Hello world" to speech
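
On the client side, the `engine` values in the table can be captured as a TypeScript union so that typos are caught before a request is sent. This is a sketch under assumptions: `makeTxt2SpeechArgs` and `isEngine` are hypothetical helpers, and whether every engine is valid for every provider is not stated in this README.

```typescript
// Engine values as listed in the parameter table.
const ENGINES = ["standard", "neural", "long-form", "generative"] as const;
type Engine = (typeof ENGINES)[number];

// Type guard: narrows an arbitrary string to one of the listed engines.
function isEngine(value: string): value is Engine {
  return (ENGINES as readonly string[]).indexOf(value) !== -1;
}

interface Txt2SpeechArgs {
  text: string;
  provider?: string;
  model?: string;
  voice?: string;
  engine?: Engine;
  language?: string;
  test_mode?: boolean;
}

// Build a minimal argument object, validating text and engine.
function makeTxt2SpeechArgs(text: string, engine?: string): Txt2SpeechArgs {
  if (text.trim() === "") throw new Error("text must not be empty");
  if (engine !== undefined && !isEngine(engine)) {
    throw new Error(`Unknown engine "${engine}"; expected one of ${ENGINES.join(", ")}`);
  }
  return engine === undefined ? { text } : { text, engine };
}
```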

txt2vid

Generate videos from text prompts.

Parameter	Type	Description
`prompt`	string	Video description
`model`	string	Video model (sora-2, veo-3.1-generate-preview, etc.)
`seconds`	number	Video duration (4, 8, 12)
`size`	string	Resolution (e.g., 1280x720)
`test_mode`	boolean	Test mode

Example:

Generate a video of a drone flying over mountains
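
The `seconds` and `size` parameters have constrained formats, so a client may want to check them before calling the tool. The sketch below is one way to do that. `parseSize` and `checkSeconds` are hypothetical helpers, and treating 4, 8, and 12 as the only valid durations for every model is an assumption based solely on the table above.

```typescript
// Durations listed in the parameter table; possibly model-specific.
const ALLOWED_SECONDS = [4, 8, 12];

// Parse a "WIDTHxHEIGHT" string such as "1280x720" into numbers.
function parseSize(size: string): { width: number; height: number } {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) {
    throw new Error(`Invalid size "${size}"; expected WIDTHxHEIGHT, e.g. 1280x720`);
  }
  return { width: Number(match[1]), height: Number(match[2]) };
}

// Return the duration unchanged if it is one of the listed values.
function checkSeconds(seconds: number): number {
  if (ALLOWED_SECONDS.indexOf(seconds) === -1) {
    throw new Error(`Unsupported duration ${seconds}; expected one of ${ALLOWED_SECONDS.join(", ")}`);
  }
  return seconds;
}
```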

img2txt

Extract text from images (OCR).

Parameter	Type	Description
`source`	string	Image URL, Base64, or Puter path
`provider`	string	OCR provider (aws-textract, mistral)
`test_mode`	boolean	Test mode

Example:

Extract text from this image: https://example.com/document.png
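
Because `source` accepts a URL, Base64 data, or a Puter path in the same string field, something has to tell the three apart. How the server does this is not documented here; the sketch below is a heuristic guess for illustration only, and `classifySource` is a hypothetical name.

```typescript
type SourceKind = "url" | "base64" | "puter-path";

// Heuristic classification of a source string.
function classifySource(source: string): SourceKind {
  // http(s) URLs
  if (/^https?:\/\//i.test(source)) return "url";
  // data URLs carrying Base64 payloads, e.g. data:image/png;base64,...
  if (/^data:[^;,]+;base64,/i.test(source)) return "base64";
  // Raw Base64: Base64 alphabet only, length a multiple of 4, reasonably long.
  const looksLikeBase64 =
    source.length >= 16 &&
    source.length % 4 === 0 &&
    /^[A-Za-z0-9+/]+={0,2}$/.test(source);
  if (looksLikeBase64) return "base64";
  // Anything else is treated as a path in the Puter file system.
  return "puter-path";
}
```

A short path made up only of letters, digits, and slashes could be misread as Base64 by this heuristic, which is why a real implementation might prefer an explicit prefix or a separate parameter. The same question applies to the `audio` parameter of speech2txt and speech2speech.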

speech2txt

Convert speech to text.

Parameter	Type	Description
`audio`	string	Audio URL, Base64, or Puter path
`provider`	string	STT provider (openai, xai)
`model`	string	Model name
`language`	string	Language code
`translate`	boolean	Translate to English
`test_mode`	boolean	Test mode

Example:

Transcribe this audio: https://example.com/speech.mp3

speech2speech

Convert a voice recording to another voice using ElevenLabs.

Parameter	Type	Description
`audio`	string	Input audio URL, Base64, or Puter path
`voice`	string	Target ElevenLabs voice ID
`model`	string	Voice model (default: eleven_multilingual_sts_v2)
`output_format`	string	Output format
`test_mode`	boolean	Test mode

Example:

Convert this voice to a different voice: https://example.com/speech.mp3

Development

Project Structure

puter-mcp/
├── src/
│   ├── index.ts          # Server entry point
│   ├── client.ts         # Puter SDK initialization
│   ├── utils.ts          # Response formatting utilities
│   ├── puter.d.ts        # TypeScript declarations
│   └── tools/
│       ├── index.ts      # Tool registration
│       ├── txt2img.ts
│       ├── txt2speech.ts
│       ├── txt2vid.ts
│       ├── img2txt.ts
│       ├── speech2txt.ts
│       └── speech2speech.ts
├── scripts/
│   └── verify-responses.ts  # SDK response verification
├── dist/                 # Compiled output
├── package.json
└── tsconfig.json

Build

npm run build

Type Check

npm run typecheck

Development Mode

npm run dev

License

MIT License - see LICENSE for details.

Acknowledgments

Puter - AI services provider
MCP SDK - Model Context Protocol

Support

Issue Tracker: https://github.com/your-username/puter-mcp/issues
Documentation: https://docs.puter.com/AI/
