z_ai_image_gen_mcp
MCP server for generating images and videos using Z.AI models (GLM-Image, CogView-4, CogVideoX-3, Vidu Q1, etc.) with support for synchronous and asynchronous generation, downloads, and multiple input modes.
Z.AI Image & Video Generation MCP Server
A Model Context Protocol (MCP) server that provides access to Z.AI's image and video generation models for LLM applications.
Features
- Image Generation: GLM-Image and CogView-4 models for high-quality image generation
- Video Generation: CogVideoX-3, Vidu Q1, and Vidu 2 models for AI video creation
- Multiple Input Modes: Text-to-image/video, image-to-video, start-end frame animation
- Asynchronous Processing: Submit long-running tasks and poll for results
- Automatic Downloads: Generate and download in a single operation
- Automatic Retries: Built-in retry logic with exponential backoff
- Comprehensive Validation: Input validation with clear error messages
- Type-Safe: Full TypeScript support with detailed type definitions
Installation
npm install GeorgH93/z_ai_image_gen_mcp
Configuration
Set your Z.AI API key as an environment variable:
export ZAI_API_KEY=your_api_key_here
Get your API key from the Z.AI API Keys page or sign up for the GLM Coding Plan.
Optional Configuration
| Environment Variable | Description | Default |
|---|---|---|
| `ZAI_API_BASE_URL` | API base URL | `https://api.z.ai/api` |
| `ZAI_DEFAULT_MODEL` | Default model | `glm-image` |
| `ZAI_DEFAULT_SIZE` | Default image size | `1280x1280` |
| `ZAI_REQUEST_TIMEOUT` | Request timeout (ms) | `60000` |
| `ZAI_MAX_RETRIES` | Max retry attempts | `3` |
| `ZAI_RETRY_DELAY` | Initial retry delay (ms) | `1000` |
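As a rough illustration of how these variables and defaults fit together, the sketch below reads them from an environment map. It is not the package's own `loadConfig()`; `readZaiEnv`, `intFromEnv`, and the `ZaiEnvConfig` field names are hypothetical and exist only for this example.

```typescript
// Hypothetical config shape; the real loadConfig() may return something different.
interface ZaiEnvConfig {
  apiKey: string;
  baseUrl: string;
  defaultModel: string;
  defaultSize: string;
  requestTimeoutMs: number;
  maxRetries: number;
  retryDelayMs: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer variable, falling back to the documented default.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  if (!/^\d+$/.test(raw.trim())) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return parseInt(raw.trim(), 10);
}

function readZaiEnv(env: Env): ZaiEnvConfig {
  const apiKey = env["ZAI_API_KEY"];
  if (!apiKey) throw new Error("ZAI_API_KEY is required");
  return {
    apiKey,
    baseUrl: env["ZAI_API_BASE_URL"] ?? "https://api.z.ai/api",
    defaultModel: env["ZAI_DEFAULT_MODEL"] ?? "glm-image",
    defaultSize: env["ZAI_DEFAULT_SIZE"] ?? "1280x1280",
    requestTimeoutMs: intFromEnv(env, "ZAI_REQUEST_TIMEOUT", 60000),
    maxRetries: intFromEnv(env, "ZAI_MAX_RETRIES", 3),
    retryDelayMs: intFromEnv(env, "ZAI_RETRY_DELAY", 1000),
  };
}
```

In a Node.js process this would typically be called as `readZaiEnv(process.env)`.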
Usage
With Claude Desktop
Add to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "z-ai-image": {
      "command": "npx",
      "args": ["z-ai-image-mcp"],
      "env": {
        "ZAI_API_KEY": "your_api_key_here"
      }
    }
  }
}
With Other MCP Clients
Run the server directly:
npx z-ai-image-mcp
Or programmatically:
import { createServer, loadConfig } from 'z-ai-image-mcp';
const config = loadConfig();
const server = createServer(config);
// Connect to your transport...
With OpenCode
Add to your OpenCode configuration (opencode.json or opencode.jsonc in your project root):
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "your_api_key_here"
      }
    }
  }
}
Or using an environment variable reference:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "{env:ZAI_API_KEY}"
      }
    }
  }
}
Using with OpenCode prompts:
Generate a professional logo for a tech startup. use z-ai-image
Or add to your AGENTS.md:
When generating images, use the `z-ai-image` MCP server tools.
Per-agent configuration (optional):
To enable the MCP server only for specific agents:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "{env:ZAI_API_KEY}"
      }
    }
  },
  "tools": {
    "z-ai-image*": false
  },
  "agent": {
    "design-agent": {
      "tools": {
        "z-ai-image*": true
      }
    }
  }
}
Available Tools
1. list_models
List all available image generation models and their capabilities.
Use this tool to discover available models, their features, and recommended settings.
2. generate_image
Generate an image synchronously from a text prompt.
Parameters:
- `prompt` (required): Text description of the image (max 4000 characters)
- `model` (optional): `glm-image` or `cogview-4-250304` (default: `glm-image`)
- `size` (optional): Image dimensions, e.g., `1280x1280` (default: `1280x1280`)
- `quality` (optional): `hd` or `standard` (default: `hd` for GLM-Image)
- `user_id` (optional): End user ID for abuse prevention (6-128 characters)
Example:
Generate an image of a cute kitten sitting on a windowsill with a sunset background.
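MCP clients normally translate a prompt like this into a JSON-RPC `tools/call` request for you. For readers wiring up a client by hand, the sketch below shows roughly what such a request could look like for `generate_image`. The `buildToolCall` helper and the `GenerateImageArgs` type are hypothetical names for this example; the argument names come from the parameter list above.

```typescript
// Arguments for generate_image, as documented in the parameter list.
interface GenerateImageArgs {
  prompt: string;
  model?: "glm-image" | "cogview-4-250304";
  size?: string;
  quality?: "hd" | "standard";
  user_id?: string;
}

// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Hypothetical helper: wraps tool arguments in a tools/call envelope.
function buildToolCall<A>(id: number, name: string, args: A): ToolCallRequest<A> {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const imageRequest = buildToolCall<GenerateImageArgs>(1, "generate_image", {
  prompt: "A cute kitten sitting on a windowsill with a sunset background",
  size: "1280x1280",
});

console.log(JSON.stringify(imageRequest, null, 2));
```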
3. generate_image_async
Start an asynchronous image generation task. Returns a task ID for polling.
Parameters:
- `prompt` (required): Text description of the image
- `model` (optional): Only `glm-image` supports async (default: `glm-image`)
- `size` (optional): Image dimensions (default: `1280x1280`)
- `quality` (optional): Only `hd` supported for async (default: `hd`)
- `user_id` (optional): End user ID for abuse prevention
Example:
Start async generation of a complex poster design.
4. get_async_result
Retrieve the result of an asynchronous image generation task.
Parameters:
- `task_id` (required): The task ID from `generate_image_async`
Example:
Check the status of task ID "task-12345".
5. download_image
Download an image from a URL and return it as base64 or save to a file.
Parameters:
- `url` (required): The URL of the image to download (e.g., from `generate_image` or `get_async_result`)
- `output` (optional): `base64` or `file_output` (default: `base64`)
- `file_output` (optional): Absolute path to save the image file (required if `output` is `file_output`). Example: `/path/to/image.png`
Output Modes:
- `base64`: Returns the image data directly as base64 (auto-switches to file if > 1MB)
- `file_output`: Saves the image to disk at the specified path
Example:
Download the generated image and save it to /home/user/images/logo.png
Note: Z.AI image URLs expire after 30 days. Use this tool to download and store images permanently.
6. generate_and_download_image ⭐ Recommended
Generate an image and automatically download it in a single operation. This is the most convenient tool when you want the image data immediately.
Parameters:
- `prompt` (required): Text description of the image (max 4000 characters)
- `model` (optional): `glm-image` or `cogview-4-250304` (default: `glm-image`)
- `size` (optional): Image dimensions, e.g., `1280x1280` (default: `1280x1280`)
- `quality` (optional): `hd` or `standard` (default: `hd` for GLM-Image)
- `user_id` (optional): End user ID for abuse prevention (6-128 characters)
- `output` (optional): `base64` or `file_output` (default: `base64`)
- `file_output` (optional): Absolute path to save the image file (required if `output` is `file_output`)
- `poll_interval` (optional): Seconds to wait between polling for async results (default: 3)
- `max_wait` (optional): Maximum seconds to wait for generation (default: 120)
Output Modes:
- `base64`: Returns the image data directly as base64 (auto-switches to file if > 1MB)
- `file_output`: Saves the image to disk at the specified path
Examples:
# Generate and get as base64
Generate a logo for my company and show me the image.
# Generate and save to file
Generate a logo and save it to /home/user/images/logo.png
Behavior:
- For GLM-Image: Uses async API with automatic polling until complete
- For CogView-4: Uses synchronous API
- Automatically downloads the result once generation completes
- Returns image as base64 or saves to specified path
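The polling behavior described above can be sketched as follows. This is a minimal illustration, not the server's actual implementation: `pollUntilDone`, `TaskStatus`, and the state names are hypothetical, and `check` stands in for whatever call retrieves the task status (for example, a wrapper around `get_async_result`). Elapsed time is approximated by summing the sleep intervals rather than reading a clock.

```typescript
// Hypothetical task states; the real API's status values may be named differently.
type TaskStatus<T> =
  | { state: "processing" }
  | { state: "success"; result: T }
  | { state: "fail"; reason: string };

interface PollOptions {
  pollIntervalSec: number; // mirrors poll_interval (default 3 for images)
  maxWaitSec: number; // mirrors max_wait (default 120 for images)
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilDone<T>(
  check: () => Promise<TaskStatus<T>>,
  options: PollOptions
): Promise<T> {
  const sleep = options.sleep ?? realSleep;
  let waitedSec = 0;
  for (;;) {
    const status = await check();
    if (status.state === "success") return status.result;
    if (status.state === "fail") {
      throw new Error(`Generation failed: ${status.reason}`);
    }
    // Stop before sleeping past the max_wait budget.
    if (waitedSec + options.pollIntervalSec > options.maxWaitSec) {
      throw new Error(`Timed out after ${waitedSec}s (max_wait ${options.maxWaitSec}s)`);
    }
    await sleep(options.pollIntervalSec * 1000);
    waitedSec += options.pollIntervalSec;
  }
}
```

Passing a no-op `sleep` makes the loop easy to exercise in tests with a scripted sequence of statuses.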
Video Generation Tools
7. list_video_models
List all available video generation models and their capabilities.
Use this tool to discover available video models, their features, and supported parameters.
8. generate_video
Generate a video asynchronously from text or images. Returns a task ID for polling.
Parameters:
- `model` (required): Video generation model
  - `cogvideox-3`: Z.AI flagship model (up to 4K, 5-10s, audio support)
  - `viduq1-text`: Text-to-video, 1080P, 5s
  - `viduq1-image`: Image-to-video, 1080P, 5s
  - `viduq1-start-end`: Start-end frame, 1080P, 5s
  - `vidu2-image`: Image-to-video, 720P, 4s (faster, cheaper)
  - `vidu2-start-end`: Start-end frame, 720P, 4s
  - `vidu2-reference`: Reference-based, 720P, 4s
- `prompt` (optional): Text description (max 512 characters)
- `image_url` (optional): Image URL(s) for image-to-video generation
- `quality` (CogVideoX-3): `quality` or `speed`
- `size` (optional): Video resolution
- `duration` (optional): Video duration in seconds
- `fps` (CogVideoX-3): 30 or 60
- `with_audio` (optional): Generate AI sound effects
- `style` (Vidu Q1 text): `general` or `anime`
- `aspect_ratio` (Vidu Q1/2): `16:9`, `9:16`, or `1:1`
- `movement_amplitude` (Vidu): `auto`, `small`, `medium`, or `large`
- `user_id` (optional): End user ID for abuse prevention
Examples:
# Text-to-video
Generate a video of a cat playing with a ball.
# Image-to-video
Animate this image: [image_url]
# Start-end frame
Create a smooth transition from [first_frame] to [last_frame].
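Because each model expects a different kind of input, it can help to check arguments before submitting a task. The sketch below is one way to do that. The model IDs and the 512-character prompt limit come from the parameter list above; the per-mode image counts (one image for image-to-video, two for start-end, at least one for reference) and the `string[]` type for `image_url` are assumptions for this example, and `validateVideoArgs` is a hypothetical helper, not part of the server.

```typescript
type VideoModel =
  | "cogvideox-3"
  | "viduq1-text"
  | "viduq1-image"
  | "viduq1-start-end"
  | "vidu2-image"
  | "vidu2-start-end"
  | "vidu2-reference";

type InputMode = "any" | "text" | "image" | "start-end" | "reference";

// Input mode per model, taken from the model descriptions above.
const INPUT_MODE: Record<VideoModel, InputMode> = {
  "cogvideox-3": "any",
  "viduq1-text": "text",
  "viduq1-image": "image",
  "viduq1-start-end": "start-end",
  "vidu2-image": "image",
  "vidu2-start-end": "start-end",
  "vidu2-reference": "reference",
};

interface VideoArgs {
  model: VideoModel;
  prompt?: string;
  image_url?: string[]; // assumption: always passed as an array here
}

// Returns a list of problems; an empty list means the arguments look plausible.
function validateVideoArgs(args: VideoArgs): string[] {
  const errors: string[] = [];
  const images = args.image_url ?? [];
  if (args.prompt !== undefined && args.prompt.length > 512) {
    errors.push("prompt exceeds 512 characters");
  }
  switch (INPUT_MODE[args.model]) {
    case "text":
      if (!args.prompt) errors.push("text-to-video needs a prompt");
      break;
    case "image":
      // Assumption: exactly one source image.
      if (images.length !== 1) errors.push("image-to-video needs exactly one image URL");
      break;
    case "start-end":
      // Assumption: first frame and last frame.
      if (images.length !== 2) errors.push("start-end needs exactly two image URLs");
      break;
    case "reference":
      if (images.length < 1) errors.push("reference mode needs at least one image URL");
      break;
    case "any":
      if (!args.prompt && images.length === 0) {
        errors.push("provide a prompt, an image URL, or both");
      }
      break;
  }
  return errors;
}
```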
9. get_video_result
Retrieve the result of an asynchronous video generation task.
Parameters:
- `task_id` (required): The task ID from `generate_video`
Note: Video generation typically takes 30 seconds to several minutes depending on duration and quality.
10. generate_and_download_video ⭐ Recommended
Generate a video and automatically download it. Polls for completion and saves the video file.
Parameters:
- All parameters from `generate_video`, plus:
- `file_output` (optional): Absolute path to save the video file
- `poll_interval` (optional): Seconds to wait between polling (default: 10)
- `max_wait` (optional): Maximum seconds to wait (default: 300)
Example:
Generate a video of a sunset over the ocean and save it to /home/user/videos/sunset.mp4
Note: Videos are always saved to file (too large for base64). Video URLs expire after 1 day.
Models
GLM-Image
Z.AI's flagship image generation model with a hybrid autoregressive + diffusion architecture.
- Best for: Complex compositions, text rendering, detailed illustrations, commercial posters
- Quality options: `hd` (detailed, ~20s), `standard` (faster, ~5-10s)
- Size range: 1024-2048px per dimension (divisible by 32)
- Recommended sizes: 1280×1280, 1568×1056, 1056×1568, 1472×1088, 1088×1472, 1728×960, 960×1728
- Async support: Yes
CogView-4-250304
General-purpose image generation with fast text understanding.
- Best for: General image generation, quick iterations
- Quality options: `hd`, `standard`
- Size range: 512-2048px per dimension (divisible by 16)
- Recommended sizes: 1024×1024, 768×1344, 864×1152, 1344×768, 1152×864, 1440×720, 720×1440
- Async support: No
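The size rules for both image models can be expressed as a small check. The sketch below is illustrative only: `validateSize` and `SIZE_RULES` are hypothetical names, and the server's own validation may differ. Recommended sizes are accepted first because some of the recommended GLM-Image sizes (such as 1728×960) have a dimension below the stated 1024px minimum.

```typescript
interface SizeRule {
  min: number;
  max: number;
  multiple: number;
  recommended: string[];
}

// Ranges and recommended sizes as listed in the model descriptions above.
const SIZE_RULES: Record<string, SizeRule> = {
  "glm-image": {
    min: 1024,
    max: 2048,
    multiple: 32,
    recommended: [
      "1280x1280", "1568x1056", "1056x1568", "1472x1088",
      "1088x1472", "1728x960", "960x1728",
    ],
  },
  "cogview-4-250304": {
    min: 512,
    max: 2048,
    multiple: 16,
    recommended: [
      "1024x1024", "768x1344", "864x1152", "1344x768",
      "1152x864", "1440x720", "720x1440",
    ],
  },
};

// Returns null when the size is acceptable, otherwise a human-readable reason.
function validateSize(model: string, size: string): string | null {
  const rule = SIZE_RULES[model];
  if (rule === undefined) return `unknown model: ${model}`;
  if (rule.recommended.indexOf(size) !== -1) return null;

  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) return `size must look like "1280x1280", got "${size}"`;

  const dims = [Number(match[1]), Number(match[2])];
  for (const dim of dims) {
    if (dim < rule.min || dim > rule.max) {
      return `${dim}px is outside ${rule.min}-${rule.max}px`;
    }
    if (dim % rule.multiple !== 0) {
      return `${dim}px is not divisible by ${rule.multiple}`;
    }
  }
  return null;
}
```

For example, `validateSize("glm-image", "1000x1000")` returns a message about the range, while `validateSize("cogview-4-250304", "1024x768")` returns `null`.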
Video Models
CogVideoX-3
Z.AI's flagship video generation model with improved frame stability and clarity.
- Best for: Text-to-video, image-to-video, start-end frame animation
- Resolution: Up to 4K (3840x2160)
- Duration: 5 or 10 seconds
- Features: Audio generation, 30/60 FPS, quality/speed modes
- Price: $0.20/video
Vidu Q1
High-quality video generation with 1080P output.
| Model | Capability | Duration | Price |
|---|---|---|---|
| `viduq1-text` | Text-to-video | 5s | $0.40 |
| `viduq1-image` | Image-to-video | 5s | $0.40 |
| `viduq1-start-end` | Start-end frame | 5s | $0.40 |
- Features: General/anime styles, motion amplitude control
Vidu 2
Fast and cost-effective video generation with 720P output.
| Model | Capability | Duration | Price |
|---|---|---|---|
| `vidu2-image` | Image-to-video | 4s | $0.20 |
| `vidu2-start-end` | Start-end frame | 4s | $0.20 |
| `vidu2-reference` | Reference-based | 4s | $0.40 |
- Features: Audio generation, motion amplitude control, multi-image reference
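To budget a batch of videos, the per-video prices listed in this section can be summed. The sketch below keeps prices in integer cents to avoid floating-point rounding; `estimateCostCents` and `formatUsd` are hypothetical helpers, and the prices are simply copied from this README, so they may be out of date.

```typescript
// Prices in US cents per generated video, copied from the lists and tables above.
const PRICE_CENTS: Record<string, number> = {
  "cogvideox-3": 20,
  "viduq1-text": 40,
  "viduq1-image": 40,
  "viduq1-start-end": 40,
  "vidu2-image": 20,
  "vidu2-start-end": 20,
  "vidu2-reference": 40,
};

// Sum the listed price of each model in the batch (one entry per video).
function estimateCostCents(models: string[]): number {
  let total = 0;
  for (const model of models) {
    const price = PRICE_CENTS[model];
    if (price === undefined) {
      throw new Error(`No listed price for model "${model}"`);
    }
    total += price;
  }
  return total;
}

function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

const videoBatch = ["cogvideox-3", "viduq1-text", "vidu2-reference"];
console.log(formatUsd(estimateCostCents(videoBatch))); // prints "$1.00"
```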
Error Handling
The server handles various error scenarios:
| Error Type | Description |
|---|---|
| `AUTH_ERROR` | Invalid or missing API key |
| `RATE_LIMIT` | Too many requests - will auto-retry |
| `VALIDATION_ERROR` | Invalid parameters |
| `SERVER_ERROR` | Z.AI server issues - will auto-retry |
| `NETWORK_ERROR` | Connection issues - will auto-retry |
| `TIMEOUT_ERROR` | Request timeout - will auto-retry |
| `CONTENT_FILTER` | Prompt blocked by content policy |
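The table above implies a split between errors that are retried and errors that are returned immediately. The sketch below shows that split together with an exponential backoff schedule built from the `ZAI_MAX_RETRIES` and `ZAI_RETRY_DELAY` defaults. The doubling factor and the absence of jitter are assumptions; the server's actual retry logic is not documented here, and `isRetryable`, `backoffDelayMs`, and `retrySchedule` are hypothetical names.

```typescript
type ZaiErrorType =
  | "AUTH_ERROR"
  | "RATE_LIMIT"
  | "VALIDATION_ERROR"
  | "SERVER_ERROR"
  | "NETWORK_ERROR"
  | "TIMEOUT_ERROR"
  | "CONTENT_FILTER";

// Error types marked "will auto-retry" in the table above.
const RETRYABLE: ZaiErrorType[] = [
  "RATE_LIMIT",
  "SERVER_ERROR",
  "NETWORK_ERROR",
  "TIMEOUT_ERROR",
];

function isRetryable(type: ZaiErrorType): boolean {
  return RETRYABLE.indexOf(type) !== -1;
}

// Delay before retry number `attempt` (1-based).
// Assumption: the delay doubles on every attempt, with no jitter.
function backoffDelayMs(attempt: number, initialDelayMs: number = 1000): number {
  return initialDelayMs * Math.pow(2, attempt - 1);
}

// Full list of delays for a given retry budget.
function retrySchedule(maxRetries: number = 3, initialDelayMs: number = 1000): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt, initialDelayMs));
  }
  return delays;
}

console.log(retrySchedule()); // with the defaults: 1000, 2000, 4000 (ms)
```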
Development
Setup
git clone <repo-url>
cd z-ai-image-mcp
npm install
cp .env.example .env
# Edit .env with your API key
Scripts
npm run build # Build TypeScript
npm run dev # Run in development mode
npm test # Run all tests
npm run test:unit # Run unit tests only
npm run test:integration # Run integration tests
npm run test:e2e # Run E2E tests
npm run test:coverage # Run tests with coverage
npm run typecheck # Type check without emit
License
MIT