local_ai_gen
Local AI generation for images, audio, speech, and 3D models using open source models.
Local AI Generation MCP Server (local_ai_gen)
This project runs a local MCP (Model Context Protocol) server that exposes tools for:
- Text to Image
- Text to Music / Audio
- Text to Speech
- Image/Text to 3D Model
Models Used
- Image Generation: segmind/SSD-1B
  A fast, distilled version of SDXL that works well on consumer GPUs.
- Music Generation: stabilityai/stable-audio-open-1.0
  A high-quality open model for generating sound effects, short music tracks, and ambient audio. (Note: requires accepting license terms.)
- Speech Generation: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
  A robust, multilingual text-to-speech model.
- 3D Generation: stabilityai/TripoSR
  A fast feed-forward model for image-to-3D reconstruction.
Requirements
- Python 3.10+
- NVIDIA GPU (8GB+ VRAM recommended for running everything smoothly)
- Hugging Face User Access Token (for stable-audio-open-1.0)
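The requirements above can be checked before running setup. The following is a minimal sketch, not part of this repository: the helper names `python_version_ok` and `nvidia_driver_visible` are hypothetical, and looking for `nvidia-smi` on `PATH` is only a rough heuristic for whether an NVIDIA driver is installed. It does not measure VRAM.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the Requirements list above


def python_version_ok(version_info=None):
    """Return True if the interpreter meets the Python 3.10+ requirement."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def nvidia_driver_visible():
    """Heuristic only: True if the `nvidia-smi` executable is on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Python 3.10+:", python_version_ok())
    print("nvidia-smi on PATH:", nvidia_driver_visible())
```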
Complete Setup Guide
1. Preparation and Hugging Face Authentication
The Stable Audio Open model is gated and requires you to accept its license before downloading.
- Go to stabilityai/stable-audio-open-1.0 on Hugging Face.
- Log in and click to Accept the Terms/License.
- Generate a User Access Token in your Hugging Face Settings.
- Run the Hugging Face CLI login locally:
pip install -U "huggingface_hub"
hf auth login
(Paste your token when prompted. You do not need to add it as a git credential.)
2. Environment Installation
python -m venv .venv
source .venv/bin/activate
python setup.py
What python setup.py does:
- Creates output directories (generated_images, generated_audio, generated_models, models)
- Clones third_party/TripoSR if it is missing
- Installs everything from requirements.txt
- Compiles and installs torchmcubes against your active torch version (needed for 3D generation)
- Installs flash-attn
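To illustrate the first of these steps, here is a minimal sketch of idempotent directory creation. It is not the actual contents of setup.py: the function name `ensure_output_dirs` is hypothetical, and only the directory names come from the list above.

```python
from pathlib import Path

# Directory names taken from the setup step described above.
OUTPUT_DIRS = ("generated_images", "generated_audio", "generated_models", "models")


def ensure_output_dirs(project_root):
    """Create each output directory under project_root if it is missing."""
    paths = []
    for name in OUTPUT_DIRS:
        path = Path(project_root) / name
        # exist_ok=True makes repeated runs safe.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for created in ensure_output_dirs("."):
        print("ready:", created)
```

Because `mkdir` is called with `exist_ok=True`, running the sketch a second time leaves existing directories and their contents untouched.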
Run the MCP server
To run the MCP server manually (via standard I/O):
python mcp_server/main.py
Note: The first run of any specific tool will be slower because it has to download the weights for that model into your Hugging Face cache.
Smoke Test
You can run the included smoke test script to verify all models are working correctly:
python smoke_test_generate.py
MCP Tools Exposed
- generate_image
- generate_audio
- generate_speech
- generate_3d_model
- health_check
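MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request message; it is illustrative only. The helper `build_tool_call` is hypothetical, and calling `health_check` with no arguments is an assumption, since this README does not document each tool's parameters.

```python
import json

# Tool names taken from the list above.
TOOL_NAMES = (
    "generate_image",
    "generate_audio",
    "generate_speech",
    "generate_3d_model",
    "health_check",
)


def build_tool_call(request_id, tool_name, arguments=None):
    """Build a JSON-RPC 2.0 `tools/call` request for one of the exposed tools."""
    if tool_name not in TOOL_NAMES:
        raise ValueError(f"unknown tool: {tool_name}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Assumption: argument names depend on the server's tool schemas.
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_call(1, "health_check"), indent=2))
```

In practice an MCP client performs the initialization handshake and sends these messages for you, so building them by hand is mainly useful for debugging.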
Notes
- Generated files are written to generated_images, generated_audio, and generated_models by default.
- For generate_audio and generate_speech, you can override the destination with:
  - tool arg output_dir (highest priority), or
  - env var GENAI_OUTPUT_AUDIO_DIR
- For generate_image, override the destination with tool arg output_dir or env var GENAI_OUTPUT_IMAGE_DIR.
- For generate_3d_model, override the destination with tool arg output_dir or env var GENAI_OUTPUT_MODEL_DIR.
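The override order in these notes can be sketched as a small lookup. This is not the server's actual code: `resolve_output_dir` and `OUTPUT_SETTINGS` are hypothetical names, and the sketch assumes that the tool argument also takes priority over the env var for generate_image and generate_3d_model, and that generate_speech falls back to generated_audio.

```python
import os
from pathlib import Path

# (env var, default directory) per tool, following the notes above.
OUTPUT_SETTINGS = {
    "generate_image": ("GENAI_OUTPUT_IMAGE_DIR", "generated_images"),
    "generate_audio": ("GENAI_OUTPUT_AUDIO_DIR", "generated_audio"),
    "generate_speech": ("GENAI_OUTPUT_AUDIO_DIR", "generated_audio"),
    "generate_3d_model": ("GENAI_OUTPUT_MODEL_DIR", "generated_models"),
}


def resolve_output_dir(tool_name, output_dir=None, env=None):
    """Pick the destination: tool arg first, then env var, then the default."""
    env = os.environ if env is None else env
    env_var, default = OUTPUT_SETTINGS[tool_name]
    if output_dir:
        return Path(output_dir)
    if env.get(env_var):
        return Path(env[env_var])
    return Path(default)


if __name__ == "__main__":
    print(resolve_output_dir("generate_audio", output_dir="my_audio"))
    print(resolve_output_dir("generate_image"))
```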
Using with MCP Clients (Cursor, Claude Desktop, etc.)
To use this server in an MCP-compatible client, add the following to your mcp.json (or the respective MCP configuration file for your client). Make sure to replace <YOUR_PROJECT_PATH> with the absolute path to where you cloned this repository:
{
  "mcpServers": {
    "local_ai_gen": {
      "command": "<YOUR_PROJECT_PATH>/.venv/bin/python",
      "args": [
        "<YOUR_PROJECT_PATH>/mcp_server/main.py"
      ],
      "env": {}
    }
  }
}
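If you prefer to generate this configuration rather than edit it by hand, the following sketch fills in the project path. The helper `build_mcp_config` and the example path `/home/me/local_ai_gen` are hypothetical, and it assumes a POSIX-style virtual environment layout (`.venv/bin/python`) as in the JSON above.

```python
import json
from pathlib import PurePosixPath


def build_mcp_config(project_path):
    """Return the mcpServers entry above with the project path filled in."""
    root = PurePosixPath(project_path)
    return {
        "mcpServers": {
            "local_ai_gen": {
                "command": str(root / ".venv" / "bin" / "python"),
                "args": [str(root / "mcp_server" / "main.py")],
                "env": {},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical clone location; replace with your own absolute path.
    print(json.dumps(build_mcp_config("/home/me/local_ai_gen"), indent=2))
```

The printed JSON can be pasted into your client's MCP configuration file.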