Venice AI Image Generator MCP Server

Testing MCP server functionality with Venice and Gemini (images)

jhacksman


This project implements a Model Context Protocol (MCP) server that integrates with Venice AI for image generation with an approval/regeneration workflow.

What is MCP?

The Model Context Protocol (MCP) is an open protocol that standardizes how applications provide context to Large Language Models (LLMs). It acts as a "USB-C port for AI applications," allowing LLMs to connect to various data sources and tools in a standardized way.

For more information, visit the official MCP introduction page.

Project Overview

This MCP server provides a bridge between LLMs (like Claude) and Venice AI's image generation capabilities. It enables LLMs to generate images based on text prompts and implements an interactive approval workflow with thumbs up/down feedback.

Key Features

Image Generation with Approval Workflow

The core functionality of this server is to:

  1. Generate images using Venice AI based on text prompts
  2. Display the generated image to the user with clickable thumbs up/down icons overlaid directly on the image
  3. Allow users to approve the image (clicking thumbs up) or request a regeneration (clicking thumbs down)
  4. Regenerate images with the same parameters if requested

Technical Implementation

The server implements several MCP tools:

  • generate_venice_image: Creates an image from a text prompt and returns it with approval options
  • approve_image: Marks an image as approved when the user gives a thumbs up
  • regenerate_image: Creates a new image with the same parameters when the user gives a thumbs down
  • list_available_models: Provides information about available Venice AI models
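The tool logic above can be sketched as plain Python functions over an in-memory cache. This is an illustrative sketch, not the project's actual code: in the real server each function would be registered as an MCP tool, and `call_venice_api` is a placeholder standing in for the actual Venice AI request:

```python
import uuid

# In-memory cache: image_id -> details and approval status of each image
image_cache: dict[str, dict] = {}

def call_venice_api(prompt: str, model: str) -> str:
    """Placeholder for the real Venice AI request; returns an image URL."""
    return f"https://venice.example/images/{uuid.uuid4().hex}.png"

def generate_venice_image(prompt: str, model: str = "default") -> dict:
    """Generate an image and cache it with approval status 'pending'."""
    image_id = uuid.uuid4().hex
    url = call_venice_api(prompt, model)
    image_cache[image_id] = {"prompt": prompt, "model": model,
                             "url": url, "status": "pending"}
    return {"image_id": image_id, "url": url}

def approve_image(image_id: str) -> dict:
    """Mark an image as approved when the user gives a thumbs up."""
    image_cache[image_id]["status"] = "approved"
    return image_cache[image_id]

def regenerate_image(image_id: str) -> dict:
    """Re-run generation with the same cached parameters (thumbs down)."""
    old = image_cache[image_id]
    old["status"] = "rejected"
    return generate_venice_image(old["prompt"], old["model"])
```

Caching the original prompt and model is what lets regenerate_image reuse the exact same parameters on a thumbs down.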

User Experience

From the user's perspective, the interaction flow is:

  1. User provides a text prompt to generate an image
  2. LLM calls the MCP server to generate the image
  3. LLM displays the image with clickable thumbs up/down icons overlaid directly on the image
  4. User clicks the thumbs up icon on the image to approve or thumbs down icon to regenerate
  5. If thumbs down, the process repeats until the user approves an image

Architecture

The server follows the MCP client-server architecture:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│              │     │              │     │              │
│   LLM Host   │◄────┤  MCP Server  │◄────┤  Venice AI   │
│ (e.g. Claude)│     │              │     │     API      │
│              │     │              │     │              │
└──────────────┘     └──────────────┘     └──────────────┘

  1. LLM Host: The application running the LLM (e.g., Claude)
  2. MCP Server: Our server that implements the MCP protocol and connects to Venice AI
  3. Venice AI API: The external service that generates images

Implementation Details

MCP Server Components

The server consists of:

  1. FastMCP Server: The core server that handles MCP protocol communication
  2. Venice AI Integration: Code that interfaces with the Venice AI API
  3. Image Cache: In-memory storage for tracking generated images and their approval status
  4. Tool Definitions: Functions that LLMs can call to interact with the server
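One way the image cache entries (component 3) might be modeled; the field names here are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CachedImage:
    """One generated image tracked by the in-memory cache."""
    prompt: str
    model: str
    url: str
    status: str = "pending"  # pending -> approved | rejected
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# The cache itself maps an opaque image ID to its entry
cache: dict[str, CachedImage] = {}
```

Because the cache is in-memory, approval state is lost when the server restarts; persistent storage is listed under Future Enhancements.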

Data Flow

  1. LLM receives a prompt from the user
  2. LLM calls the generate_venice_image tool with the prompt
  3. Server sends request to Venice AI API
  4. Venice AI generates the image and returns a URL
  5. Server caches the image details and returns the URL with approval options
  6. LLM displays the image and approval options to the user
  7. User selects thumbs up or thumbs down
  8. LLM calls either approve_image or regenerate_image based on user selection
  9. If regenerating, the process repeats from step 3
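Steps 3-4 amount to a single HTTP call. A hedged sketch of building that request follows; the endpoint path and payload field names are assumptions for illustration, not Venice AI's documented API:

```python
import json
import urllib.request

def build_venice_request(prompt: str, api_key: str,
                         model: str = "default",
                         width: int = 1024, height: int = 1024):
    """Build (but do not send) an image-generation request.

    The URL and field names below are assumptions; consult the
    Venice AI API documentation for the real contract.
    """
    payload = {"model": model, "prompt": prompt,
               "width": width, "height": height}
    return urllib.request.Request(
        "https://api.venice.ai/api/v1/image/generate",  # assumed endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` (or `requests.post`) would yield the response from which the server extracts the image URL in step 4.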

Example Usage

When connected to an LLM like Claude, the interaction would look like:

User: Generate an image of a futuristic city skyline
Claude: I'll generate that image for you using Venice AI.

[Image of futuristic city skyline with clickable 👍 and 👎 icons overlaid on the image]

User: 👎 (Thumbs down)
Claude: Let me generate a new version for you.

[New image of futuristic city skyline with clickable 👍 and 👎 icons overlaid on the image]

User: 👍 (Thumbs up)
Claude: Great! I've saved this approved image for you.

Gemini Integration for Multi-View Generation

After a user approves an image (by clicking the thumbs up icon), the system automatically processes the approved image through Google's Gemini API to generate multiple consistent views of the 3D object:

  1. The approved Venice AI image is used as input to the Gemini view generation scripts
  2. Four different views are generated sequentially:
    • Front view (0°) - Generated first
    • Right view (90°) - Generated after front view completes
    • Left view (270°) - Generated after right view completes
    • Back view (180°) - Generated after left view completes
  3. Each view is displayed in a 4-up layout as it becomes available
  4. Each script waits for the previous script to complete successfully before executing
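The sequential ordering above can be sketched as a loop that only advances when the previous view succeeds. This is a sketch, not the project's actual code: the `run_script` callback is hypothetical and would wrap something like `subprocess.run([...], check=True)` so a failed script halts the chain:

```python
# The four views, in generation order, with their rotation angles
VIEW_ORDER = [("front", 0), ("right", 90), ("left", 270), ("back", 180)]

def generate_all_views(approved_image: str, run_script) -> list:
    """Run each view-generation script in order, waiting for each to
    complete successfully before starting the next.

    `run_script(view, angle, image)` should return the path of the
    generated view, or raise on failure (stopping the sequence).
    """
    results = []
    for view, angle in VIEW_ORDER:
        # Each call blocks until the previous view has completed
        results.append(run_script(view, angle, approved_image))
    return results
```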

4-Up View Approval Process

Each of the four generated views has its own thumbs up/down approval system:

  1. Each view in the 4-up display has thumbs up/down icons overlaid on the image
  2. If a user selects thumbs down for any specific view:
    • The corresponding Python script for that view is run again
    • The newly generated image replaces the rejected image in the 4-up display
    • This process repeats until the user approves the image with thumbs up
  3. Each view can be individually approved or regenerated
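The per-view loop can be sketched with two injected callbacks, both hypothetical stand-ins: one that re-runs the view's generation script and one that returns the user's thumbs up/down verdict:

```python
def approve_view(view: str, regenerate, get_verdict, max_tries: int = 10):
    """Regenerate one view until the user approves it with thumbs up.

    `regenerate(view)` returns a new image path for that view;
    `get_verdict(path)` returns True for thumbs up, False for thumbs down.
    """
    for _ in range(max_tries):
        path = regenerate(view)
        if get_verdict(path):
            return path  # approved: replaces the rejected image in the 4-up
    raise RuntimeError(f"{view} view was never approved")
```

The `max_tries` cap is a defensive addition not described in the workflow; as written above, the real loop repeats until the user approves.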

3D Model Generation

Once all four views are approved:

  1. The original Venice AI image and the four approved Gemini-generated views are processed using CUDA Multi-View Stereo
  2. This processing occurs on a dedicated Linux server on the network
  3. The CUDA Multi-View Stereo system converts the 2D images into a 3D model

This multi-view generation leverages Gemini's object consistency capabilities to create coherent representations of the 3D object from different angles while maintaining the same style, colors, and proportions as the original Venice AI image.

Future Enhancements

Potential future improvements include:

  1. Persistent Storage: Save approved images to a database
  2. Image Editing: Allow users to request specific modifications to generated images
  3. Multiple Image Generation: Generate several variations at once for the user to choose from
  4. Additional Views: Generate more angles beyond the four cardinal directions

Venice AI Integration

The server integrates with Venice AI's image generation API, which provides high-quality image generation capabilities. The API allows for:

  • Generating images from text prompts
  • Customizing image dimensions
  • Adjusting generation parameters
  • Using different models for different styles

Getting Started

To implement this server, you would need to:

  1. Install the FastMCP library
  2. Set up Venice AI API credentials
  3. Implement the MCP tools as described
  4. Run the server and connect it to an LLM host
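Under those steps, setup might look like the following; the package spec and environment-variable name are assumptions, not documented requirements of this project:

```shell
# Install the MCP Python SDK, which provides FastMCP
pip install "mcp[cli]"

# Venice AI credentials -- the variable name is an assumption
export VENICE_API_KEY="your-api-key"

# Run the server (stdio transport, suitable for an LLM host
# such as Claude Desktop)
python server.py
```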

MCP Resources

For more information about the Model Context Protocol and how to build MCP servers, see the official Model Context Protocol documentation.
