ViperMCP

A mixture-of-experts visual question-answering server that enables visual grounding, compositional image question answering, and external knowledge-dependent image question answering through code generation and execution. Built on the ViperGPT framework with support for multiple computer vision models including Grounding DINO, SegmentAnything, and GPT-4o.

ViperMCP: A Model Context Protocol for Viper Server

ViperMCP is a mixture-of-experts (MoE) visual question-answering (VQA) server that defines several functions covering three task areas: (1) visual grounding, (2) compositional image question answering, and (3) external knowledge-dependent image question answering. It is based heavily on the ViperGPT framework.

The MCP server is structured as a FastMCP streamable-http server and is therefore compatible with all of the client tooling provided by FastMCP.

Setup

OpenAI API Key

An API key for the OpenAI platform is required. It can be set in the execution environment as OPENAI_API_KEY, referenced by file path in the OPENAI_API_KEY_PATH environment variable, or passed as an HTTP query parameter.
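The three options can be read as a lookup chain. The sketch below is only an illustration and not the server's actual code: the function name resolve_openai_api_key and the precedence order (query parameter first, then OPENAI_API_KEY, then the file named by OPENAI_API_KEY_PATH) are assumptions, and the apiKey parameter name is taken from the URL example in the installation section.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_openai_api_key(
    query_params: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first API key found, or None if no source provides one.

    Assumed (hypothetical) precedence: apiKey query parameter,
    OPENAI_API_KEY, then the file named by OPENAI_API_KEY_PATH.
    """
    env = os.environ if env is None else env

    # 1) Key passed with the request as an HTTP query parameter.
    key = query_params.get("apiKey")
    if key:
        return key.strip()

    # 2) Key set directly in the execution environment.
    key = env.get("OPENAI_API_KEY")
    if key:
        return key.strip()

    # 3) Key stored in a file whose path is given in the environment.
    key_path = env.get("OPENAI_API_KEY_PATH")
    if key_path and Path(key_path).is_file():
        return Path(key_path).read_text(encoding="utf-8").strip()

    return None
```

Passing env explicitly keeps the function easy to exercise without touching the real environment.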

Ngrok Account (Optional)

Ngrok can be used to quickly expose a locally running server at a public-facing URL. To use it, create an account and run pip install ngrok.

Once the server is running via one of the installation procedures in the next section, running ngrok http 8000 will forward a public-facing URL to your ViperMCP server.

The address provided by ngrok (or any other public-facing address) can be substituted for the local address (http://0.0.0.0:8000) referenced below.

Installation

Smithery Deployment

ViperMCP can be deployed through Smithery.

Dockerized FastMCP Server

Add your OpenAI API key to a file called api.key. In the command below, point the mount source to the location of that file.

docker run -i --rm \
--mount type=bind,source=/path/to/api.key,target=/run/secrets/openai_api.key,readonly \
-e OPENAI_API_KEY_PATH=/run/secrets/openai_api.key \
-p 8000:8000 \
rsherby/vipermcp:latest

This starts a CUDA-enabled Docker container that can be accessed at http://0.0.0.0:8000/mcp/.

Alternatively, you can use the docker-compose.yaml file to build the image from source and run it. By default, it assumes that the OpenAI API key file can be found in the same directory.

If your container provisioner (e.g., a cloud provider) lets you create environment variables and pass them to the container environment, you can instead set the OPENAI_API_KEY variable before the container starts.

Pure FastMCP Server

Clone the repository to your local device by running the following command:

git clone --recurse-submodules https://github.com/ryansherby/ViperMCP.git

After cloning, we need to download the pretrained models and set our OpenAI API key. Run the following commands:

cd ViperMCP
bash download-models.sh
echo YOUR_OPENAI_API_KEY > api.key

We then suggest creating a virtual environment (e.g., conda or venv) and activating it. This is not a requirement, but it is generally the best practice for managing Python packages. Then, install the requirements by running the following commands:

pip install -r requirements.txt
pip install -e .

This will install both the third-party requirements and the local viper package, which is used to standardize import locations.

We can now run our local FastMCP server using the following command:

python run_server.py

We should be able to access our server now at http://0.0.0.0:8000/mcp/.

To use the OpenAI-related models, we must pass the OpenAI API key as the apiKey query parameter of the server URL, for example: http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXXXXXXXXXXXXXXXXXX
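A URL of this shape can be assembled programmatically rather than by string concatenation. The helper below is a minimal sketch using only the Python standard library; the name with_api_key is hypothetical, and the only detail taken from this README is the apiKey parameter name.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_api_key(mcp_url: str, api_key: str) -> str:
    """Return mcp_url with an apiKey query parameter appended."""
    parts = urlsplit(mcp_url)
    # URL-encode the key and preserve any query string already present.
    extra = urlencode({"apiKey": api_key})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )
```

The same helper works unchanged when the local address is replaced by an ngrok or other public-facing address.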

Usage

FastMCP Client

The example below passes the image as base64-encoded bytes in a data URI. An image URL can be passed in the same field instead.

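import asyncio
import base64
from fastmcp import Client

# Assumptions: the server runs locally; "muffins.png" is a placeholder path.
client = Client("http://0.0.0.0:8000/mcp/")
with open("muffins.png", "rb") as f:
    image_base64_string = base64.b64encode(f.read()).decode("ascii")
image = f"data:image/png;base64,{image_base64_string}"

async def main():
    async with client:
        await client.ping()

        tools = await client.list_tools()  # Optional

        # Each tool receives all of its arguments in a single dictionary.
        query = await client.call_tool(
            "viper_query",
            {"query": "how many muffins can each kid have for it to be fair?", "image": image},
        )

        task = await client.call_tool(
            "viper_task",
            {"task": "return a mask of all the people in the image", "image": image},
        )

asyncio.run(main())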
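The data URI used for the image field can be produced with a small helper. This is a sketch under the assumption that the image lives in a local file whose extension identifies its type; the function name image_to_data_uri is hypothetical and not part of ViperMCP or FastMCP.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a data URI with base64 content."""
    # Guess the MIME type from the file extension (e.g. .png -> image/png).
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed directly as the "image" value in the tool arguments.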

OpenAI API

Make sure to send the image URL as a content item with "type": "input_text". Currently, the OpenAI API MCP integration cannot handle byte-level image data, so the image must be sent as a public URL.

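from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholders: replace with your public server address and a public image URL.
server_url = "https://your-public-server-address"
img_url = "https://example.com/muffins.png"

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "mcp",
            "server_label": "ViperMCP",
            "server_url": f"{server_url}/mcp/",
            "require_approval": "never",
        },
    ],
    input=[
        {
            "role": "system",
            "content": "Forward any queries or tasks relating to an image directly to the ViperMCP server."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "based on this image, how many muffins can each kid have for it to be fair?"
                },
                {
                    # The image URL is sent as plain text, not as image data.
                    "type": "input_text",
                    "text": f"{img_url}",
                },
            ],
        },
    ],
)

print(response.output_text)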
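Because the image has to be reachable from outside your machine, it can help to reject data URIs and local addresses before making the request. The check below is a heuristic sketch only: the function name looks_publicly_reachable is hypothetical, it inspects the URL text without performing any network request, and it assumes that any DNS name other than localhost resolves publicly.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_publicly_reachable(image_ref: str) -> bool:
    """Heuristically decide whether image_ref is a public http(s) URL."""
    parts = urlsplit(image_ref)
    # Data URIs and file paths have no http(s) scheme or no host.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a DNS name; assumed to be public.
        return True
    # Loopback, private, and unspecified addresses are not global.
    return addr.is_global
```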

Appendix

Models

The following models are used in the default version of ViperMCP:

  • Grounding DINO
  • SegmentAnything (SAM)
  • GPT-4o-mini LLM
  • GPT-4o-mini VLM
  • GPT-4.1
  • X-VLM
  • MiDaS
  • BERT

Warnings

This package generates and executes code on the machine on which it is run. We do not have any direct control over the code that is executed, and thus the prompting mechanism may be used to expose sensitive data. We have included basic injection prevention tools; however, these will not be sufficient to protect your data in a production environment.

If a production-level environment is your goal, we strongly suggest modifying src/entrypoint.py to define separate client wrappers, using the same naming convention (i.e., find, simple_query, etc.), that forward requests to a backend server. Then, mcp/server.py should be modified to push requests to this client server, which in turn makes requests of the backend server. An example flow would look like the following:

MCP Server (Query + Image) => Client Server (Generate Code Request) =>
Backend Server (Generates Code) =>
Client Server (Executes Code with Wrapper Functions) =>
Backend Server (Executes Underlying Functions from Wrapper) =>
Client Server (Forwards Result to MCP Server) =>
MCP Server (Returns Result to User)
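The client-wrapper idea in this flow can be sketched as follows. This is an illustration under stated assumptions, not the contents of src/entrypoint.py: the class BackendClient, the Transport type, the wrapper signatures, and the payload field names are all hypothetical. Only the wrapper names find and simple_query come from the text above.

```python
from typing import Any, Callable, Dict

# A transport takes (function_name, payload) and returns the backend's reply.
# In a real deployment this would be an authenticated request to the backend.
Transport = Callable[[str, Dict[str, Any]], Any]


class BackendClient:
    """Client-side wrappers that forward calls instead of running models locally."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def find(self, image_ref: str, object_name: str) -> Any:
        # Same name as the local function; the work happens on the backend.
        return self._transport(
            "find", {"image": image_ref, "object_name": object_name}
        )

    def simple_query(self, image_ref: str, question: str) -> Any:
        return self._transport(
            "simple_query", {"image": image_ref, "question": question}
        )


def fake_transport(name: str, payload: Dict[str, Any]) -> Any:
    """Stand-in for the backend server; echoes what it was asked to do."""
    return {"called": name, "payload": payload}
```

With this split, generated code executed on the client server only ever touches the thin wrappers, while the models and any sensitive resources stay on the backend server.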

Citations

Thank you to the team behind ViperGPT! Your framework and subsequent empirical successes have been invaluable in the creation of this project.

@article{surismenon2023vipergpt,
    title={ViperGPT: Visual Inference via Python Execution for Reasoning},
    author={D\'idac Sur\'is and Sachit Menon and Carl Vondrick},
    journal={arXiv preprint arXiv:2303.08128},
    year={2023}
}

Contributions

If you'd like to contribute to the project, please pass the necessary tests (found in /tests) and create a pull request.
