Confident AI

Confident AI

Connects AI-powered tools to the Confident AI platform for evaluating, observing, and iterating on AI quality, enabling cloud evaluations, dataset management, prompt versioning, production tracing, and human annotations directly from editors like Cursor and Claude Code.

Category
Visit Server

README

Confident AI MCP Server

License: MIT Python 3.12+ MCP Confident AI

The Confident AI MCP Server connects AI-powered tools to Confident AI, a platform to evaluate, observe, and iterate on AI quality. It gives you full control over your resources directly from your editor:

  • Cloud evaluations and metric collections
  • Evaluation datasets
  • Prompt versioning and management
  • Production tracing and observability
  • Human annotations and feedback

For users of DeepEval, Confident AI is also the native backend and persistence layer for your evaluation results. This MCP server gives you the ability to iterate on your AI application by bringing all of that data directly into tools like Cursor and Claude Code.

[!WARNING] This MCP server is currently in beta. We invite everyone to try it out but also reach out to the Confident AI team before doing so to avoid any surprises in functionality.

<p align="center"> <img src="assets/confident-mcp-architecture.png" alt="Confident AI MCP Architecture" width="600"> </p>

Use Cases

Built for developers who want to iterate faster on their AI applications from inside editors like Cursor, Claude Code, and Windsurf — from simple queries to fully automated improvement workflows:

  • 10x your iteration speed. Run an eval, check if a set of prompts are better — in one continuous workflow instead of scattered across tools. What used to take an hour of tab-switching now takes one conversation.
  • Go from eval results to action plan automatically. Your AI assistant can pull eval results, read what failed and why, and draft a plan for what to improve next — no manual analysis needed.
  • Use production traces for iteration. Pull the trace, see what went in and what came out, read what users said — and fix it before anyone else notices.
  • Let human feedback drive your next iteration. Pull annotation data your team left on production traces and have your AI assistant use it to decide what to fix and how.

Every time you leave your editor to check eval results, tweak a prompt in a dashboard, or look up what your team annotated — you lose context and iteration speed.

How is this different from the platform?

Confident AI has a full web UI where you can do all of this with a mouse. This MCP server is the same platform, accessed from your editor instead. Think: AWS web console vs. AWS CLI — same resources, different interface.

The server speaks the Model Context Protocol (MCP), so any compatible client connects out of the box. The web UI isn't going anywhere. This is just another way in.

Jump Ahead

Prerequisites

  1. A Confident AI API key.
  2. An MCP-compatible client — Cursor, Claude, Windsurf, or any other client that supports the Model Context Protocol.

Quickstart

Confident AI hosts the MCP server for you. Pick your region:

Region MCP Server URL
US (default) https://mcp.confident-ai.com/mcp
EU https://eu.mcp.confident-ai.com/mcp
Self-hosted Use your own deployment URL

[!TIP] The examples below use the US server URL. For other regions, swap the URL:

  • EU: https://eu.mcp.confident-ai.com/mcp
  • Self-hosted / On-prem: If you're running your own instance of Confident AI, you can run this MCP server yourself and point it at your deployment. See Running the Server Locally for setup instructions.

🖥️ Cursor

Add the following to your .cursor/mcp.json file:

{
  "mcpServers": {
    "confident-ai": {
      "url": "https://mcp.confident-ai.com/mcp",
      "headers": {
        "Authorization": "Bearer <YOUR_CONFIDENT_API_KEY>"
      }
    }
  }
}

🤖 Claude Code (or Desktop)

Claude Code — run the following command in your terminal:

claude mcp add --transport http confident-ai https://mcp.confident-ai.com/mcp --header "Authorization: Bearer <YOUR_CONFIDENT_API_KEY>"

Claude Desktop — add the following to your claude_desktop_config.json file:

{
  "mcpServers": {
    "confident-ai": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.confident-ai.com/mcp",
        "--header",
        "Authorization: Bearer <YOUR_CONFIDENT_API_KEY>"
      ]
    }
  }
}

🏄 Windsurf

Add the following to your Windsurf MCP configuration:

{
  "mcpServers": {
    "confident-ai": {
      "serverUrl": "https://mcp.confident-ai.com/mcp",
      "headers": {
        "Authorization": "Bearer <YOUR_CONFIDENT_API_KEY>"
      }
    }
  }
}

🛠️ Running the Server Locally

If you're self-hosting or contributing to this project, you can run the server from source.

Prerequisites: Python >= 3.12, Poetry

poetry install
poetry run python server.py

The server will start on http://0.0.0.0:8081. It uses the Streamable HTTP transport — a single /mcp endpoint that handles both GET and POST.

The /mcp endpoint requires a Bearer token in the Authorization header (your Confident AI API key).

When running locally, point your MCP client to http://localhost:8081/mcp instead of the hosted URLs above.

To run in stdio mode instead (for MCP clients that communicate over stdin/stdout), uncomment the relevant block at the bottom of server.py:

if __name__ == "__main__":
    mcp.run(transport="stdio")

Configuration

[!NOTE] This section is only relevant if you're running the server locally. If you're using the hosted server, the only thing you need is your API key in the quickstart configs above.

The server is configured through environment variables. You can set these in a .env file in the project root.

Variable Description Default
CONFIDENT_API_KEY Your Confident AI API key Required
CONFIDENT_ENVIRONMENT LOCAL, PROD, or ON_PREM LOCAL
CONFIDENT_REGION US, EU, or AU (only used when CONFIDENT_ENVIRONMENT=PROD) US
CONFIDENT_BACKEND_LOCAL_URL Backend URL for local development
CONFIDENT_BACKEND_US_PROD_URL US production backend URL
CONFIDENT_BACKEND_EU_PROD_URL EU production backend URL
CONFIDENT_BACKEND_AU_PROD_URL AU production backend URL
CONFIDENT_BACKEND_ON_PREM_URL On-prem backend URL (required when CONFIDENT_ENVIRONMENT=ON_PREM)

Available Tools

<details> <summary><strong>Prompts</strong> — 7 tools</summary>

Manage prompt templates with full version control — pull, push, version, and interpolate.

Tool Description
pull_prompt Fetch a prompt by alias, version, label, or commit hash
push_prompt Create or update a prompt template on Confident AI
interpolate_prompt Locally render a prompt template by replacing placeholders with values
create_prompt_version Assign a version string to a specific prompt commit
list_prompt_versions List all formal versions of a prompt
list_prompt_commits List the full commit history of a prompt
list_prompts List all prompts in your project

</details>

<details> <summary><strong>Datasets</strong> — 5 tools</summary>

Pull evaluation datasets for use in local test runs or agent workflows, with full version control to pin runs to immutable snapshots of goldens.

Tool Description
pull_dataset Fetch a dataset (single-turn or multi-turn) by alias, optionally pinned to a version
push_dataset Create or update datasets by adding new goldens, optionally onto a specific version
list_datasets List all datasets in your project
create_dataset_version Snapshot the current dataset state as a new immutable version
list_dataset_versions List all versions of a dataset (newest first)

</details>

<details> <summary><strong>Evaluate</strong> — 2 tools</summary>

Trigger cloud evaluations and simulate multi-turn conversations.

Tool Description
run_llm_evals Run cloud evaluations on a batch of test cases against a metric collection
simulate_conversation Simulate the next turn of a multi-turn conversation using a scenario and expected outcome

</details>

<details> <summary><strong>Traces, Threads, and Spans</strong> — 9 tools</summary>

Browse, inspect, and evaluate production observability data at every level of your LLM pipeline.

Tool Description
list_traces List traces with filtering by environment, time range, and sort order
get_trace Get full details of a specific trace, including all spans
list_threads List conversation threads with filtering and pagination
get_thread Get full details of a thread, including all traces and thread-level metrics
list_spans List spans with filtering by type, error state, prompt version, and more
get_span Get full details of a span, including I/O, cost, metrics, and annotations
evaluate_trace Trigger a cloud evaluation on a specific trace
evaluate_thread Trigger a cloud evaluation on a conversation thread
evaluate_span Trigger a cloud evaluation on a specific span

</details>

<details> <summary><strong>Annotations</strong> — 4 tools</summary>

Create and manage human feedback on traces, spans, and threads.

Tool Description
list_annotations List annotations with filtering by target, type, and rating range
get_annotation Get full details of a specific annotation
create_annotation Create a new annotation (thumbs rating or star rating) on a trace, span, or thread
update_annotation Update an existing annotation's rating, explanation, or expected output

</details>

<details> <summary><strong>Test Runs</strong> — 2 tools</summary>

Inspect past evaluation runs and their results.

Tool Description
list_test_runs List test runs with filtering by status, time range, and multi-turn type
get_test_run Get full details of a test run, including per-test-case metric scores and reasoning

</details>

<details> <summary><strong>Metric Collections</strong> — 1 tool</summary>

Discover available metric collections before triggering evaluations.

Tool Description
list_metric_collections List all metric collections, including their metrics and thresholds

</details>

Public Endpoint

[!CAUTION] The hosted /mcp endpoint is strictly for internal development and experimental use. It is not designed for public consumption. The API and its underlying data structures are unstable and subject to change, breaking updates, or removal at any time without prior notice. Do not build production applications or rely on this public endpoint for any critical workflows.

License

This project is licensed under the terms of the MIT License.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured