agent-trace-intelligence

Analyzes AI agent traces to diagnose failures, score performance, and provide actionable fixes using tools like judge_trace and trace_breakdown.

Agent Trace Intelligence MCP

Python 3.11+ | MCP | PyPI | License: MIT | CI

Diagnose why your agent did what it did, and how to fix it.

The Problem

When an AI agent fails or behaves unexpectedly, existing tools tell you what happened: token counts, step logs, latency metrics. But they don't tell you why the agent made a wrong turn or how to fix it. Debugging agents means staring at raw traces and guessing.

The Solution

Agent traces show you what happened. This tool tells you why it went wrong and what to change. Pass in any agent trace JSON and get root causes, scores, and a concrete fix back. Zero instrumentation required.

Use it alongside LangSmith, Arize Phoenix, and W&B Weave. When your observability stack surfaces a failure, this is where you go to diagnose it.

Works in Cursor, Claude Desktop, VS Code (Copilot MCP), and any stdio MCP client.


How It Fits

This tool explains why a single agent trace behaved the way it did.

It complements existing observability tools:

  • Azure Application Insights, AWS CloudWatch, GCP Cloud Trace: show what happened across runs
  • LangSmith, Arize Phoenix, W&B Weave: track agent behaviour over time

This tool answers a narrower question: why did this specific trace fail, and what exactly needs to change?


Install

# With uv (recommended)
uv add agent-trace-intelligence

# Or pip
pip install agent-trace-intelligence

Quick Start: Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "agent-trace-intelligence": {
      "command": "uv",
      "args": ["run", "agent-trace-intelligence"],
      "env": {
        "AZURE_AI_API_KEY": "your-azure-ai-foundry-key",
        "AZURE_AI_API_BASE": "https://your-resource.cognitiveservices.azure.com/",
        "JUDGE_MODEL": "azure_ai/claude-opus-4-6"
      }
    }
  }
}

No Azure? Replace the env block above with OpenAI settings:

{
  "env": {
    "JUDGE_MODEL": "gpt-4o-mini",
    "OPENAI_API_KEY": "sk-..."
  }
}

Tools Reference

| Tool | Description | API Key? | Speed |
| --- | --- | --- | --- |
| judge_trace | Root cause analysis, 4-dimension scoring, grade, verdict, plain-English explanation | Required | ~3-5s |
| trace_breakdown | Step-by-step scoring with flags (REDUNDANT_TOOL_CALL, REASONING_GAP, etc.) | Required | ~3-5s |
| efficiency_score | Deterministic token/latency/redundancy analysis | Not required | Instant |

judge_trace

Input:

{
  "trace": "<JSON string of AgentTrace>",
  "goal": "optional: override the goal stated in the trace"
}

Output:

{
  "overall_score": 0.82,
  "grade": "B",
  "verdict": "needs optimisation",
  "dimension_scores": {
    "goal_completion": 0.9,
    "reasoning_clarity": 0.8,
    "tool_usage": 0.75,
    "output_quality": 0.83
  },
  "summary": "Agent completed the goal but made one redundant search call",
  "root_causes": [
    "Unnecessary second search call at step 4 caused token inflation. Result was already available from step 2",
    "Agent did not validate tool output before proceeding to the next step"
  ],
  "strengths": ["Clear reasoning steps", "Correct initial tool selection"],
  "weaknesses": ["Redundant tool call on step 4", "Incomplete final output"],
  "recommendation": "Remove duplicate search call at step 4. Saves ~400 tokens",
  "explain_like_im_5": "The agent searched the internet twice for the same thing when it only needed to do it once, which wasted time and money.",
  "confidence": "high"
}

Verdict values: "production-ready" | "needs optimisation" | "broken"
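
Because the verdict is one of three fixed strings, the output is easy to act on in scripts. The sketch below shows one possible pass/fail gate over a judge_trace result; the gate function and the 0.7 score threshold are assumptions for illustration, not part of the tool.

```python
import json

# The three verdict values documented above.
ALLOWED_VERDICTS = {"production-ready", "needs optimisation", "broken"}


def gate(result_json: str, min_score: float = 0.7) -> bool:
    """Pass unless the verdict is 'broken' or the score is below min_score (assumed threshold)."""
    result = json.loads(result_json)
    verdict = result["verdict"]
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return verdict != "broken" and result["overall_score"] >= min_score


sample = json.dumps({"overall_score": 0.82, "verdict": "needs optimisation"})
print(gate(sample))  # True
```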

trace_breakdown

Per-step scoring with flags:

  • REDUNDANT_TOOL_CALL: same tool called with same/similar input
  • HALLUCINATED_TOOL: tool referenced that doesn't exist in the trace
  • REASONING_GAP: response doesn't follow from previous tool output
  • GOAL_DRIFT: agent deviates from the original goal
  • PREMATURE_STOP: agent stopped before completing the goal
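
To make the REDUNDANT_TOOL_CALL flag concrete, here is a deliberately simplified sketch that catches only exact repeats of a (tool_name, input) pair. It is not the tool's implementation: trace_breakdown uses a judge model and also covers similar inputs, which this deterministic check does not attempt. The function name is hypothetical.

```python
import json


def find_redundant_tool_calls(steps: list[dict]) -> list[int]:
    """Return step numbers whose tool call repeats an earlier (tool_name, input) pair."""
    seen: set[tuple[str, str]] = set()
    redundant: list[int] = []
    for step in steps:
        call = step.get("tool_call")
        if not call:
            continue
        # Canonical JSON makes key order irrelevant when comparing inputs.
        key = (call["tool_name"], json.dumps(call.get("input"), sort_keys=True))
        if key in seen:
            redundant.append(step["step_number"])
        else:
            seen.add(key)
    return redundant


steps = [
    {"step_number": 1, "role": "user", "content": "..."},
    {"step_number": 2, "role": "tool", "tool_call": {"tool_name": "web_search", "input": {"query": "a"}}},
    {"step_number": 3, "role": "tool", "tool_call": {"tool_name": "web_search", "input": {"query": "b"}}},
    {"step_number": 4, "role": "tool", "tool_call": {"tool_name": "web_search", "input": {"query": "a"}}},
]
print(find_redundant_tool_calls(steps))  # [4]
```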

efficiency_score

No API key required. Deterministic analysis of:

  • Token usage (total, per-step, rating: good/moderate/high)
  • Tool redundancy (redundant calls, failed calls, redundancy rate)
  • Latency (total ms, slowest step/tool, rating: fast/acceptable/slow)
  • overall_efficiency_score (0.0-1.0 weighted composite)
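
The kind of aggregation behind these figures can be sketched in a few lines. The example below is an illustration under assumptions, not the tool's code: summarize_efficiency is a hypothetical helper, and it leaves out the ratings and the weighted composite because this README does not state their thresholds or weights.

```python
def summarize_efficiency(trace: dict) -> dict:
    """Aggregate token and tool-call figures from an AgentTrace-shaped dict."""
    steps = trace.get("steps", [])
    calls = [s["tool_call"] for s in steps if s.get("tool_call")]
    step_tokens = [s.get("token_count", 0) for s in steps]
    slowest = max(calls, key=lambda c: c.get("latency_ms", 0), default=None)
    return {
        # Prefer the reported total; fall back to summing per-step counts.
        "total_tokens": trace.get("total_tokens", sum(step_tokens)),
        "max_step_tokens": max(step_tokens, default=0),
        "failed_calls": sum(1 for c in calls if c.get("error")),
        "slowest_tool": slowest["tool_name"] if slowest else None,
    }


sample_trace = {
    "total_tokens": 820,
    "steps": [
        {"step_number": 2, "role": "assistant", "token_count": 120},
        {"step_number": 3, "role": "tool",
         "tool_call": {"tool_name": "web_search", "latency_ms": 1200, "error": None}},
        {"step_number": 4, "role": "tool",
         "tool_call": {"tool_name": "fetch_page", "latency_ms": 300, "error": "timeout"}},
    ],
}
print(summarize_efficiency(sample_trace))
```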

AgentTrace Schema

All tools accept a trace JSON conforming to this schema:

{
  "trace_id": "optional",
  "agent_name": "optional",
  "goal": "What was the agent trying to do?",
  "model": "gpt-4o",
  "total_tokens": 820,
  "total_latency_ms": 3200,
  "final_output": "The agent's final response",
  "steps": [
    {
      "step_number": 1,
      "role": "user",
      "content": "User message"
    },
    {
      "step_number": 2,
      "role": "assistant",
      "content": "I'll search for that.",
      "token_count": 120
    },
    {
      "step_number": 3,
      "role": "tool",
      "tool_call": {
        "tool_name": "web_search",
        "input": {"query": "..."},
        "output": "Search results...",
        "latency_ms": 1200,
        "error": null
      }
    }
  ]
}

Only steps is required; every other field is optional, so pass whatever your framework records.
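
Since steps is the one required field, a quick pre-flight check can catch malformed traces before spending a model call. This is a minimal sketch with a hypothetical check_trace helper; the server's own validation may be stricter or behave differently.

```python
def check_trace(trace: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks fine."""
    steps = trace.get("steps")
    if not isinstance(steps, list):
        return ["'steps' is required and must be a list"]
    problems = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"steps[{i}] must be an object")
    return problems


print(check_trace({"steps": [{"step_number": 1, "role": "user", "content": "hi"}]}))  # []
print(check_trace({"goal": "trace without steps"}))
```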


Format Adapters

Optional helpers to convert native framework traces to AgentTrace format:

from agent_trace_intelligence.formats import (
    adapt_langchain,      # LangChain callback handler output
    adapt_openai_agents,  # OpenAI Agents SDK RunStep objects
    adapt_autogen,        # AutoGen message history
    adapt_maf,            # MAF GA 1.0 (OpenTelemetry GenAI spans)
)

# Convert and pass directly to any tool
trace = adapt_langchain(raw_langchain_output)

These are convenience helpers. The tools accept any valid AgentTrace JSON regardless of framework.

| Adapter | Framework | Status |
| --- | --- | --- |
| adapt_langchain | LangChain callback handler / LangSmith export | v1 |
| adapt_openai_agents | OpenAI Agents SDK (RunStep objects) | v1 |
| adapt_autogen | AutoGen legacy (pyautogen) message history | v1 |
| adapt_maf | Microsoft Agent Framework GA 1.0 (OTel spans) | v1 |

Model Support & Cost Guidance

Select the judge model with the JUDGE_MODEL environment variable. No code changes are required.

| Use Case | Recommended Model | Cost |
| --- | --- | --- |
| Best quality (default) | azure_ai/claude-opus-4-6 | ~$0.015/trace |
| Fast Azure alternative | azure_ai/gpt-4.1 | ~$0.008/trace |
| No Azure (OpenAI API) | gpt-4o-mini | ~$0.002/trace |
| CI/CD batch evaluation | gpt-4o-mini | < $0.01/trace |
| Anthropic direct | claude-haiku-4-5-20251001 | ~$0.001/trace |

For CI/CD use: Set JUDGE_MODEL=gpt-4o-mini to keep costs under $0.01 per trace. For interactive debugging, azure_ai/claude-opus-4-6 gives the best root cause reasoning.

efficiency_score is always free. No model call, no API key.
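
For budgeting a batch run, the per-trace figures above can be multiplied out. The sketch below uses the approximate costs from the table; estimate_batch_cost is a hypothetical helper, and real costs vary with trace length and provider pricing.

```python
# Approximate per-trace costs in USD, copied from the table above.
COST_PER_TRACE_USD = {
    "azure_ai/claude-opus-4-6": 0.015,
    "azure_ai/gpt-4.1": 0.008,
    "gpt-4o-mini": 0.002,
    "claude-haiku-4-5-20251001": 0.001,
}


def estimate_batch_cost(model: str, n_traces: int) -> float:
    """Rough cost of judging n_traces with the given model, rounded to cents."""
    return round(COST_PER_TRACE_USD[model] * n_traces, 2)


print(estimate_batch_cost("gpt-4o-mini", 500))               # 1.0
print(estimate_batch_cost("azure_ai/claude-opus-4-6", 500))  # 7.5
```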


Future Direction (v2)

  • Pattern detection across multiple traces to surface recurring failure modes
  • Batch trace analysis for CI/CD quality gates
  • Enterprise governance signals to flag traces that violate defined agent policies
  • SSE transport for enterprise internal MCP deployment
  • Connectors to pull traces directly from observability platforms (Azure App Insights, AWS CloudWatch, GCP Cloud Trace, LangSmith). Contributions welcome.

License

MIT. See LICENSE
