wpa-mcp

An MCP (Model Context Protocol) server that turns Windows WPR .etl traces into structured, LLM-friendly performance insights — using WPAExporter + xperf under the hood, and optionally emitting flamegraph-ready folded stacks.

wpa-mcp bridges two worlds:

  • Windows Performance Analyzer (WPA) — the gold standard for analyzing ETW / WPR traces, but GUI-heavy and hard to automate.
  • LLMs (Claude, Copilot, GPT, …) — great at reasoning across evidence, but blind to .etl files.

This server exposes a small set of MCP tools so an LLM can:

  1. Validate a trace (does it actually contain the events needed for analysis?)
  2. Export the right WPA tables to CSV via predefined profiles
  3. Summarize the CSVs into a compact JSON (Top N processes, hot stacks, ready-thread latency, DPC/ISR offenders, UI jank)
  4. Render a Brendan-Gregg-style folded stack file for flamegraphs — or for the LLM to read directly

Architecture

+------------------+       stdio (MCP)        +--------------------+
|  LLM / MCP host  |  <-------------------->  |   wpa-mcp server   |
| (Claude, VSCode) |                          |  (this repo)       |
+------------------+                          +----------+---------+
                                                         |
                                             subprocess  |
                                                         v
                                   +---------------------+---------------------+
                                   |  xperf.exe          |  wpaexporter.exe    |
                                   |  (validate / stats) |  (+ .wpaProfile)    |
                                   +---------------------+---------------------+
                                                         |
                                                         v
                                              CSV tables (per profile)
                                                         |
                                                         v
                                       summarizer -> JSON  /  flamegraph -> .folded

Everything that the LLM sees is structured JSON or compact folded-stack text — never raw gigabyte CSVs.
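The two subprocess hops above can be sketched as plain argument lists. The xperf stats action (-i &lt;etl&gt; -a stats) is the standard trace-dump form; the wpaexporter flag spelling below is an assumption for illustration, not taken from this repo:

```python
import subprocess  # used in the example invocation below

def validate_cmd(xperf: str, etl: str) -> list[str]:
    # xperf's "stats" action dumps event/provider counts for the trace.
    return [xperf, "-i", etl, "-a", "stats"]

def export_cmd(wpaexporter: str, etl: str, profile: str, out_dir: str) -> list[str]:
    # Flag names here are an assumption about wpaexporter's CLI spelling.
    return [wpaexporter, "-i", etl, "-profile", profile, "-outputfolder", out_dir]

# Example (Windows only):
# subprocess.run(validate_cmd("xperf.exe", r"C:\traces\case01.etl"),
#                capture_output=True, text=True)
```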


Prerequisites

  • Windows 10/11 (required; the analysis tools are Windows-only)
  • Windows Performance Toolkit (WPT) installed (ships with Windows ADK / Windows SDK)
    • wpaexporter.exe
    • xperf.exe
  • Python 3.10+

If WPT is installed to a non-default path, set:

setx WPAEXPORTER_PATH "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\wpaexporter.exe"
setx XPERF_PATH       "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\xperf.exe"
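The server presumably resolves these overrides at startup. A minimal sketch of that lookup — the default WPT directory below is the usual install location, an assumption rather than something the repo guarantees:

```python
import os

# Usual Windows Performance Toolkit install directory (assumed default).
DEFAULT_WPT = r"C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit"

def resolve_tool(env_var: str, exe_name: str) -> str:
    """Prefer the environment override, else fall back to the default WPT path."""
    return os.environ.get(env_var) or os.path.join(DEFAULT_WPT, exe_name)

wpaexporter = resolve_tool("WPAEXPORTER_PATH", "wpaexporter.exe")
xperf = resolve_tool("XPERF_PATH", "xperf.exe")
```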

Install

Via pipx (recommended — works once published to PyPI)

pipx install wpa-mcp
wpa-mcp    # starts the MCP stdio server

From source

git clone https://github.com/Jialong-zhong/wpr-xperf-mcp-server.git
cd wpr-xperf-mcp-server
pip install -e .
wpa-mcp

Capture a trace

The server's analyses are only as good as the providers you captured. Recommended capture for the four problem classes this server targets:

# Run as Administrator
wpr -start CPU ^
    -start GeneralProfile ^
    -start DesktopComposition ^
    -start Registry ^
    -filemode

# ... reproduce the issue ...

wpr -stop C:\traces\case01.etl "repro notes here"

WPR profile         What it adds that wpa-mcp uses
CPU                 Sampled CPU, CSwitch, ReadyThread, StackWalk
GeneralProfile      Processes, images, DPC/ISR
DesktopComposition  DWM frame timing, Window-in-focus (UI hang evidence)
Registry            Registry activity (optional; useful for startup/UI hangs)

If you skip CPU, the most valuable analyses (hot stacks, scheduling latency) won't work — validate_trace will tell you so.


MCP tools

Tool                               Purpose                                                        Typical caller
validate_trace(etl_path)           Run xperf -a stats and report which providers / stacks exist   LLM, always first
export_tables(etl_path, profile)   Run one WPA profile via wpaexporter and return CSV paths       Advanced / targeted
analyze_etl(etl_path, focus)       Validate → export (by focus) → summarize into one JSON         LLM, default entry point
render_flamegraph(out_dir)         Aggregate CPU Usage (Sampled) stacks into folded format        After analyze_etl with CPU focus

analyze_etl input schema

{
  "etl_path": "C:\\traces\\case01.etl",
  "focus": "cpu | latency | ui | dpc_isr | all",
  "out_dir": "optional override",
  "top_n": 20
}
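A client-side sanity check of these arguments takes only a few lines. The field names are as documented above; the helper itself is illustrative, not part of wpa-mcp:

```python
VALID_FOCUS = {"cpu", "latency", "ui", "dpc_isr", "all"}

def check_analyze_args(args: dict) -> list[str]:
    """Return a list of problems with an analyze_etl argument dict (empty = OK)."""
    problems = []
    if not args.get("etl_path", "").lower().endswith(".etl"):
        problems.append("etl_path must point at a .etl file")
    if args.get("focus", "all") not in VALID_FOCUS:
        problems.append("focus must be one of " + ", ".join(sorted(VALID_FOCUS)))
    top_n = args.get("top_n", 20)
    if not isinstance(top_n, int) or top_n < 1:
        problems.append("top_n must be a positive integer")
    return problems
```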

analyze_etl output shape (abbreviated)

{
  "etl": "C:\\traces\\case01.etl",
  "focus": "all",
  "validation": {
    "duration_sec": 42.7,
    "has_cpu_sampling": true,
    "has_cswitch": true,
    "has_readythread": true,
    "has_stacks": true,
    "has_dpc_isr": true,
    "has_dwm": true,
    "warnings": []
  },
  "exports": ["...\\cpu\\CPU Usage (Sampled)_...csv", "..."],
  "summary": {
    "cpu_top_processes": [{"process": "chrome.exe", "weight_ms": 8421.3}],
    "cpu_top_modules":   [{"module":  "ntdll.dll",   "weight_ms": 2310.0}],
    "cpu_hot_stacks":    [{"stack":   "ntdll!... ; app!hot_fn", "weight_ms": 1240.0}],
    "ready_latency_top": [{"process": "explorer.exe", "tid": 1234, "p95_ms": 187.0}],
    "dpc_isr_top":       [{"driver":  "ndis.sys",     "total_ms": 95.2, "max_us": 820}],
    "ui_focus_top":      [{"process": "myapp.exe",    "focus_ms": 5400.0}],
    "dwm_slow_frames":   {"count": 38, "p95_ms": 41.7, "max_ms": 128.0}
  }
}
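Because the summary carries both validation.duration_sec and per-process weight_ms, a consumer can derive headline numbers directly. An illustrative helper (field names as documented above) that converts CPU weights into percent of wall-clock time on one core:

```python
def cpu_share(summary: dict, duration_sec: float) -> list[tuple[str, float]]:
    """Percent of one core's wall time each process burned, sorted descending."""
    wall_ms = duration_sec * 1000.0
    shares = [
        (p["process"], 100.0 * p["weight_ms"] / wall_ms)
        for p in summary.get("cpu_top_processes", [])
    ]
    return sorted(shares, key=lambda s: s[1], reverse=True)
```

With the example summary above, chrome.exe's 8421.3 ms over a 42.7 s trace works out to roughly 20% of one core.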

Built-in WPA profiles

Each profile is a .wpaProfile XML that tells wpaexporter which WPA tables + columns to dump.

Focus key   File                                         Tables exported
cpu         wpa/profiles/cpu_hotpath.wpaProfile          CPU Usage (Sampled)
latency     wpa/profiles/scheduling_latency.wpaProfile   CPU Usage (Precise), Ready Thread
ui          wpa/profiles/ui_hang.wpaProfile              Window In Focus, DWM Frame Details
dpc_isr     wpa/profiles/dpc_isr.wpaProfile              DPC/ISR Duration

Column sets are deliberately minimal to keep CSVs small and summarizer-friendly.


Analysis examples

These are end-to-end, copy-pasteable walkthroughs. Each shows the user prompt, the tool calls the LLM should make, the JSON shape you can expect, and the conclusions a well-prompted LLM should draw.

Example 1: Runaway CPU

User: "C:\traces\cpu_spike.etl — some process is pinning my CPU at 100%. Find it and tell me which function."

LLM tool calls:

// 1) validate
validate_trace({ "etl_path": "C:\\traces\\cpu_spike.etl" })

// 2) analyze focused on CPU
analyze_etl({ "etl_path": "C:\\traces\\cpu_spike.etl", "focus": "cpu", "top_n": 10 })

// 3) (optional) generate folded stacks for the hottest paths
render_flamegraph({
  "out_dir": "C:\\traces\\cpu_spike_wpa_out\\cpu",
  "top_n": 200,
  "min_weight_ms": 5
})

Typical summary returned:

{
  "cpu_top_processes": [
    { "process": "myapp.exe",  "weight_ms": 38210.0 },
    { "process": "chrome.exe", "weight_ms":  2100.0 }
  ],
  "cpu_top_modules": [
    { "module": "myapp.exe",    "weight_ms": 31020.0 },
    { "module": "ucrtbase.dll", "weight_ms":  4100.0 }
  ],
  "cpu_hot_stacks": [
    { "stack": "ntdll!RtlUserThreadStart;kernel32!BaseThreadInitThunk;myapp!worker_loop;myapp!compress_chunk",
      "weight_ms": 27800.0 }
  ]
}

What the LLM should conclude:

  • Trace duration is 42 s and myapp.exe consumed ~38 s of CPU, i.e. roughly 90% of wall time on a single-core equivalent.
  • Nearly all weight is in myapp!compress_chunk via worker_loop → this is a hot loop in the app itself, not the runtime or OS.
  • Recommend: inspect compress_chunk, consider SIMD / chunking, and capture a second trace with -start CPU -start FileIO to see if the loop is also I/O-bound.

Example 2: UI hang / "not responding"

User: "When the user clicks Save, the window freezes for ~2 seconds. Trace at C:\traces\save_hang.etl."

LLM tool calls:

validate_trace({ "etl_path": "C:\\traces\\save_hang.etl" })
analyze_etl({ "etl_path": "C:\\traces\\save_hang.etl", "focus": "ui" })
analyze_etl({ "etl_path": "C:\\traces\\save_hang.etl", "focus": "latency" })

Expected signals:

{
  "ui_focus_top": [{ "process": "myapp.exe", "focus_ms": 5400 }],
  "dwm_slow_frames": { "count": 42, "p95_ms": 58.0, "max_ms": 1980.0 },
  "ready_latency_top": [
    { "process": "myapp.exe", "tid": 7321, "p95_ms": 14.0 }
  ]
}

Reasoning rubric the LLM should apply:

  • dwm_slow_frames.max_ms ≈ 1980 while the app is in focus → the hang is real and UI-thread-bound.
  • ready_latency_top for the UI thread is low (≈14 ms p95) → the thread wasn't starved for CPU; it was busy on its own work (otherwise we'd see hundreds of ms of ready-wait).
  • Therefore the UI thread is doing synchronous work during Save. Likely culprits: synchronous disk/network I/O, lock contention, or a heavy compute path on the UI thread.
  • Next step: re-capture with -start CPU -start FileIO -start DiskIO and focus on the UI thread's stacks with render_flamegraph to see exactly what it's doing during those 2 seconds.

Contrast: if ready_latency_top had been, say, p95_ms: 900 — then the UI thread was ready but not scheduled. That's a scheduling/contention story, not a "heavy work on UI thread" story.
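That rubric is mechanical enough to express in code. A sketch of the busy-vs-starved decision; the 100 ms cutoffs are assumptions for illustration, not values the server enforces:

```python
def classify_ui_hang(slow_frame_max_ms: float, ui_thread_ready_p95_ms: float) -> str:
    """Distinguish a busy UI thread from a starved one, per the rubric above."""
    if slow_frame_max_ms < 100:           # no frame long enough to be a visible hang
        return "no significant hang"
    if ui_thread_ready_p95_ms >= 100:     # thread was ready but not scheduled
        return "starved: scheduling/contention problem"
    return "busy: synchronous work on the UI thread"
```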


Example 3: Audio/mouse glitch caused by a driver

User: "Audio crackles and the mouse skips for a split second every few minutes. C:\traces\glitch.etl."

LLM tool calls:

validate_trace({ "etl_path": "C:\\traces\\glitch.etl" })
analyze_etl({ "etl_path": "C:\\traces\\glitch.etl", "focus": "dpc_isr" })

Expected signal:

{
  "dpc_isr_top": [
    { "driver": "Netwtw10.sys", "total_ms": 312.4, "max_us": 4120, "count": 1820 },
    { "driver": "ndis.sys",     "total_ms":  95.1, "max_us":  820, "count": 4300 },
    { "driver": "nvlddmkm.sys", "total_ms":  60.0, "max_us":  410, "count": 2100 }
  ]
}

What the LLM should conclude:

  • Netwtw10.sys (Intel Wi-Fi driver) has a single DPC over 4 ms — that's well above the ~1 ms "don't cause audio glitches" rule of thumb.
  • Correlation with symptom: Wi-Fi DPC storms typically line up with mouse/audio skips because DPCs run at elevated IRQL and block the audio/HID stack.
  • Recommend: update the Wi-Fi driver; if the problem persists, disable power-saving for the Wi-Fi adapter and re-capture.

Quality rules that wpa-mcp's prompting guide bakes in: any driver with max_us > 1000 is suspicious; anything >= 500 is worth mentioning.
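Those thresholds are easy to apply as a filter over dpc_isr_top (an illustrative helper using the 1000 µs / 500 µs values stated above):

```python
def flag_dpc_drivers(dpc_isr_top: list[dict]) -> list[tuple[str, str]]:
    """Label each driver per the max-DPC-duration rules of thumb."""
    flags = []
    for d in dpc_isr_top:
        if d["max_us"] > 1000:
            flags.append((d["driver"], "suspicious"))
        elif d["max_us"] >= 500:
            flags.append((d["driver"], "worth mentioning"))
    return flags
```

On the example data above, Netwtw10.sys comes out suspicious, ndis.sys worth mentioning, and nvlddmkm.sys unflagged.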


Example 4: Feeding folded stacks to the LLM

After analyze_etl with focus="cpu", you can ask the LLM to drill deeper:

render_flamegraph({
  "out_dir": "C:\\traces\\cpu_spike_wpa_out\\cpu",
  "output_path": "C:\\traces\\cpu_spike.folded",
  "top_n": 300,
  "min_weight_ms": 2
})

Returns:

{
  "folded_file": "C:\\traces\\cpu_spike.folded",
  "source_csv": "C:\\traces\\cpu_spike_wpa_out\\cpu\\CPU Usage (Sampled)_....csv",
  "line_count": 287,
  "total_weight_ms": 39120.0,
  "preview": "ntdll!RtlUserThreadStart;kernel32!BaseThreadInitThunk;myapp!worker_loop;myapp!compress_chunk 27800\nntdll!... ; myapp!parse_header 410\n..."
}

You can now either:

  • Render an SVG flamegraph (requires Perl + Brendan Gregg's script):

    perl flamegraph.pl C:\traces\cpu_spike.folded > C:\traces\cpu_spike.svg
    
  • Or just let the LLM read the preview — the folded format is already much easier for an LLM than raw CSV.
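The folded format is one "frame1;frame2;...;leaf weight" line per stack, so post-processing is trivial. For example, aggregating weight by leaf frame (an illustrative helper, not part of wpa-mcp):

```python
from collections import Counter

def top_leaves(folded_text: str, n: int = 5) -> list[tuple[str, int]]:
    """Sum folded-stack weights by leaf frame (the last frame in each stack)."""
    leaves = Counter()
    for line in folded_text.splitlines():
        if not line.strip():
            continue
        stack, _, weight = line.rpartition(" ")
        leaves[stack.split(";")[-1]] += int(weight)
    return leaves.most_common(n)
```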


Client configuration

Claude Desktop — %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "wpa": {
      "command": "wpa-mcp",
      "env": {
        "WPAEXPORTER_PATH": "C:/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/wpaexporter.exe",
        "XPERF_PATH":       "C:/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/xperf.exe"
      }
    }
  }
}

VS Code (GitHub Copilot Chat / MCP) — .vscode/mcp.json

Already included in this repo. It points at server.py in the workspace.

Custom MCP host

Any MCP client that speaks stdio works. Launch wpa-mcp (or python server.py) as a child process and send tools/list + tools/call over stdio.


Release process

This repo publishes to PyPI via GitHub Actions + PyPI trusted publishing (OIDC) — no secrets required.

One-time PyPI setup:

  1. Claim the wpa-mcp project on PyPI.
  2. Add a Trusted Publisher:
    • Owner: Jialong-zhong
    • Repository: wpr-xperf-mcp-server
    • Workflow: publish.yml
    • Environment: pypi

Then, to ship a new version:

# bump version in pyproject.toml, commit, then:
git tag v0.2.0
git push origin v0.2.0

The Publish to PyPI workflow (on tag v*) will build the sdist + wheel and publish automatically.


Troubleshooting

Symptom                                        Likely cause                                       Fix
wpaexporter not found                          WPT not installed or path wrong                    Install Windows Performance Toolkit; set WPAEXPORTER_PATH
xperf stats failed                             ETL corrupted or not a WPR trace                   Re-capture; ensure wpr -stop <file> completed successfully
columns missing in summarizer                  Your WPA version renamed columns                   Adjust <Column Name=...> in the matching .wpaProfile
has_stacks: false in validation                -start CPU not used during capture, or no admin    Re-capture with -start CPU as Administrator
Empty dwm_slow_frames                          DesktopComposition profile wasn't enabled          Re-capture with -start DesktopComposition
ready_latency_top all near zero during a hang  The thread isn't ready-waiting; it's doing work    Run render_flamegraph on the CPU exports to see that work

FAQ

Q: Does this need WPA GUI installed? No. Only wpaexporter.exe and xperf.exe (both from the Windows Performance Toolkit) are called. WPA GUI never launches.

Q: Can I use this on Linux/macOS? The MCP server itself is pure Python. But wpaexporter / xperf only exist on Windows, so analysis must run on Windows. A common setup is: capture on Windows, copy ETL to a Windows analysis box, run wpa-mcp there.

Q: Why not parse ETL directly in Python? ETL parsing is deep. Microsoft already ships an excellent, correct parser (wpaexporter) that understands every kernel + provider schema. Reusing it is cheaper and more accurate than reimplementing.

Q: Can I add my own WPA profile? Yes. Drop a .wpaProfile into wpa/profiles/, add a key to PROFILE_MAP in server.py, and (optionally) a summarizer in wpa/summarizer.py.
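Assuming PROFILE_MAP is a plain dict from focus key to profile path (the actual shape in server.py may differ), the change is one line. The existing keys and paths below come from the profiles table earlier; the "diskio" entry is a hypothetical example:

```python
# Hypothetical shape of server.py's PROFILE_MAP; keys mirror the focus values.
PROFILE_MAP = {
    "cpu":     "wpa/profiles/cpu_hotpath.wpaProfile",
    "latency": "wpa/profiles/scheduling_latency.wpaProfile",
    "ui":      "wpa/profiles/ui_hang.wpaProfile",
    "dpc_isr": "wpa/profiles/dpc_isr.wpaProfile",
}

# Your addition: a new focus key pointing at your own profile (hypothetical name).
PROFILE_MAP["diskio"] = "wpa/profiles/disk_io.wpaProfile"
```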

Q: Does the LLM see the full CSV? No — by design. The LLM sees compact summary JSON plus (optionally) folded-stack text. Raw CSVs stay on disk and are referenced by path.


License

MIT. See LICENSE.
