
<p align="center"> <img src="assets/nvidia-nsight-systems-icon-gbp-shaded-256.png" alt="Nsight Systems logo" width="128"> </p>

<h3 align="center">nsys MCP Server</h3>

<p align="center"> <code>MCP</code> · <code>GPU Profiling</code> · <code>NVIDIA Nsight Systems</code> · <code>LLM Agents</code> </p>


nsys-mcp is an MCP (Model Context Protocol) server that provides GPU profiling capabilities through NVIDIA Nsight Systems (nsys). It lets an LLM agent profile binaries, parse reports, compute statistics, and analyze interval trees — all via standard MCP tool calls.

## Prerequisites

- NVIDIA Nsight Systems: the `nsys` CLI must be on your `PATH` (the `check_nsys` tool verifies this)
- An NVIDIA GPU and driver, for CUDA profiling
- Python 3 with `pip`

## Installation

```bash
pip install -e .
```

For development (tests):

```bash
pip install -e ".[dev]"
```

## Running the Server

The server communicates over stdio (the default MCP transport):

```bash
python -m nsys_mcp.server
```

### Cursor / VS Code MCP configuration

Add to your MCP settings (e.g. `.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "nsys-profiler": {
      "command": "python",
      "args": ["-m", "nsys_mcp.server"]
    }
  }
}
```

## Available Tools

The server exposes 10 tools:

| # | Tool | Description |
|---|------|-------------|
| 1 | `check_nsys` | Verify that `nsys` is installed and return its version |
| 2 | `profile_binary` | Profile a binary with full CUDA, NVTX, and GPU metrics collection |
| 3 | `load_report` | Load a pre-existing `.nsys-rep` or NDJSON `.json` file |
| 4 | `list_reports` | List all cached profiling reports with metadata |
| 5 | `get_event_summary` | Breakdown of event types and counts for a report |
| 6 | `get_kernel_stats` | Aggregate GPU kernel statistics grouped by kernel name |
| 7 | `get_nvtx_stats` | Aggregate NVTX range durations grouped by annotation text |
| 8 | `get_memcpy_stats` | Aggregate memory copy statistics grouped by direction |
| 9 | `build_interval_tree` | Construct an interval tree from profiling events |
| 10 | `query_interval_tree` | Run structural queries against an interval tree |

### profile_binary

Profile a binary with full CUDA, NVTX, and GPU metrics collection. Results are cached, so repeated calls with the same arguments skip re-profiling.

| Parameter | Type | Description |
|-----------|------|-------------|
| `binary` | `str` | Path to the executable |
| `args` | `list[str]` | Command-line arguments (optional) |
| `env` | `dict[str, str]` | Extra environment variables (optional) |
| `cwd` | `str` | Working directory (optional) |
| `duration` | `int` | Max profiling duration in seconds (optional) |
| `extra_nsys_flags` | `list[str]` | Additional `nsys` flags (optional) |

Returns `report_id`, `event_counts`, and `time_span_ns`.
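Internally, a call like this maps onto an `nsys profile` invocation. As a rough sketch of how such a wrapper might assemble the command line (the exact flag set used by `nsys_runner.py` is an assumption here; `-o`, `--trace`, and `--duration` are standard `nsys profile` options):

```python
import shlex

def build_nsys_command(binary, args=None, output="report", duration=None,
                       extra_nsys_flags=None):
    """Assemble an `nsys profile` command line (flag set is illustrative)."""
    cmd = ["nsys", "profile", "-o", output, "--trace=cuda,nvtx"]
    if duration is not None:
        cmd += ["--duration", str(duration)]  # cap profiling time in seconds
    cmd += list(extra_nsys_flags or [])
    cmd.append(binary)          # the target executable comes after all flags
    cmd += list(args or [])     # followed by its own arguments
    return cmd

cmd = build_nsys_command("/app/solver", args=["--iters", "100"], duration=30)
print(shlex.join(cmd))
```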

### load_report

Load a pre-existing `.nsys-rep` or NDJSON `.json` file without re-profiling.

| Parameter | Type | Description |
|-----------|------|-------------|
| `path` | `str` | Path to `.nsys-rep` or `.json` file |
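NDJSON is simply one JSON object per line, which allows the parser to stream large reports without loading them whole. A minimal loader sketch (the `Type` field name is hypothetical; the real nsys export schema and string-table resolution live in `report_parser.py`):

```python
import io
import json
from collections import Counter

def summarize_ndjson(stream):
    """Count events per type in an NDJSON stream, one JSON object per line."""
    counts = Counter()
    for line in stream:
        line = line.strip()
        if not line:
            continue                      # tolerate blank lines
        event = json.loads(line)          # each line is a standalone object
        counts[event.get("Type", "unknown")] += 1
    return counts

sample = io.StringIO(
    '{"Type": "kernel", "DurNs": 1200}\n'
    '{"Type": "kernel", "DurNs": 800}\n'
    '{"Type": "memcpy", "DurNs": 300}\n'
)
print(summarize_ndjson(sample))  # per-type event counts
```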

### get_event_summary

Get a breakdown of event types and counts for a report.

| Parameter | Type | Description |
|-----------|------|-------------|
| `report_id` | `str` | ID from `profile_binary` or `load_report` |

### get_kernel_stats

Aggregate GPU kernel statistics grouped by kernel name. Includes duration statistics (mean, std, min, max, median, count, total) and GPU metrics (grid/block size, shared memory, registers).

| Parameter | Type | Description |
|-----------|------|-------------|
| `report_id` | `str` | Report identifier |
| `top_n` | `int` | Limit to top N kernels (optional) |
| `sort_by` | `str` | `total_ns`, `count`, `mean_ns`, or `max_ns` (default: `total_ns`) |
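The duration aggregation amounts to a group-by over per-launch samples. A minimal sketch, assuming events arrive as `(kernel_name, duration_ns)` pairs (the real `aggregator.py` implementation and field names may differ):

```python
import statistics
from collections import defaultdict

def kernel_stats(events, top_n=None, sort_by="total_ns"):
    """Group duration samples (ns) by kernel name and aggregate."""
    groups = defaultdict(list)
    for name, dur_ns in events:
        groups[name].append(dur_ns)
    rows = []
    for name, durs in groups.items():
        rows.append({
            "name": name,
            "count": len(durs),
            "total_ns": sum(durs),
            "mean_ns": statistics.mean(durs),
            "std_ns": statistics.pstdev(durs),   # population std dev
            "min_ns": min(durs),
            "max_ns": max(durs),
            "median_ns": statistics.median(durs),
        })
    rows.sort(key=lambda r: r[sort_by], reverse=True)  # heaviest first
    return rows[:top_n] if top_n else rows
```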

### get_nvtx_stats

Aggregate NVTX range durations grouped by annotation text.

| Parameter | Type | Description |
|-----------|------|-------------|
| `report_id` | `str` | Report identifier |
| `domain_id` | `int` | Filter by NVTX domain (optional) |

### get_memcpy_stats

Aggregate memory copy statistics grouped by copy direction (HtoD, DtoH, DtoD, etc.). Includes duration stats, total bytes, and bandwidth estimates.

| Parameter | Type | Description |
|-----------|------|-------------|
| `report_id` | `str` | Report identifier |
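The bandwidth estimate follows directly from bytes moved and time spent: since there are 10⁹ bytes in a GB and 10⁹ ns in a second, bytes per nanosecond is numerically equal to GB/s. A sketch of how such an estimate might be computed:

```python
def bandwidth_gbps(total_bytes, total_ns):
    """Estimate transfer bandwidth in (decimal) GB/s.

    bytes/ns == GB/s, because 1e9 bytes per GB cancels 1e9 ns per second.
    """
    if total_ns == 0:
        return 0.0  # avoid dividing by zero for empty/instant groups
    return total_bytes / total_ns

# e.g. 4 GiB copied in 0.5 s ≈ 8.59 GB/s
print(bandwidth_gbps(4 * 2**30, 500_000_000))
```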

### build_interval_tree

Construct an interval tree from profiling events. If multiple disjoint trees exist (a forest), they can be merged under a synthetic root.

| Parameter | Type | Description |
|-----------|------|-------------|
| `report_id` | `str` | Report identifier |
| `event_types` | `list[str]` | Subset of `["kernel", "nvtx", "trace", "memcpy", "sync"]` (default: all) |
| `reduce_forest` | `bool` | Merge forest into single tree (default: `true`) |
| `thread_id` | `int` | Filter by thread/stream ID (optional) |
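Since profiling events nest (a kernel launch inside an NVTX range inside a frame), tree construction is essentially nesting `[start, end)` intervals by containment. A minimal sketch of that idea, using a stack over start-sorted intervals (the real `interval_tree.py` structure is an assumption):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    start: int                  # ns
    end: int                    # ns
    children: list = field(default_factory=list)

def build_tree(intervals):
    """Nest (name, start, end) intervals by containment.

    Returns the forest of top-level nodes; merging it under a synthetic
    root would correspond to reduce_forest=True.
    """
    roots, stack = [], []
    # Sort by start; on ties, the longer (enclosing) interval comes first.
    for name, start, end in sorted(intervals, key=lambda iv: (iv[1], -iv[2])):
        node = Node(name, start, end)
        # Pop intervals that closed before this one starts.
        while stack and stack[-1].end <= start:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots

forest = build_tree([
    ("frame", 0, 100),
    ("kernel_a", 10, 40),
    ("kernel_b", 50, 90),
    ("upload", 120, 150),
])
```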

### query_interval_tree

Run structural queries against a previously built interval tree.

| Parameter | Type | Description |
|-----------|------|-------------|
| `report_id` | `str` | Report identifier |
| `query_type` | `str` | One of the query types below |
| `event_name` | `str` | Event name for `count_calls` |
| `subtree_root_name` | `str` | Scope query to a named subtree (optional) |
| `max_depth` | `int` | Limit traversal depth (optional) |

Query types:

| Type | Description |
|------|-------------|
| `most_time_consuming` | Find the longest-duration event in a subtree |
| `top_level` | List top-level interval names |
| `count_calls` | Count occurrences of a named event in a subtree |
| `subtree_summary` | Aggregated stats for a named subtree |
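A query like `count_calls` reduces to a recursive traversal of the subtree. A sketch over a dict-shaped node (`{"name", "children"}` is illustrative; the real node layout is internal to `interval_tree.py`):

```python
def count_calls(node, event_name):
    """Count occurrences of a named event in a subtree."""
    n = int(node["name"] == event_name)     # 1 if this node matches
    for child in node.get("children", []):
        n += count_calls(child, event_name)  # recurse into children
    return n

tree = {
    "name": "frame",
    "children": [
        {"name": "gemm", "children": []},
        {"name": "gemm", "children": []},
        {"name": "copy", "children": [{"name": "gemm", "children": []}]},
    ],
}
print(count_calls(tree, "gemm"))  # 3
```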

## Typical Workflow

```text
1. check_nsys()                              — verify nsys is available
2. profile_binary(binary="/app/solver", ...) — profile and get report_id
3. get_kernel_stats(report_id, top_n=10)     — see top 10 kernels
4. get_nvtx_stats(report_id)                 — see NVTX annotation timings
5. get_memcpy_stats(report_id)               — see memory transfer stats
6. build_interval_tree(report_id)            — build the tree
7. query_interval_tree(report_id,            — find bottleneck
       query_type="most_time_consuming")
8. query_interval_tree(report_id,            — count specific kernel calls
       query_type="count_calls",
       event_name="cub::DeviceReduce")
```

## Caching

Profiling results are cached in two tiers:

- **In-memory LRU** — fast access for the current session (up to 8 reports).
- **Disk** — persists across server restarts at `~/.nsys_mcp/cache/`.

Cache keys are derived from the binary path and arguments, so identical profiling runs reuse cached results automatically.
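A stable key for "same binary, same arguments" can be derived by hashing the inputs. A sketch of the idea (the exact fields and hash used by `cache.py` are an assumption):

```python
import hashlib

def cache_key(binary, args=(), env=None):
    """Derive a deterministic cache key from the profiling inputs.

    Sorting env items makes the key independent of dict ordering;
    NUL separators keep adjacent fields from colliding.
    """
    payload = "\x00".join([
        binary,
        *args,
        *(f"{k}={v}" for k, v in sorted((env or {}).items())),
    ])
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```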

## Testing

```bash
pip install -e ".[dev]"
pytest
```

## Project Structure

```text
src/nsys_mcp/
├── server.py           # FastMCP server, tool definitions, lifespan
├── nsys_runner.py      # nsys CLI wrapper (profile, export, version)
├── report_parser.py    # NDJSON streaming parser, string-table resolution
├── models.py           # Pydantic models for events, stats, configs
├── aggregator.py       # Group-by aggregation (mean, std, min, max, count)
├── interval_tree.py    # Interval tree/forest construction + queries
└── cache.py            # Two-tier cache (memory LRU + disk pickle)
```


## License

nsys-mcp is licensed under the MIT License.
