# nsys-mcp
<p align="center"> <img src="assets/nvidia-nsight-systems-icon-gbp-shaded-256.png" alt="Nsight Systems logo" width="128"> </p>
<h3 align="center">nsys MCP Server</h3>
<p align="center"> <code>MCP</code> · <code>GPU Profiling</code> · <code>NVIDIA Nsight Systems</code> · <code>LLM Agents</code> </p>
nsys-mcp is an MCP (Model Context Protocol) server that
provides GPU profiling capabilities through NVIDIA Nsight Systems (nsys).
It lets an LLM agent profile binaries, parse reports, compute statistics, and
analyze interval trees — all via standard MCP tool calls.
## Prerequisites

- Python 3.10+
- NVIDIA Nsight Systems (`nsys`) installed and available in `PATH`. Download it from the Nsight Systems page; see the Nsight Systems documentation for setup details.
## Installation

```bash
pip install -e .
```

For development (tests):

```bash
pip install -e ".[dev]"
```
## Running the Server

The server communicates over stdio (the default MCP transport):

```bash
python -m nsys_mcp.server
```
### Cursor / VS Code MCP configuration

Add to your MCP settings (e.g. `.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "nsys-profiler": {
      "command": "python",
      "args": ["-m", "nsys_mcp.server"]
    }
  }
}
```
## Available Tools

The server exposes 10 tools:

| # | Tool | Description |
|---|---|---|
| 1 | `check_nsys` | Verify that `nsys` is installed and return its version |
| 2 | `profile_binary` | Profile a binary with full CUDA, NVTX, and GPU metrics collection |
| 3 | `load_report` | Load a pre-existing `.nsys-rep` or NDJSON `.json` file |
| 4 | `list_reports` | List all cached profiling reports with metadata |
| 5 | `get_event_summary` | Breakdown of event types and counts for a report |
| 6 | `get_kernel_stats` | Aggregate GPU kernel statistics grouped by kernel name |
| 7 | `get_nvtx_stats` | Aggregate NVTX range durations grouped by annotation text |
| 8 | `get_memcpy_stats` | Aggregate memory copy statistics grouped by direction |
| 9 | `build_interval_tree` | Construct an interval tree from profiling events |
| 10 | `query_interval_tree` | Run structural queries against an interval tree |
### profile_binary

Profile a binary with full CUDA, NVTX, and GPU metrics collection. Results are cached, so repeated calls with the same arguments skip re-profiling.

| Parameter | Type | Description |
|---|---|---|
| `binary` | `str` | Path to the executable |
| `args` | `list[str]` | Command-line arguments (optional) |
| `env` | `dict[str, str]` | Extra environment variables (optional) |
| `cwd` | `str` | Working directory (optional) |
| `duration` | `int` | Max profiling duration in seconds (optional) |
| `extra_nsys_flags` | `list[str]` | Additional `nsys` flags (optional) |

Returns `report_id`, `event_counts`, and `time_span_ns`.
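For illustration, a successful result might look like the following (all values are made up, and only the three field names above are documented; the exact payload shape may differ):

```json
{
  "report_id": "rpt_0001",
  "event_counts": { "kernel": 1250, "nvtx": 310, "memcpy": 96 },
  "time_span_ns": 4830021776
}
```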
### load_report

Load a pre-existing `.nsys-rep` or NDJSON `.json` file without re-profiling.

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Path to a `.nsys-rep` or `.json` file |
### get_event_summary

Get a breakdown of event types and counts for a report.

| Parameter | Type | Description |
|---|---|---|
| `report_id` | `str` | ID from `profile_binary` or `load_report` |
### get_kernel_stats

Aggregate GPU kernel statistics grouped by kernel name. Includes duration statistics (mean, std, min, max, median, count, total) and GPU metrics (grid/block size, shared memory, registers).

| Parameter | Type | Description |
|---|---|---|
| `report_id` | `str` | Report identifier |
| `top_n` | `int` | Limit to the top N kernels (optional) |
| `sort_by` | `str` | `total_ns`, `count`, `mean_ns`, or `max_ns` (default: `total_ns`) |
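The duration statistics above amount to a group-by aggregation, which can be sketched as follows (a simplified model of the server's aggregator; the dict-based event shape is an assumption):

```python
from collections import defaultdict
from statistics import mean, median, pstdev

def aggregate_kernels(events):
    """Group kernel events by name and compute duration statistics.

    Each event is assumed to be a dict with "name" and "duration_ns" keys.
    """
    groups = defaultdict(list)
    for ev in events:
        groups[ev["name"]].append(ev["duration_ns"])
    return {
        name: {
            "count": len(durs),
            "total_ns": sum(durs),
            "mean_ns": mean(durs),
            "std_ns": pstdev(durs) if len(durs) > 1 else 0.0,
            "min_ns": min(durs),
            "max_ns": max(durs),
            "median_ns": median(durs),
        }
        for name, durs in groups.items()
    }
```

Sorting the result by the chosen `sort_by` key and slicing then gives the `top_n` behavior.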
### get_nvtx_stats

Aggregate NVTX range durations grouped by annotation text.

| Parameter | Type | Description |
|---|---|---|
| `report_id` | `str` | Report identifier |
| `domain_id` | `int` | Filter by NVTX domain (optional) |
### get_memcpy_stats

Aggregate memory copy statistics grouped by copy direction (HtoD, DtoH, DtoD, etc.). Includes duration stats, total bytes, and bandwidth estimates.

| Parameter | Type | Description |
|---|---|---|
| `report_id` | `str` | Report identifier |
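A bandwidth estimate of this kind follows from total bytes moved over total copy time; the exact formula and unit choice (GiB/s here) are assumptions about the server's implementation:

```python
def bandwidth_gib_s(total_bytes: int, total_duration_ns: int) -> float:
    """Estimate average copy bandwidth in GiB/s.

    total_bytes: bytes moved across all copies in the group
    total_duration_ns: summed copy duration in nanoseconds
    """
    seconds = total_duration_ns / 1e9
    return total_bytes / seconds / (1024 ** 3)
```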
### build_interval_tree

Construct an interval tree from profiling events. If multiple disjoint trees exist (a forest), they can be merged under a synthetic root.

| Parameter | Type | Description |
|---|---|---|
| `report_id` | `str` | Report identifier |
| `event_types` | `list[str]` | Subset of `["kernel", "nvtx", "trace", "memcpy", "sync"]` (default: all) |
| `reduce_forest` | `bool` | Merge a forest into a single tree (default: `true`) |
| `thread_id` | `int` | Filter by thread/stream ID (optional) |
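The forest reduction can be sketched like this (the dict-based nodes and the synthetic root's name are illustrative, not the server's actual representation):

```python
def reduce_forest(roots):
    """Merge disjoint interval trees under a synthetic root.

    Each root is a dict: {"name", "start_ns", "end_ns", "children"}.
    A single tree is returned unchanged; a forest gets a new root
    spanning the earliest start to the latest end.
    """
    if len(roots) == 1:
        return roots[0]
    return {
        "name": "<forest-root>",
        "start_ns": min(r["start_ns"] for r in roots),
        "end_ns": max(r["end_ns"] for r in roots),
        "children": list(roots),
    }
```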
### query_interval_tree

Run structural queries against a previously built interval tree.

| Parameter | Type | Description |
|---|---|---|
| `report_id` | `str` | Report identifier |
| `query_type` | `str` | One of the query types below |
| `event_name` | `str` | Event name for `count_calls` |
| `subtree_root_name` | `str` | Scope the query to a named subtree (optional) |
| `max_depth` | `int` | Limit traversal depth (optional) |

Query types:

| Type | Description |
|---|---|
| `most_time_consuming` | Find the longest-duration event in a subtree |
| `top_level` | List top-level interval names |
| `count_calls` | Count occurrences of a named event in a subtree |
| `subtree_summary` | Aggregated stats for a named subtree |
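For instance, `count_calls` amounts to a recursive traversal, sketched here over dict-based nodes (an assumption about the tree representation):

```python
def count_calls(node, event_name):
    """Count occurrences of a named event in a subtree (depth-first)."""
    total = 1 if node["name"] == event_name else 0
    for child in node.get("children", []):
        total += count_calls(child, event_name)
    return total
```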
## Typical Workflow

1. `check_nsys()` — verify `nsys` is available
2. `profile_binary(binary="/app/solver", ...)` — profile and get a `report_id`
3. `get_kernel_stats(report_id, top_n=10)` — see the top 10 kernels
4. `get_nvtx_stats(report_id)` — see NVTX annotation timings
5. `get_memcpy_stats(report_id)` — see memory transfer stats
6. `build_interval_tree(report_id)` — build the tree
7. `query_interval_tree(report_id, query_type="most_time_consuming")` — find the bottleneck
8. `query_interval_tree(report_id, query_type="count_calls", event_name="cub::DeviceReduce")` — count calls to a specific kernel
## Caching

Profiling results are cached in two tiers:

- In-memory LRU — fast access for the current session (up to 8 reports).
- Disk — persists across server restarts at `~/.nsys_mcp/cache/`.
Cache keys are derived from the binary path and arguments, so identical profiling runs reuse cached results automatically.
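A cache key along those lines might be derived as follows (the hashed fields and hash choice are assumptions; nsys-mcp's actual scheme may differ):

```python
import hashlib
import json

def cache_key(binary: str, args=None, env=None) -> str:
    """Derive a stable cache key from a profiling request.

    Serializing with sorted keys makes the key deterministic, so
    identical requests map to the same cached report.
    """
    payload = json.dumps(
        {"binary": binary, "args": args or [], "env": env or {}},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```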
## Testing

```bash
pip install -e ".[dev]"
pytest
```
## Project Structure

```
src/nsys_mcp/
├── server.py          # FastMCP server, tool definitions, lifespan
├── nsys_runner.py     # nsys CLI wrapper (profile, export, version)
├── report_parser.py   # NDJSON streaming parser, string-table resolution
├── models.py          # Pydantic models for events, stats, configs
├── aggregator.py      # Group-by aggregation (mean, std, min, max, count)
├── interval_tree.py   # Interval tree/forest construction + queries
└── cache.py           # Two-tier cache (memory LRU + disk pickle)
```
## License
nsys-mcp is licensed under the MIT License.