slurm_MCP

Enables interaction with Slurm HPC clusters via SSH, allowing job submission, monitoring, and error diagnosis through MCP clients like Claude Desktop.

Slurm HPC MCP Server

A Python-based MCP server that connects to a Slurm HPC cluster over SSH and exposes callable cluster capabilities to clients such as Claude Desktop, Claude Code, Cursor, and MCP Inspector.

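The current version has been verified end to end against a real cluster:

  • The MCP server starts normally
  • Claude and MCP Inspector can connect to it
  • It can connect to a remote Slurm login node over SSH
  • It can submit real sbatch jobs
  • It can read job logs
  • It can produce structured diagnoses for common HPC runtime errors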

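Features

The server currently provides the following MCP capabilities:

  • list_jobs: view the current job queue
  • list_partitions: view partition information
  • get_job_status: query a job's state, exit code, reason, and log path
  • submit_slurm_job: upload and submit a Slurm script
  • diagnose_error: classify errors found in a log into structured categories
  • job_log://{job_id}: read a job's standard output log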

Tech stack

  • Python
  • MCP Python SDK (mcp[cli])
  • Paramiko
  • Slurm CLI (squeue, sinfo, sacct, scontrol, sbatch)

Directory structure

slurm-hpc-mcp/
├─ mcp_hpc_server.py
├─ README.md
├─ requirements.txt
├─ claude_desktop_config.example.json
├─ mcp.inspector.template.json
├─ examples/
│  └─ example_job.slurm
└─ tests/
   └─ test_mcp_server.py

Installation

A dedicated virtual environment is recommended.

pip install -r requirements.txt

Configuration

The server reads the remote cluster configuration from environment variables.

Required:

  • SLURM_SSH_HOST
  • SLURM_SSH_USERNAME

Optional:

  • SLURM_SSH_PORT, default 22
  • SLURM_SSH_KEY_PATH
  • SLURM_SSH_PASSWORD
  • SLURM_SSH_ALLOW_UNKNOWN_HOSTS, default false
  • SLURM_REMOTE_WORKDIR, default /tmp/mcp-slurm
  • SLURM_CONNECT_TIMEOUT, default 15
  • SLURM_COMMAND_TIMEOUT, default 60
  • SLURM_LOG_MAX_BYTES, default 200000
  • MCP_TRANSPORT, default stdio
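
The variables listed above could be loaded roughly as follows. This is a minimal sketch, not the server's actual code: the SlurmSSHConfig class and the load_config function are hypothetical names, and treating only the string "true" as enabling SLURM_SSH_ALLOW_UNKNOWN_HOSTS is an assumption based on the example configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SlurmSSHConfig:
    """Hypothetical container for the settings documented above."""
    host: str
    username: str
    port: int = 22
    key_path: Optional[str] = None
    password: Optional[str] = None
    allow_unknown_hosts: bool = False
    remote_workdir: str = "/tmp/mcp-slurm"
    connect_timeout: int = 15
    command_timeout: int = 60
    log_max_bytes: int = 200000
    transport: str = "stdio"


def load_config(env: Optional[Mapping[str, str]] = None) -> SlurmSSHConfig:
    """Build a config from environment variables, failing fast if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ("SLURM_SSH_HOST", "SLURM_SSH_USERNAME") if not env.get(name)]
    if missing:
        raise ValueError("missing required environment variables: " + ", ".join(missing))
    # Assumption: only the literal string "true" (case-insensitive) enables this flag.
    allow_unknown = env.get("SLURM_SSH_ALLOW_UNKNOWN_HOSTS", "false").strip().lower() == "true"
    return SlurmSSHConfig(
        host=env["SLURM_SSH_HOST"],
        username=env["SLURM_SSH_USERNAME"],
        port=int(env.get("SLURM_SSH_PORT", "22")),
        key_path=env.get("SLURM_SSH_KEY_PATH") or None,
        password=env.get("SLURM_SSH_PASSWORD") or None,
        allow_unknown_hosts=allow_unknown,
        remote_workdir=env.get("SLURM_REMOTE_WORKDIR", "/tmp/mcp-slurm"),
        connect_timeout=int(env.get("SLURM_CONNECT_TIMEOUT", "15")),
        command_timeout=int(env.get("SLURM_COMMAND_TIMEOUT", "60")),
        log_max_bytes=int(env.get("SLURM_LOG_MAX_BYTES", "200000")),
        transport=env.get("MCP_TRANSPORT", "stdio"),
    )
```

Passing a plain dictionary instead of os.environ, as the optional env parameter allows, makes this kind of loader easy to unit test.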

Running locally

By default the server runs over stdio, which suits Claude Desktop, Claude Code, Cursor, and Inspector:

python .\mcp_hpc_server.py

To switch to HTTP transport (PowerShell syntax):

$env:MCP_TRANSPORT="streamable-http"
python .\mcp_hpc_server.py

Claude Desktop configuration

See:

  • claude_desktop_config.example.json

Example:

{
  "mcpServers": {
    "slurm-hpc": {
      "type": "stdio",
      "command": "<your-python-executable>",
      "args": ["<your-project-dir>/mcp_hpc_server.py"],
      "env": {
        "SLURM_SSH_HOST": "your-login-host",
        "SLURM_SSH_PORT": "22",
        "SLURM_SSH_USERNAME": "your-username",
        "SLURM_SSH_KEY_PATH": "<your-ssh-private-key-path>",
        "SLURM_SSH_ALLOW_UNKNOWN_HOSTS": "true"
      }
    }
  }
}

Typical location of the Claude Desktop configuration file on Windows:

%APPDATA%\Claude\claude_desktop_config.json

MCP Inspector

See:

  • mcp.inspector.template.json

If you configure the server manually in Inspector:

  • Command: the path to the Python interpreter
  • Arguments: the path to mcp_hpc_server.py
  • Environment variables: the relevant SLURM_SSH_* entries

Example job script

See:

  • examples/example_job.slurm

Before submitting, adjust the following to match your cluster:

  • Partition name
  • GRES / GPU / DCU resources
  • Time limit
  • Output path

Tools

1. list_jobs

Lists the jobs currently in the queue.

Input:

{
  "user": "optional"
}

2. list_partitions

Shows partition status.

Input:

{}

3. get_job_status

Queries the status of a specific job.

Input:

{
  "job_id": "37285107"
}
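
To illustrate how a job's state and exit code might be extracted, the sketch below parses pipe-delimited output of the kind produced by sacct with the --parsable2, --noheader, and --format=JobID,State,ExitCode options. The README does not say which command or fields the server actually uses, so the chosen fields and the parse_sacct_output helper are assumptions made for illustration only.

```python
from typing import Dict, Optional


def parse_sacct_output(output: str, job_id: str) -> Optional[Dict[str, object]]:
    """Return state and exit code for job_id from 'JobID|State|ExitCode' lines, or None."""
    for line in output.splitlines():
        fields = line.strip().split("|")
        # Only the top-level job record matches; steps such as "37285107.batch" are skipped.
        if len(fields) < 3 or fields[0] != job_id:
            continue
        # sacct can report states like "CANCELLED by 1234"; keep the first word.
        state_words = fields[1].split()
        state = state_words[0] if state_words else "UNKNOWN"
        # ExitCode is reported as "<exit status>:<signal>".
        exit_status, _, signal = fields[2].partition(":")
        return {
            "job_id": job_id,
            "state": state,
            "exit_code": int(exit_status),
            "signal": int(signal or 0),
        }
    return None


sample = "37285107|FAILED|1:0\n37285107.batch|FAILED|1:0\n"
print(parse_sacct_output(sample, "37285107"))
```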

4. submit_slurm_job

Submits a Slurm script.

The input field is script_content. Pass the multi-line script body directly; do not wrap it in extra quotes.
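
Since sbatch requires a batch script to begin with a #! interpreter line and rejects DOS line breaks, a pre-submission step along the following lines may be useful. This is only a sketch of what such a check could look like: the normalize_batch_script function is hypothetical and is not taken from the server's code.

```python
def normalize_batch_script(script_content: str) -> str:
    """Normalize line endings and verify that the script starts with a shebang line."""
    # Drop a UTF-8 byte order mark that some editors prepend.
    text = script_content.lstrip("\ufeff")
    # Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove leading blank lines so the shebang really is the first line.
    text = text.lstrip("\n")
    if not text.startswith("#!"):
        raise ValueError("script must start with a shebang line such as #!/bin/bash")
    # Make sure the file ends with a newline.
    if not text.endswith("\n"):
        text += "\n"
    return text


script = "#!/bin/bash\r\n#SBATCH --job-name=demo\r\nhostname"
print(repr(normalize_batch_script(script)))
```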

5. diagnose_error

Classifies faults found in log text into structured categories.

Input:

{
  "log_content": "ModuleNotFoundError: No module named 'mpi4py'"
}
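
One simple way to implement this kind of classification is a table of regular expressions, each mapped to a category and a hint. The sketch below assumes that approach; the category names, patterns, hints, and the diagnose function are illustrative and may differ from what the server actually returns.

```python
import re
from typing import Dict, List

# Hypothetical rule table: (category, pattern, hint).
RULES = [
    ("missing_python_module",
     re.compile(r"ModuleNotFoundError: No module named '([^']+)'"),
     "Install the module in the environment the job uses, or load the right module/venv."),
    ("out_of_memory",
     re.compile(r"oom[-_ ]kill|out of memory|MemoryError", re.IGNORECASE),
     "Request more memory or reduce the problem size."),
    ("time_limit",
     re.compile(r"DUE TO TIME LIMIT", re.IGNORECASE),
     "Increase the time limit or checkpoint the job."),
    ("command_not_found",
     re.compile(r"command not found"),
     "Check PATH and module loads in the batch script."),
]


def diagnose(log_content: str) -> List[Dict[str, str]]:
    """Return one finding per rule whose pattern appears in the log text."""
    findings = []
    for category, pattern, hint in RULES:
        match = pattern.search(log_content)
        if match:
            findings.append({"category": category, "evidence": match.group(0), "hint": hint})
    return findings


for finding in diagnose("ModuleNotFoundError: No module named 'mpi4py'"):
    print(finding["category"], "->", finding["evidence"])
```

Keeping the rules in a data table rather than in branching code makes it straightforward to add cluster-specific patterns later.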

6. job_log://{job_id}

Reads a job's standard output log through a resource template.

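Security

Before submission, every script passes through an ActionGuard check, which by default blocks typical dangerous commands such as:

  • rm -rf /
  • mkfs
  • direct writes to /dev/*
  • shutdown / reboot
  • sudo

This is only a first layer of protection. For production use it is still recommended to:

  • Use a low-privilege SSH account
  • Restrict the remote working directory
  • Keep operation logs
  • Never commit private keys to the repository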
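
A pattern-based guard of the kind described in this section could be sketched as follows. The actual ActionGuard rules are not shown in this README, so the patterns, the find_violations function, the decision to skip comment lines, and the exemption for redirects to /dev/null, /dev/stdout, and /dev/stderr are all assumptions.

```python
import re
from typing import List

# Hypothetical blocklist modeled on the examples listed in this section.
BLOCKED_PATTERNS = [
    ("rm -rf /", re.compile(r"\brm\s+-\w*r\w*\s+/(?:\s|$)")),
    ("mkfs", re.compile(r"\bmkfs(?:\.\w+)?\b")),
    # Redirects or dd output to device files; common harmless targets are exempt.
    ("write to /dev", re.compile(r">\s*/dev/(?!null\b|stdout\b|stderr\b)\S+|\bof=/dev/\S+")),
    ("shutdown/reboot", re.compile(r"\b(?:shutdown|reboot)\b")),
    ("sudo", re.compile(r"\bsudo\b")),
]


def find_violations(script_content: str) -> List[str]:
    """Return the names of blocked patterns found in non-comment lines of the script."""
    violations: List[str] = []
    for line in script_content.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, the shebang, and #SBATCH directives.
        if not stripped or stripped.startswith("#"):
            continue
        for name, pattern in BLOCKED_PATTERNS:
            if pattern.search(stripped) and name not in violations:
                violations.append(name)
    return violations


print(find_violations("#!/bin/bash\nsudo rm -rf /\n"))
```

A blocklist like this is easy to bypass (for example through variables or encoded commands), which is why the additional measures listed above still matter.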

Testing

Run the minimal test suite:

python -m unittest tests.test_mcp_server

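The minimal tests currently cover:

  • Script first-line validation
  • Batch script normalization
  • Error classification logic
  • Partition parsing
  • Job status parsing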
