slurm_MCP
Enables interaction with Slurm HPC clusters via SSH, allowing job submission, monitoring, and error diagnosis through MCP clients like Claude Desktop.
Slurm HPC MCP Server
A Python-based MCP server that connects to a Slurm supercomputing cluster over SSH and exposes callable cluster capabilities to clients such as Claude Desktop, Claude Code, Cursor, and MCP Inspector.
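The current version has been verified end to end against a real cluster:
- The MCP server starts normally
- Claude and MCP Inspector can connect to it
- It can connect over SSH to a remote Slurm login node
- It can submit real sbatch jobs
- It can read job logs
- It can produce structured diagnoses of common HPC runtime errors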
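Features
The server currently exposes the following MCP capabilities:
- list_jobs: view the current job queue
- list_partitions: view partition information
- get_job_status: query a job's state, exit code, reason, and log paths
- submit_slurm_job: upload and submit a Slurm script
- diagnose_error: classify errors in log text into structured categories
- job_log://{job_id}: read a job's standard output log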
Tech stack
- Python
- MCP Python SDK (mcp[cli])
- Paramiko
- Slurm CLI (squeue, sinfo, sacct, scontrol, sbatch)
Directory layout
slurm-hpc-mcp/
├─ mcp_hpc_server.py
├─ README.md
├─ requirements.txt
├─ claude_desktop_config.example.json
├─ mcp.inspector.template.json
├─ examples/
│ └─ example_job.slurm
└─ tests/
└─ test_mcp_server.py
Installation
Using a dedicated virtual environment is recommended.
pip install -r requirements.txt
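Configuration
The server reads the remote cluster settings from environment variables.
Required:
- SLURM_SSH_HOST
- SLURM_SSH_USERNAME
Optional:
- SLURM_SSH_PORT, default 22
- SLURM_SSH_KEY_PATH
- SLURM_SSH_PASSWORD
- SLURM_SSH_ALLOW_UNKNOWN_HOSTS, default false
- SLURM_REMOTE_WORKDIR, default /tmp/mcp-slurm
- SLURM_CONNECT_TIMEOUT, default 15
- SLURM_COMMAND_TIMEOUT, default 60
- SLURM_LOG_MAX_BYTES, default 200000
- MCP_TRANSPORT, default stdio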
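A minimal sketch of how these variables might be read, assuming a hypothetical load_config helper and SlurmConfig dataclass (neither name comes from mcp_hpc_server.py). It applies the defaults listed above and fails fast when a required setting is missing. The timeout units are not stated above, so the sketch simply stores the numbers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; anything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SlurmConfig:  # hypothetical container for the settings above
    host: str
    username: str
    port: int = 22
    key_path: Optional[str] = None
    password: Optional[str] = None
    allow_unknown_hosts: bool = False
    remote_workdir: str = "/tmp/mcp-slurm"
    connect_timeout: float = 15.0
    command_timeout: float = 60.0
    log_max_bytes: int = 200_000
    transport: str = "stdio"


def load_config(env: Mapping[str, str] = os.environ) -> SlurmConfig:
    """Build a SlurmConfig from environment variables (hypothetical helper)."""
    missing = [k for k in ("SLURM_SSH_HOST", "SLURM_SSH_USERNAME") if not env.get(k)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    return SlurmConfig(
        host=env["SLURM_SSH_HOST"],
        username=env["SLURM_SSH_USERNAME"],
        port=int(env.get("SLURM_SSH_PORT", "22")),
        key_path=env.get("SLURM_SSH_KEY_PATH") or None,
        password=env.get("SLURM_SSH_PASSWORD") or None,
        allow_unknown_hosts=_as_bool(env.get("SLURM_SSH_ALLOW_UNKNOWN_HOSTS", "false")),
        remote_workdir=env.get("SLURM_REMOTE_WORKDIR", "/tmp/mcp-slurm"),
        connect_timeout=float(env.get("SLURM_CONNECT_TIMEOUT", "15")),
        command_timeout=float(env.get("SLURM_COMMAND_TIMEOUT", "60")),
        log_max_bytes=int(env.get("SLURM_LOG_MAX_BYTES", "200000")),
        transport=env.get("MCP_TRANSPORT", "stdio"),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to unit-test without touching the real environment.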
Running locally
By default the server uses the stdio transport, which suits Claude Desktop, Claude Code, Cursor, and Inspector:
python .\mcp_hpc_server.py
To switch to the HTTP transport (PowerShell):
$env:MCP_TRANSPORT="streamable-http"
python .\mcp_hpc_server.py
Claude Desktop configuration
See:
claude_desktop_config.example.json
Example:
{
"mcpServers": {
"slurm-hpc": {
"type": "stdio",
"command": "<your-python-executable>",
"args": ["<your-project-dir>/mcp_hpc_server.py"],
"env": {
"SLURM_SSH_HOST": "your-login-host",
"SLURM_SSH_PORT": "22",
"SLURM_SSH_USERNAME": "your-username",
"SLURM_SSH_KEY_PATH": "<your-ssh-private-key-path>",
"SLURM_SSH_ALLOW_UNKNOWN_HOSTS": "true"
}
}
}
}
Common location of the Claude Desktop config file on Windows:
%APPDATA%\Claude\claude_desktop_config.json
MCP Inspector
See:
mcp.inspector.template.json
If you configure the server manually in Inspector:
- Command: the path to your Python interpreter
- Arguments: the path to mcp_hpc_server.py
- Environment variables: the SLURM_SSH_* settings
Example job script
See:
examples/example_job.slurm
Before submitting, adapt it to your cluster:
- Partition name
- GRES / GPU / DCU resources
- Time limit
- Output path
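For illustration, the settings above map onto standard #SBATCH directives. The render_job_script helper below is hypothetical and not part of the server; it only shows where the partition, GRES, time limit, and output path go in a batch script. GRES names such as gpu:1 are cluster-specific, and DCU clusters may use a different resource name.

```python
from typing import Optional


def render_job_script(
    command: str,
    partition: str,
    time_limit: str,
    output: str,
    job_name: str = "mcp-job",
    gres: Optional[str] = None,
) -> str:
    """Assemble a minimal sbatch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",   # partition name on your cluster
        f"#SBATCH --time={time_limit}",       # e.g. 00:10:00
        f"#SBATCH --output={output}",         # %j expands to the job ID
    ]
    if gres:
        # Generic resources, e.g. "gpu:1"; the resource name is cluster-specific.
        lines.append(f"#SBATCH --gres={gres}")
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"
```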
Tools
1. list_jobs
Lists the jobs currently in the queue. The user field is optional.
Input:
{
"user": "optional"
}
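One plausible way to implement list_jobs is to run squeue with a pipe-delimited output format and split each line. The format string, field names, and the build_squeue_command / parse_squeue helpers are assumptions for illustration; the server's actual command may differ.

```python
import shlex
from typing import Dict, List, Optional

# squeue format specifiers: job ID, partition, name, user, state,
# elapsed time, node count, reason (pending) or node list (running)
SQUEUE_FORMAT = "%i|%P|%j|%u|%T|%M|%D|%R"
FIELDS = ["job_id", "partition", "name", "user", "state", "time", "nodes", "reason"]


def build_squeue_command(user: Optional[str] = None) -> str:
    cmd = ["squeue", "--noheader", f"--format={SQUEUE_FORMAT}"]
    if user:
        cmd += ["--user", user]
    return shlex.join(cmd)  # quotes the format string for the remote shell


def parse_squeue(output: str) -> List[Dict[str, str]]:
    jobs = []
    for line in output.splitlines():
        if not line.strip():
            continue
        parts = line.split("|", len(FIELDS) - 1)
        if len(parts) != len(FIELDS):
            continue  # skip malformed lines rather than guessing
        jobs.append(dict(zip(FIELDS, (p.strip() for p in parts))))
    return jobs
```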
2. list_partitions
Shows partition status.
Input:
{}
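Partition parsing can be sketched the same way, assuming sinfo is called with a pipe-delimited format. sinfo marks the default partition with a trailing *, which the hypothetical parse_sinfo helper below turns into a separate flag.

```python
from typing import Dict, List

# sinfo format specifiers: partition (default marked with *), availability,
# time limit, node count, node state
SINFO_COMMAND = "sinfo --noheader --format='%P|%a|%l|%D|%T'"


def parse_sinfo(output: str) -> List[Dict[str, object]]:
    rows = []
    for line in output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 5:
            continue  # ignore blank or unexpected lines
        name, avail, limit, nodes, state = parts
        rows.append({
            "partition": name.rstrip("*"),
            "default": name.endswith("*"),
            "available": avail,
            "time_limit": limit,
            "nodes": int(nodes) if nodes.isdigit() else None,
            "state": state,
        })
    return rows
```

sinfo prints one line per partition and node-state combination, so the same partition can appear more than once; a real implementation may want to aggregate those rows.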
3. get_job_status
Queries the status of a given job.
Input:
{
"job_id": "37285107"
}
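get_job_status reports state, exit code, reason, and log paths. One way to obtain these (an assumption, not necessarily what the server does) is to parse the Key=Value pairs printed by scontrol show job. Both helpers are hypothetical, and the parser is deliberately simple: values containing spaces would be split.

```python
import re
from typing import Dict


def build_scontrol_command(job_id: str) -> str:
    # Accept plain IDs and array-task IDs like 123_4; reject anything else
    # so the value cannot inject extra shell syntax.
    if not re.fullmatch(r"\d+(?:_\d+)?", job_id):
        raise ValueError(f"unexpected job id: {job_id!r}")
    return f"scontrol show job {job_id}"


def parse_scontrol_job(output: str) -> Dict[str, str]:
    """Collect whitespace-separated Key=Value tokens from scontrol output."""
    fields: Dict[str, str] = {}
    for token in output.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields
```

scontrol only knows about jobs that slurmctld still holds in memory; for jobs that finished a while ago, sacct is the usual fallback.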
4. submit_slurm_job
Submits a Slurm script.
The input field is script_content: paste the multi-line script body directly, without wrapping it in extra quotes.
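The tests mention first-line validation and batch-script normalization. A minimal sketch of what that could look like, assuming these rules: strip a UTF-8 BOM, convert Windows line endings (sbatch refuses scripts with DOS line breaks), and require a shebang on the first line. It also extracts the job ID from sbatch's standard "Submitted batch job <id>" message. Both function names are hypothetical.

```python
import re


def normalize_script(script: str) -> str:
    """Normalize a batch script before upload (hypothetical helper)."""
    # Drop a BOM and convert CRLF / lone CR line endings to LF.
    text = script.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    text = text.lstrip("\n")  # leading blank lines would hide the shebang
    if not text.startswith("#!"):
        raise ValueError("first line must be a shebang such as #!/bin/bash")
    return text if text.endswith("\n") else text + "\n"


_SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_sbatch_output(output: str) -> str:
    match = _SUBMITTED.search(output)
    if not match:
        raise RuntimeError(f"could not find job id in sbatch output: {output!r}")
    return match.group(1)
```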
5. diagnose_error
Classifies failures found in log text into structured categories.
Input:
{
"log_content": "ModuleNotFoundError: No module named 'mpi4py'"
}
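A structured classifier of this kind can be as simple as an ordered list of regular expressions, each mapped to a category and a hint. The categories, patterns, hints, and the diagnose function below are illustrative assumptions, not the server's actual rule set.

```python
import re
from typing import Dict, List

# (category, pattern, hint): illustrative rules only
RULES = [
    ("missing_python_module",
     re.compile(r"ModuleNotFoundError: No module named '([^']+)'"),
     "Install the module in the job's environment or load the right module/conda env."),
    ("gpu_out_of_memory", re.compile(r"CUDA out of memory", re.I),
     "Reduce the batch size or request a GPU with more memory."),
    ("host_out_of_memory", re.compile(r"oom[-_]kill|out-of-memory handler"),
     "Request more memory, e.g. with --mem or --mem-per-cpu."),
    ("time_limit", re.compile(r"DUE TO TIME LIMIT"),
     "Increase --time or checkpoint more often."),
    ("command_not_found", re.compile(r"(\S+): command not found"),
     "Load the module that provides the command or fix PATH."),
]


def diagnose(log_content: str) -> List[Dict[str, str]]:
    findings = []
    for category, pattern, hint in RULES:
        match = pattern.search(log_content)
        if match:
            # Use the captured detail (module or command name) when the rule has one.
            detail = match.group(1) if pattern.groups else match.group(0)
            findings.append({"category": category, "detail": detail, "hint": hint})
    return findings
```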
6. job_log://{job_id}
Reads a job's standard output log through a resource template.
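Because SLURM_LOG_MAX_BYTES caps how much log text is returned, a resource like job_log://{job_id} might read only the tail of the file. A sketch under that assumption: build a tail -c command for the remote side (with the path shell-quoted) and decode the bytes leniently in case the cut lands inside a multi-byte character. The log path itself would come from the job's StdOut field; both helpers are hypothetical.

```python
import shlex


def build_tail_command(log_path: str, max_bytes: int = 200_000) -> str:
    """Remote command returning at most the last max_bytes bytes of a log."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return f"tail -c {int(max_bytes)} -- {shlex.quote(log_path)}"


def decode_log(raw: bytes, max_bytes: int = 200_000) -> str:
    """Keep only the tail of the log and decode it without raising."""
    tail = raw[-max_bytes:]
    return tail.decode("utf-8", errors="replace")
```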
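Security
Before submission, scripts pass through an ActionGuard check, which by default blocks typical dangerous commands, for example:
- rm -rf /
- mkfs
- direct writes to /dev/*
- shutdown / reboot
- sudo
This is only a first layer of protection. For production use it is still recommended to:
- use a low-privilege SSH account
- restrict the remote working directory
- keep operation logs
- never commit private keys to the repository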
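A deny-list check in the spirit of ActionGuard can be sketched with regular expressions. The rule names, patterns, and check_script function below are assumptions; they illustrate the idea and its limits, since regex deny-lists are easy to bypass and can also produce false positives.

```python
import re
from typing import List

# Illustrative deny-list; the real ActionGuard rules may differ.
DANGEROUS_PATTERNS = [
    # rm with a recursive flag aimed at "/" or "/*" itself
    ("rm -rf /", re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*\s+/(?:\s|\*|$)", re.M)),
    ("mkfs", re.compile(r"\bmkfs(?:\.\w+)?\b")),
    # redirections or dd writes into /dev, except harmless sinks
    ("write to /dev", re.compile(r"(?:>|\bof=)\s*/dev/(?!(?:null|stdout|stderr)\b)")),
    ("shutdown/reboot", re.compile(r"\b(?:shutdown|reboot)\b")),
    ("sudo", re.compile(r"\bsudo\b")),
]


def check_script(script: str) -> List[str]:
    """Return the names of all rules the script matches (empty list = allowed)."""
    return [name for name, pattern in DANGEROUS_PATTERNS if pattern.search(script)]
```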
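Testing
Run the minimal test suite:
python -m unittest tests.test_mcp_server
The minimal tests currently cover:
- first-line (shebang) validation of scripts
- batch script normalization
- error classification logic
- partition parsing
- job status parsing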