mcp-slurm

Enables AI assistants to manage SLURM HPC clusters via SSH. Supports job submission, resource monitoring, queue management, and file operations.

SLURM MCP Server

A Model Context Protocol (MCP) server for managing SLURM (Simple Linux Utility for Resource Management) clusters. This server allows AI assistants to interact with HPC clusters via SSH to submit jobs, check resources, manage queues, and monitor job status.

Features

  • Cluster Information: Query node status, partitions, and resource availability
  • Job Submission: Submit jobs with customizable parameters including resource requests
  • Job Management: Cancel, hold, release, suspend, resume, requeue, and modify pending or running jobs
  • Script Upload: Upload job scripts to the cluster and optionally submit them right away
  • File Operations: View job outputs, list directories, and manage files
  • SSH Connectivity: Secure connection to login nodes with password or key authentication

Quick Start

1. Installation

# Clone the repository
git clone <your-repo>
cd mcp-slurm

# Install dependencies
npm install

# Build the project
npm run build

2. Configuration

Create a .env file in the project root with your cluster connection details:

# Required: Cluster connection details
SLURM_HOST=your-cluster-login-node.example.com
SLURM_USERNAME=your-username

# Authentication (choose one)
SLURM_PASSWORD=your-password
# OR
SLURM_SSH_KEY_PATH=/path/to/your/private/key

# Optional: Connection settings
SLURM_PORT=22

# Optional: Default SLURM parameters
SLURM_DEFAULT_PARTITION=compute
SLURM_DEFAULT_ACCOUNT=your-account
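
The variables above could be validated at startup along the following lines. This is a minimal sketch, not the server's actual loader: the `SlurmConnectionConfig` interface and the `loadSlurmConfig` function are hypothetical names introduced here for illustration, and the rule that at least one of the two authentication variables must be set is inferred from the "choose one" comment.

```typescript
// Hypothetical shape for the connection settings; field names are assumptions.
interface SlurmConnectionConfig {
  host: string;
  username: string;
  port: number;
  password?: string;
  sshKeyPath?: string;
  defaultPartition?: string;
  defaultAccount?: string;
}

type Env = Record<string, string | undefined>;

// Reads the documented SLURM_* variables and fails early on missing values.
function loadSlurmConfig(env: Env): SlurmConnectionConfig {
  const host = env.SLURM_HOST;
  const username = env.SLURM_USERNAME;
  if (!host || !username) {
    throw new Error("SLURM_HOST and SLURM_USERNAME are required");
  }

  const password = env.SLURM_PASSWORD;
  const sshKeyPath = env.SLURM_SSH_KEY_PATH;
  if (!password && !sshKeyPath) {
    throw new Error("Set either SLURM_PASSWORD or SLURM_SSH_KEY_PATH");
  }

  // SLURM_PORT is optional; 22 is the default shown in the example above.
  const port = Number(env.SLURM_PORT ?? "22");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid SLURM_PORT: ${env.SLURM_PORT}`);
  }

  return {
    host,
    username,
    port,
    password,
    sshKeyPath,
    defaultPartition: env.SLURM_DEFAULT_PARTITION,
    defaultAccount: env.SLURM_DEFAULT_ACCOUNT,
  };
}
```

In a Node.js process, `loadSlurmConfig(process.env)` would read the values exported from the .env file.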

3. Running the Server

# Start the server
npm start

# Or run in development mode
npm run watch

The server will start on port 1337 by default.

Tools Available

1. slurm_info

Get cluster information including nodes, partitions, queues, and job accounting.

Parameters:

  • command_type: Type of command (sinfo, squeue, sacct, scontrol)
  • detailed: Get detailed output (optional)
  • partition: Query specific partition (optional)
  • node: Query specific node (optional)

Examples:

  • Check node status: {"command_type": "sinfo", "detailed": true}
  • View job queue: {"command_type": "squeue"}
  • Check specific partition: {"command_type": "sinfo", "partition": "gpu"}
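
To make the parameters more concrete, here is one way such arguments might be translated into a command line. This is a sketch under assumptions, not the server's actual mapping: `buildInfoCommand` and the flag choices (`--long` for detailed output, `scontrol show` for the scontrol case) are illustrative, although the flags themselves are standard SLURM options.

```typescript
type InfoCommand = "sinfo" | "squeue" | "sacct" | "scontrol";

interface SlurmInfoArgs {
  command_type: InfoCommand;
  detailed?: boolean;
  partition?: string;
  node?: string;
}

// Conservative allow-list so names can be placed on a shell command line.
const SAFE_NAME = /^[A-Za-z0-9_.-]+$/;

function checkName(kind: string, value: string): string {
  if (!SAFE_NAME.test(value)) {
    throw new Error(`Unsafe ${kind} name: ${value}`);
  }
  return value;
}

// Long-option names for the partition and node filters of each command.
const FILTER_FLAGS: Record<"sinfo" | "squeue" | "sacct", { partition: string; node: string }> = {
  sinfo: { partition: "--partition", node: "--nodes" },
  squeue: { partition: "--partition", node: "--nodelist" },
  sacct: { partition: "--partition", node: "--nodelist" },
};

// Assumed mapping from tool arguments to a SLURM command line.
function buildInfoCommand(args: SlurmInfoArgs): string {
  const type = args.command_type;
  if (type === "scontrol") {
    if (args.node !== undefined) {
      return `scontrol show node ${checkName("node", args.node)}`;
    }
    if (args.partition !== undefined) {
      return `scontrol show partition ${checkName("partition", args.partition)}`;
    }
    return "scontrol show partition";
  }

  const flags = FILTER_FLAGS[type];
  const parts: string[] = [type];
  if (args.detailed) parts.push("--long");
  if (args.partition !== undefined) {
    parts.push(`${flags.partition}=${checkName("partition", args.partition)}`);
  }
  if (args.node !== undefined) {
    parts.push(`${flags.node}=${checkName("node", args.node)}`);
  }
  return parts.join(" ");
}
```

With this mapping, the partition example above would become `sinfo --partition=gpu`.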

2. slurm_submit

Submit jobs to the SLURM scheduler with customizable parameters.

Parameters:

  • job_name: Name for the job
  • command: Command or script to execute
  • partition: Partition to submit to (optional)
  • nodes: Number of nodes (optional)
  • cpus_per_task: CPUs per task (optional)
  • memory: Memory per node (optional)
  • time_limit: Time limit (optional)
  • account: Account to charge (optional)
  • And many more...

Example:

{
  "job_name": "my_simulation",
  "command": "python simulate.py",
  "nodes": 2,
  "cpus_per_task": 16,
  "memory": "64G",
  "time_limit": "2:00:00",
  "partition": "compute"
}
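
A request like the one above could plausibly be turned into an `sbatch --wrap` invocation. The sketch below assumes that approach; `buildSbatchCommand` and `shellQuote` are hypothetical helpers, and the real server may instead generate a batch script. The sbatch options used (`--job-name`, `--partition`, `--nodes`, `--cpus-per-task`, `--mem`, `--time`, `--account`, `--wrap`) are standard.

```typescript
interface SubmitArgs {
  job_name: string;
  command: string;
  partition?: string;
  nodes?: number;
  cpus_per_task?: number;
  memory?: string;
  time_limit?: string;
  account?: string;
}

// Wraps a value in single quotes so the remote shell treats it as one word.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function positiveInt(name: string, value: number): number {
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(`${name} must be a positive integer`);
  }
  return value;
}

// Assumed mapping from the documented parameters to sbatch options.
function buildSbatchCommand(args: SubmitArgs): string {
  const parts: string[] = ["sbatch", `--job-name=${shellQuote(args.job_name)}`];
  if (args.partition !== undefined) parts.push(`--partition=${shellQuote(args.partition)}`);
  if (args.nodes !== undefined) parts.push(`--nodes=${positiveInt("nodes", args.nodes)}`);
  if (args.cpus_per_task !== undefined) {
    parts.push(`--cpus-per-task=${positiveInt("cpus_per_task", args.cpus_per_task)}`);
  }
  if (args.memory !== undefined) parts.push(`--mem=${shellQuote(args.memory)}`);
  if (args.time_limit !== undefined) parts.push(`--time=${shellQuote(args.time_limit)}`);
  if (args.account !== undefined) parts.push(`--account=${shellQuote(args.account)}`);
  // --wrap lets sbatch run a single command without a separate script file.
  parts.push(`--wrap=${shellQuote(args.command)}`);
  return parts.join(" ");
}
```

Quoting every string value keeps a command such as `python simulate.py --tag 'run 1'` intact when the line is executed over SSH.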

3. slurm_job_control

Control SLURM jobs: cancel, hold, release, suspend, resume, requeue, or modify.

Parameters:

  • job_id: Job ID to control
  • action: Action to perform (cancel, hold, release, etc.)
  • reason: Reason for action (optional)
  • modify_parameter: Parameter to modify (for modify action)
  • modify_value: New value (for modify action)

Examples:

  • Cancel job: {"job_id": "12345", "action": "cancel", "reason": "User request"}
  • Hold job: {"job_id": "12345", "action": "hold"}
  • Modify time limit: {"job_id": "12345", "action": "modify", "modify_parameter": "TimeLimit", "modify_value": "4:00:00"}
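
Each action corresponds naturally to a standard SLURM command: `scancel` for cancel, and `scontrol hold`, `release`, `suspend`, `resume`, `requeue`, or `update` for the rest. The sketch below shows one possible dispatch; `buildJobControlCommand` is a hypothetical helper, the validation rules are assumptions, and the optional reason parameter is left out because the source does not say how it is used.

```typescript
type JobAction = "cancel" | "hold" | "release" | "suspend" | "resume" | "requeue" | "modify";

interface JobControlArgs {
  job_id: string;
  action: JobAction;
  modify_parameter?: string;
  modify_value?: string;
}

function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Assumed mapping from tool actions to scancel/scontrol command lines.
function buildJobControlCommand(args: JobControlArgs): string {
  // Accept plain job IDs (12345) and array task IDs (12345_7) only.
  if (!/^\d+(_\d+)?$/.test(args.job_id)) {
    throw new Error(`Invalid job ID: ${args.job_id}`);
  }
  switch (args.action) {
    case "cancel":
      return `scancel ${args.job_id}`;
    case "hold":
    case "release":
    case "suspend":
    case "resume":
    case "requeue":
      return `scontrol ${args.action} ${args.job_id}`;
    case "modify": {
      const param = args.modify_parameter;
      const value = args.modify_value;
      if (param === undefined || !/^[A-Za-z]+$/.test(param)) {
        throw new Error("modify requires an alphabetic modify_parameter");
      }
      if (value === undefined) {
        throw new Error("modify requires modify_value");
      }
      return `scontrol update JobId=${args.job_id} ${param}=${shellQuote(value)}`;
    }
    default:
      throw new Error("Unsupported action");
  }
}
```

Under this mapping, the time-limit example above would run `scontrol update JobId=12345 TimeLimit='4:00:00'`.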

4. slurm_script

Upload job scripts and submit them to SLURM.

Parameters:

  • script_name: Name for the script file
  • script_content: Content of the job script
  • remote_path: Directory to store script (optional)
  • submit_immediately: Whether to submit after upload (default: true)
  • additional_sbatch_args: Extra sbatch arguments (optional)

Example:

{
  "script_name": "job.slurm",
  "script_content": "#!/bin/bash\n#SBATCH --job-name=test\n#SBATCH --time=1:00:00\n\necho 'Hello from SLURM!'",
  "submit_immediately": true
}
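
Writing `script_content` by hand as an escaped string is error-prone, so a client may prefer to assemble it from parts. The following sketch does that; `renderJobScript` and its option names are hypothetical and not part of the server. Only the resulting `script_name`, `script_content`, and `submit_immediately` fields come from the tool description above.

```typescript
interface JobScriptOptions {
  jobName: string;
  time: string;
  commands: string[];
  partition?: string;
}

// Builds a bash job script with #SBATCH directives followed by the commands.
function renderJobScript(opts: JobScriptOptions): string {
  const lines: string[] = [
    "#!/bin/bash",
    `#SBATCH --job-name=${opts.jobName}`,
    `#SBATCH --time=${opts.time}`,
  ];
  if (opts.partition !== undefined) {
    lines.push(`#SBATCH --partition=${opts.partition}`);
  }
  // A blank line separates the directives from the commands.
  lines.push("", ...opts.commands);
  return lines.join("\n");
}

// Arguments for a slurm_script call equivalent to the example above.
const scriptToolArgs = {
  script_name: "job.slurm",
  script_content: renderJobScript({
    jobName: "test",
    time: "1:00:00",
    commands: ["echo 'Hello from SLURM!'"],
  }),
  submit_immediately: true,
};
```

Note that SLURM only reads `#SBATCH` directives placed before the first executable command, which is why they are emitted directly after the shebang line.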

5. slurm_files

Manage files on the cluster including viewing job outputs.

Parameters:

  • action: Action to perform (list, view, tail, head, delete, find_outputs)
  • path: File or directory path (optional)
  • job_id: Job ID to find outputs for (optional)
  • lines: Number of lines to show (optional)
  • pattern: Search pattern (optional)

Examples:

  • List home directory: {"action": "list"}
  • View job output: {"action": "view", "path": "slurm-12345.out"}
  • Find job outputs: {"action": "find_outputs", "job_id": "12345"}
  • Tail log file: {"action": "tail", "path": "job.log", "lines": 100}
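
The file actions map onto ordinary shell utilities, and a rough sketch of such a mapping follows. Everything here is an assumption made for illustration: the `buildFileCommand` helper, the default of 50 lines, the use of the login directory when no path is given, and the search for SLURM's default `slurm-<jobid>.out` naming in find_outputs. The pattern parameter is omitted because the source does not describe its semantics.

```typescript
type FileAction = "list" | "view" | "tail" | "head" | "delete" | "find_outputs";

interface FilesArgs {
  action: FileAction;
  path?: string;
  job_id?: string;
  lines?: number;
}

function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function requirePath(args: FilesArgs): string {
  if (args.path === undefined || args.path === "") {
    throw new Error(`Action "${args.action}" requires a path`);
  }
  return shellQuote(args.path);
}

// Assumed mapping from file actions to shell commands run on the login node.
function buildFileCommand(args: FilesArgs): string {
  switch (args.action) {
    case "list":
      // "." is the login directory for a fresh SSH session, usually the home directory.
      return `ls -la -- ${shellQuote(args.path ?? ".")}`;
    case "view":
      return `cat -- ${requirePath(args)}`;
    case "tail":
    case "head": {
      const lines = args.lines ?? 50; // assumed default
      if (!Number.isInteger(lines) || lines < 1) {
        throw new Error("lines must be a positive integer");
      }
      return `${args.action} -n ${lines} -- ${requirePath(args)}`;
    }
    case "delete":
      return `rm -- ${requirePath(args)}`;
    case "find_outputs": {
      if (args.job_id === undefined || !/^\d+$/.test(args.job_id)) {
        throw new Error("find_outputs requires a numeric job_id");
      }
      return `find ${shellQuote(args.path ?? ".")} -maxdepth 2 -name 'slurm-${args.job_id}*'`;
    }
    default:
      throw new Error("Unsupported action");
  }
}
```

Jobs that set a custom `--output` path will not match the default file name, so a real implementation would likely need to consult `scontrol show job` or `sacct` as well.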

Claude Desktop Integration

Local Development

Add this to your Claude Desktop configuration:

  • Windows: %APPDATA%/Claude/claude_desktop_config.json
  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "mcp-slurm": {
      "command": "node",
      "args": ["C:/path/to/mcp-slurm/dist/index.js"],
      "env": {
        "SLURM_HOST": "your-cluster.example.com",
        "SLURM_USERNAME": "your-username",
        "SLURM_PASSWORD": "your-password"
      }
    }
  }
}

After Publishing to npm

{
  "mcpServers": {
    "mcp-slurm": {
      "command": "npx",
      "args": ["mcp-slurm"],
      "env": {
        "SLURM_HOST": "your-cluster.example.com",
        "SLURM_USERNAME": "your-username",
        "SLURM_PASSWORD": "your-password"
      }
    }
  }
}

Security Considerations

  • Store sensitive credentials securely (use SSH keys when possible)
  • Limit the MCP server's access to specific user accounts
  • Consider using dedicated service accounts for automated operations
  • Review and audit job submissions regularly
  • Use network restrictions to limit access to trusted hosts

Common Use Cases

Checking Cluster Status

Ask Claude: "What's the current status of the cluster?" or "Show me available nodes in the GPU partition"

Submitting Jobs

Ask Claude: "Submit a job named 'data_analysis' that runs 'python analyze.py' using 4 CPUs and 16GB memory for 2 hours"

Monitoring Jobs

Ask Claude: "Show me all my running jobs" or "What's the status of job 12345?"

Managing Outputs

Ask Claude: "Show me the output of job 12345" or "Find all output files for my recent jobs"

Script Management

Ask Claude: "Upload and submit this job script: [paste script content]"

Troubleshooting

Connection Issues

  • Verify SSH connectivity: ssh username@hostname
  • Check firewall rules and network access
  • Ensure SSH key permissions are correct (600)
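
The permission check in the last item can be automated. OpenSSH rejects private keys that are readable by group or others, so a mode of 600 (or 400) passes while 644 does not. The sketch below expresses that rule as a pure function; `isKeyModeSafe` is a hypothetical helper, not something the server is known to provide.

```typescript
// Returns true when no group or other permission bits are set on the key file.
// `mode` is the numeric file mode, for example the `mode` field of fs.statSync().
function isKeyModeSafe(mode: number): boolean {
  return (mode & 0o077) === 0;
}

// Formats the permission bits the way `ls` and `chmod` users expect, e.g. "600".
function formatPermissions(mode: number): string {
  return (mode & 0o777).toString(8).padStart(3, "0");
}
```

In Node.js this could be called as `isKeyModeSafe(fs.statSync(keyPath).mode)`; running `chmod 600` on the key file fixes a failing check on Linux and macOS.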

Authentication Errors

  • Verify username and password/key path
  • Check if two-factor authentication is required
  • Ensure the user has SLURM access

Job Submission Failures

  • Check if default partition/account are set correctly
  • Verify resource requests are within limits
  • Check SLURM configuration and policies

Development

To add new SLURM functionality:

  1. Create a new tool in src/tools/
  2. Extend the SlurmSSHClient if needed
  3. Build and test: npm run build && npm start

The framework automatically discovers and loads tools from the src/tools/ directory.

License

This project is licensed under the MIT License.
