RISBridge MCP
Enables users to run Slurm jobs on WashU RIS Compute2 cluster by describing their intent in plain English, without requiring SSH or Slurm knowledge.
Run jobs on the WashU RIS Compute2 cluster by describing what you want. RISBridge is an intent-to-compute server that turns plain English into safe, validated Slurm work, with no SSH, `sbatch`, partitions, GRES, modules, or storage layout to learn.
RISBridge is a Model Context Protocol server. You talk to it in plain language through an MCP client (Claude Desktop or Claude Code); it plans the job, shows you exactly what it will submit, and runs it on the cluster only after you confirm. It is deliberately not a terminal — there is no arbitrary-shell tool. Every action is a specific, schema-validated operation, which is what makes it safe to let an agent drive real HPC work.
Why it exists
Getting research code onto an HPC cluster usually means learning SSH, Duo, `sbatch`, partitions, GRES, modules, and a storage layout before running a single line. RISBridge removes that wall:
- Researchers new to HPC get a guided path: a setup wizard, plain-English planning, dry-run previews, and clear status/log/diagnose tools. No Slurm knowledge required.
- Experienced HPC users get speed and consistency: job arrays, `afterok` pipelines, single-node multi-GPU `torchrun`, vLLM serving, efficiency right-sizing, run comparison, and reproducible history.
Same server, both audiences. The tools scale from "hold my hand" to "give me the template and get out of the way."
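For readers coming from the Slurm side, `afterok` pipelines rest on two standard `sbatch` options: `--parsable`, which prints only the job ID (optionally followed by `;cluster`), and `--dependency=afterok:<id>[:<id>...]`, which holds a job until every listed job has finished with exit code 0. The sketch below shows how such a chain could be assembled as argument arrays. `parseJobId` and `buildStepArgs` are hypothetical names; this is an illustration, not RISBridge's implementation.

```typescript
// Hypothetical sketch: chaining Slurm jobs with afterok dependencies.
// Function names are illustrative, not RISBridge's actual API.

/** Extract the job ID from `sbatch --parsable` output ("<id>" or "<id>;<cluster>"). */
function parseJobId(stdout: string): string {
  const id = stdout.trim().split(";")[0];
  if (!/^\d+$/.test(id)) {
    throw new Error(`unexpected sbatch output: ${JSON.stringify(stdout)}`);
  }
  return id;
}

/** Build an sbatch argument array; a step may wait on earlier job IDs. */
function buildStepArgs(script: string, afterOk: string[] = []): string[] {
  const args = ["sbatch", "--parsable"];
  if (afterOk.length > 0) {
    // afterok: start only after every listed job exits with code 0.
    args.push(`--dependency=afterok:${afterOk.join(":")}`);
  }
  args.push(script);
  return args;
}

// Example: preprocess -> train, where train waits on preprocess.
const first = buildStepArgs("preprocess.sbatch");
const prepId = parseJobId("1234567\n"); // stand-in for sbatch's real stdout
const second = buildStepArgs("train.sbatch", [prepId]);
console.log(first.join(" "));  // sbatch --parsable preprocess.sbatch
console.log(second.join(" ")); // sbatch --parsable --dependency=afterok:1234567 train.sbatch
```

Parsing the job ID strictly (digits only) means a surprising `sbatch` reply stops the chain instead of producing a malformed dependency.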
How it works
```mermaid
flowchart LR
    A([Plain-English intent]) --> B[Plan & validate<br/>partition · resources · paths]
    B --> C{Dry-run preview<br/>“I will submit: …”}
    C -- confirm --> D[Build sbatch from<br/>fixed, validated templates]
    D --> E[[SSH · Duo / multiplexing]]
    E --> F[(Slurm on Compute2)]
    F --> G[Monitor · logs · explain<br/>efficiency · history]
    G --> A
```
Login nodes are used only for submitting — every workload runs through Slurm on compute nodes. Submissions are dry-run by default and require an explicit confirm before anything reaches the scheduler.
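A minimal sketch of a dry-run-by-default gate, assuming a simplified plan object. The `JobPlan` shape, `planToArgv`, and `gateSubmission` are hypothetical names, not RISBridge's code. Without `confirm`, the function only returns preview text, and the `sbatch` command stays an argument array throughout rather than a shell string.

```typescript
// Hypothetical sketch of a dry-run-by-default submission gate.
// The plan shape and function names are illustrative assumptions.

interface JobPlan {
  partition: string;
  gres?: string;   // e.g. "gpu:H100:1"
  time: string;    // Slurm time limit, e.g. "04:00:00"
  script: string;  // path of the sbatch script to submit
}

type SubmitResult =
  | { kind: "preview"; argv: string[]; message: string }
  | { kind: "submit"; argv: string[] };

function planToArgv(plan: JobPlan): string[] {
  // One array element per argument; nothing is joined into a shell string.
  const argv = ["sbatch", "--parsable", `--partition=${plan.partition}`, `--time=${plan.time}`];
  if (plan.gres) argv.push(`--gres=${plan.gres}`);
  argv.push(plan.script);
  return argv;
}

function gateSubmission(plan: JobPlan, confirm = false): SubmitResult {
  const argv = planToArgv(plan);
  if (!confirm) {
    // Default path: describe what would run and stop before the scheduler.
    return { kind: "preview", argv, message: `I will submit: ${argv.join(" ")}` };
  }
  return { kind: "submit", argv };
}

const plan: JobPlan = { partition: "general-gpu", gres: "gpu:H100:1", time: "04:00:00", script: "train.sbatch" };
console.log(gateSubmission(plan));            // preview only
console.log(gateSubmission(plan, true).kind); // "submit"
```

Making `confirm` default to `false` means a caller that forgets the flag gets a preview, never a submission.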
Highlights
- Intent → compute. "Run `src/train.py` on a GPU for 4 hours" becomes a validated `sbatch` submission. "Why is my job pending?" returns a plain-English answer with a fix.
- Safe by construction. No arbitrary-shell tool; strict input validation; remote commands run via argument arrays (never shell strings); file uploads are base64-encoded; paths are workspace-confined; job control verifies ownership; a redacted audit log records every action.
- Auth that respects policy. SSH-key setup, Duo-aware connection multiplexing (approve once, reuse for hours), and an optional per-user worker that orchestrates Slurm from inside the cluster. Duo is never automated; private keys are never shown or logged.
- Broad workload coverage. Python (CPU/GPU), R, notebooks, GPU smoke tests, job arrays, multi-GPU training, vLLM serving, JupyterLab, conda environments, dependency pipelines, and long checkpoint/requeue runs.
- 53 high-level tools across auth, discovery, projects/files, planning, submission, job management, logs, environments, expert controls, and the per-user worker.
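Duo-aware multiplexing of this kind is typically built on the standard OpenSSH `ControlMaster`, `ControlPath`, and `ControlPersist` options: the first connection (and its single Duo prompt) opens a master socket that later commands reuse. The sketch below renders such a `Host` block. The alias, hostname, username, key path, and the 4-hour persistence are placeholders and assumptions, not documented RIS settings, and `renderSshConfig` is a hypothetical name.

```typescript
// Hypothetical sketch: an OpenSSH client config block with connection
// multiplexing, so one Duo-approved login is reused by later commands.
// Host alias, hostname, user, and key path below are placeholders.

interface SshHostOptions {
  alias: string;        // short name used as `ssh <alias>`
  hostName: string;     // login node address (placeholder)
  user: string;         // WashU username
  identityFile: string; // private key path (the key itself is never read here)
  persist: string;      // how long an idle master stays open, e.g. "4h"
}

function renderSshConfig(o: SshHostOptions): string {
  return [
    `Host ${o.alias}`,
    `  HostName ${o.hostName}`,
    `  User ${o.user}`,
    `  IdentityFile ${o.identityFile}`,
    `  IdentitiesOnly yes`,
    // First connection becomes the master; later ones reuse its socket.
    `  ControlMaster auto`,
    `  ControlPath ~/.ssh/cm-%r@%h:%p`,
    // Keep the master alive after the first session exits.
    `  ControlPersist ${o.persist}`,
  ].join("\n");
}

console.log(renderSshConfig({
  alias: "compute2",
  hostName: "login.example.edu", // placeholder hostname
  user: "jdoe",
  identityFile: "~/.ssh/id_ed25519",
  persist: "4h",
}));
```

With a block like this in `~/.ssh/config`, `ssh -O check compute2` reports whether a master connection is currently running.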
Example
You type intent; RISBridge does the rest:
- "What's my RISBridge setup status?"
- "Set up my SSH key." → installs the key, one Duo approval
- "Discover my profile." → finds your Slurm account and storage workspace
- "Run `src/train.py` on one H100 for 4 hours." → preview → confirm → job ID
- "Why is job 1234567 pending?" · "Tail its logs." · "Right-size my last 5 runs."
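The "right-size" request maps naturally onto the familiar `seff`-style ratios: CPU efficiency is `TotalCPU / (Elapsed × AllocCPUS)` and memory efficiency is `MaxRSS / ReqMem`, all fields that `sacct` reports. The sketch below assumes those values are already parsed into seconds and GiB, and sizes the next request from the worst case across runs plus 20% headroom. The headroom, the `RunStats` shape, and the function names are illustrative assumptions, not RISBridge's actual policy.

```typescript
// Hypothetical sketch of seff-style right-sizing. Inputs mirror sacct fields
// (Elapsed, TotalCPU, AllocCPUS, MaxRSS, ReqMem), pre-converted to seconds and GiB.

interface RunStats {
  elapsedSec: number;  // wall-clock runtime
  totalCpuSec: number; // CPU time summed over all allocated cores
  allocCpus: number;
  maxRssGiB: number;   // peak resident memory
  reqMemGiB: number;   // memory requested
}

function efficiency(r: RunStats): { cpuEff: number; memEff: number } {
  const cpuEff = r.totalCpuSec / (r.elapsedSec * r.allocCpus);
  const memEff = r.maxRssGiB / r.reqMemGiB;
  return { cpuEff, memEff };
}

function rightSize(runs: RunStats[], headroom = 1.2): { cpus: number; memGiB: number } {
  if (runs.length === 0) throw new Error("need at least one run");
  // Size to the worst case observed across runs, then add headroom.
  const peakMem = Math.max(...runs.map(r => r.maxRssGiB));
  const peakCores = Math.max(...runs.map(r => r.totalCpuSec / r.elapsedSec));
  return {
    cpus: Math.max(1, Math.ceil(peakCores * headroom)),
    memGiB: Math.ceil(peakMem * headroom),
  };
}

const runs: RunStats[] = [
  { elapsedSec: 3600, totalCpuSec: 7200, allocCpus: 8, maxRssGiB: 10, reqMemGiB: 64 },
  { elapsedSec: 1800, totalCpuSec: 3000, allocCpus: 8, maxRssGiB: 12, reqMemGiB: 64 },
];
console.log(runs.map(efficiency)); // first run: cpuEff 0.25, memEff 0.15625
console.log(rightSize(runs));      // { cpus: 3, memGiB: 15 }
```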
The toolset
<details> <summary><b>All 53 tools, by category</b></summary>
Setup & authentication — ris_setup_wizard, ris_auth_status, ris_setup_ssh_key,
ris_show_public_key, ris_generate_ssh_config, ris_write_ssh_config,
ris_test_key_only_auth, ris_open_ssh_master, ris_check_ssh_master, ris_repair_stale_socket
Discovery & configuration — ris_discover_profile, ris_set_profile, ris_show_config,
ris_validate_config, ris_list_partitions, ris_gpu_status
Projects, files & environments — ris_create_project, ris_upload_file,
ris_list_project_files, ris_ensure_env, ris_list_modules, ris_list_conda_envs,
ris_inspect_conda_env
Planning — ris_plan_run, ris_researcher_wizard, ris_estimate_resources
Submitting jobs — ris_run, ris_submit_python_job, ris_submit_r_job,
ris_submit_notebook_job, ris_submit_gpu_smoke_test, ris_submit_array_job,
ris_submit_multigpu_torch_job, ris_submit_jupyter_job, ris_submit_vllm_job,
ris_submit_conda_env_job, ris_create_pipeline, ris_generate_sbatch_template
Monitoring & lifecycle — ris_list_my_jobs, ris_job_history, ris_explain_job,
ris_get_job_logs, ris_tail_job_logs, ris_cancel_job, ris_hold_job, ris_release_job
Analysis & reproducibility — ris_analyze_efficiency, ris_compare_runs,
ris_get_result_manifest
Per-user worker — ris_bootstrap_worker, ris_worker_enqueue, ris_worker_status,
ris_worker_cancel
</details>
Safety & trust model
| Guarantee | How |
|---|---|
| No arbitrary shell | Every remote command is a fixed template built from validated tokens; no raw command tool exists. |
| Injection-resistant | Zod validation on every input; spawn with argument arrays; base64-encoded uploads. |
| Confined | Workspace-only paths (no traversal, no data on /home). |
| Confirmed | Submissions are dry-run by default and need an explicit confirm. |
| Accountable | Ownership-checked job control; redacted append-only audit log. |
| Private | Duo never automated; passwords never captured; private keys never shown or logged. |
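To make the "Confined" row concrete, here is a minimal lexical sketch of workspace confinement, assuming tools accept paths relative to a workspace root. The root path (`examplelab`, `myproject`) and the function names are hypothetical. It rejects absolute paths and any `..` that resolves outside the root; a `/home` path is refused simply because it is not under the workspace.

```typescript
// Hypothetical sketch of workspace path confinement (lexical only).
// Root path and function names are illustrative, not RISBridge's validation.

function normalizePosix(p: string): string {
  const out: string[] = [];
  for (const part of p.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (out.length === 0) throw new Error(`path escapes filesystem root: ${p}`);
      out.pop();
    } else {
      out.push(part);
    }
  }
  return "/" + out.join("/");
}

/** Resolve a user-supplied relative path inside the workspace, or throw. */
function confine(workspaceRoot: string, userPath: string): string {
  if (userPath.startsWith("/")) throw new Error("absolute paths are not accepted");
  if (userPath.includes("\0")) throw new Error("NUL byte in path");
  const root = normalizePosix(workspaceRoot);
  const full = normalizePosix(`${root}/${userPath}`);
  // Must equal the root or sit strictly beneath it ("/a/bc" is not under "/a/b").
  const prefix = root.endsWith("/") ? root : root + "/";
  if (full !== root && !full.startsWith(prefix)) {
    throw new Error(`path leaves workspace: ${userPath}`);
  }
  return full;
}

const ws = "/storage2/fs1/examplelab/Active/myproject"; // hypothetical workspace
console.log(confine(ws, "data/input.csv")); // /storage2/fs1/examplelab/Active/myproject/data/input.csv
try {
  confine(ws, "../../../../../home/jdoe/.ssh/id_ed25519");
} catch (e) {
  console.log((e as Error).message); // path leaves workspace: ...
}
```

Being purely lexical, this check does not resolve symlinks on the cluster side; a production check would need to account for those as well.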
Requirements
- A WashU RIS account with a Compute2 allocation.
- The WashU network or VPN (login nodes are reachable only on-network) and Duo MFA.
- An MCP client (Claude Desktop or Claude Code). For the source route: Node.js 18.18+ and OpenSSH.
Getting started
Use is governed by the license — the steps below are for authorized users.
One-click extension. Install the `risbridge-mcp.mcpb` Desktop Extension in Claude Desktop (Settings → Extensions → Install Extension…) and enter your WashU username. That's it: the extension bundles its own runtime.
From source:

```bash
npm install
npm test          # unit tests (no network)
npm run build     # → dist/
node dist/cli.js tools   # list the registered tools
```
Then register the server with your client (for example, `claude mcp add risbridge -- node /abs/path/dist/server.js`), connect to the WashU VPN, and ask RISBridge to set up your SSH key.
Cluster reference
<details> <summary><b>Partitions, storage & GPU modes</b></summary>
Partitions (there is no plain `general` partition):

| Partition | Use | GPU |
|---|---|---|
| `general-gpu` | Full-throughput GPU work | Full H100 80 GB |
| `general-preempt-gpu` | Free, restartable GPU work | Untyped (preemptible) |
| `general-cpu` | Standard CPU work | None |
| `general-short` | Quick tests / dev | MIG slice (~10 GB) |
| `general-interactive` | Interactive sessions | MIG slice |
| `general-bigmem` | Large-memory CPU | None |
Storage: `/home` is small (code only); data, environments, outputs, and checkpoints live under your `/storage2/fs1/<lab>/Active/...` workspace.
GPU requests: typed `--gres=gpu:H100:N` is the default. Untyped `--gres=gpu:N` is opt-in; it matches more nodes on `general-gpu` and is required on preemptible GPU queues.
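The typed/untyped rule above fits in a small function. This is a sketch under stated assumptions: it handles only the two GPU partitions whose request form is described here, leaves out the MIG-slice partitions because their GRES syntax isn't given, and uses hypothetical names (`gresFlag`, `isGpuPartition`).

```typescript
// Hypothetical sketch of the --gres rule for the two full-GPU partitions.
// MIG-slice partitions (general-short, general-interactive) are not modeled.

const GPU_PARTITIONS = ["general-gpu", "general-preempt-gpu"] as const;
type GpuPartition = (typeof GPU_PARTITIONS)[number];

function isGpuPartition(s: string): s is GpuPartition {
  return (GPU_PARTITIONS as readonly string[]).includes(s);
}

function gresFlag(partition: GpuPartition, count: number, untyped = false): string {
  if (!Number.isInteger(count) || count < 1) {
    throw new Error(`GPU count must be a positive integer, got ${count}`);
  }
  // Preemptible GPU queues require the untyped form; elsewhere it is opt-in.
  if (partition === "general-preempt-gpu" || untyped) {
    return `--gres=gpu:${count}`;
  }
  // Default on general-gpu: pin the GPU type.
  return `--gres=gpu:H100:${count}`;
}

console.log(gresFlag("general-gpu", 2));         // --gres=gpu:H100:2
console.log(gresFlag("general-gpu", 1, true));   // --gres=gpu:1
console.log(gresFlag("general-preempt-gpu", 1)); // --gres=gpu:1
```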
</details>
Project structure
```
src/
  tools/        53 MCP tools
  core/         sbatch builder (policy chokepoint), ssh/slurm clients, validation, planner
  templates/    12 job templates
  auth/         SSH key + Duo / multiplexing
  worker/       per-user worker daemon
  config.ts     environment-driven configuration
tests/          unit test suite
scripts/        installers + .mcpb packaging
```
License
Proprietary — All Rights Reserved. This software and its source code are proprietary and confidential. No license or permission is granted to use, copy, modify, or distribute it, in whole or in part, without the prior written permission of the copyright holder. Viewing this repository does not grant any such rights. See LICENSE.
To request permission, contact sourabh@wustl.edu.