RISBridge MCP

Enables users to run Slurm jobs on WashU RIS Compute2 cluster by describing their intent in plain English, without requiring SSH or Slurm knowledge.

Run jobs on the WashU RIS Compute2 cluster by describing what you want. An intent-to-compute server that turns plain English into safe, validated Slurm work — no SSH, sbatch, partitions, GRES, modules, or storage layout to learn.

RISBridge is a Model Context Protocol server. You talk to it in plain language through an MCP client (Claude Desktop or Claude Code); it plans the job, shows you exactly what it will submit, and runs it on the cluster only after you confirm. It is deliberately not a terminal — there is no arbitrary-shell tool. Every action is a specific, schema-validated operation, which is what makes it safe to let an agent drive real HPC work.


Why it exists

Getting research code onto an HPC cluster usually means learning SSH, Duo, sbatch, partitions, GRES, modules, and a storage layout — before running a single line. RISBridge removes that wall:

  • Researchers new to HPC get a guided path — a setup wizard, plain-English planning, dry-run previews, and clear status/log/diagnose tools. No Slurm knowledge required.
  • Experienced HPC users get speed and consistency — job arrays, afterok pipelines, single-node multi-GPU torchrun, vLLM serving, efficiency right-sizing, run comparison, and reproducible history.

Same server, both audiences. The tools scale from "hold my hand" to "give me the template and get out of the way."

How it works

flowchart LR
    A([Plain-English intent]) --> B[Plan & validate<br/>partition · resources · paths]
    B --> C{Dry-run preview<br/>“I will submit: …”}
    C -- confirm --> D[Build sbatch from<br/>fixed, validated templates]
    D --> E[[SSH · Duo / multiplexing]]
    E --> F[(Slurm on Compute2)]
    F --> G[Monitor · logs · explain<br/>efficiency · history]
    G --> A

Login nodes are used only for submitting — every workload runs through Slurm on compute nodes. Submissions are dry-run by default and require an explicit confirm before anything reaches the scheduler.
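
The dry-run-then-confirm gate described above can be sketched roughly as follows. This is a minimal illustration, not RISBridge's actual code: `JobRequest`, `SubmitResult`, `buildArgv`, `submit`, and the injected `send` callback are all hypothetical names, and the real server builds its scripts from templates rather than passing a script path straight to sbatch.

```typescript
// Hypothetical request shape; the real tool schemas are not shown in this README.
interface JobRequest {
  script: string;     // workspace-relative script path
  partition: string;  // e.g. "general-gpu"
  hours: number;      // requested wall time in whole hours
  confirm?: boolean;  // must be explicitly true before anything is submitted
}

type SubmitResult =
  | { mode: "dry-run"; preview: string }
  | { mode: "submitted"; jobId: string };

// Build the argument vector for sbatch. --time is given in minutes,
// which is one of the formats Slurm accepts.
function buildArgv(req: JobRequest): string[] {
  if (!Number.isInteger(req.hours) || req.hours < 1) {
    throw new Error("hours must be a positive integer");
  }
  return [
    "sbatch",
    `--partition=${req.partition}`,
    `--time=${req.hours * 60}`,
    req.script,
  ];
}

// `send` stands in for whatever actually talks to the cluster. It is only
// called when the caller has explicitly confirmed.
function submit(
  req: JobRequest,
  send: (argv: string[]) => string,
): SubmitResult {
  const argv = buildArgv(req);
  if (req.confirm !== true) {
    return { mode: "dry-run", preview: `I will submit: ${argv.join(" ")}` };
  }
  return { mode: "submitted", jobId: send(argv) };
}
```

Calling `submit` without `confirm: true` only returns the preview text; the scheduler is never contacted on that path.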

Highlights

  • Intent → compute. "Run src/train.py on a GPU for 4 hours" becomes a validated sbatch submission. "Why is my job pending?" returns a plain-English answer with a fix.
  • Safe by construction. No arbitrary-shell tool; strict input validation; remote commands run via argument arrays (never shell strings); file uploads are base64-encoded; paths are workspace-confined; job control verifies ownership; a redacted audit log records every action.
  • Auth that respects policy. SSH-key setup, Duo-aware connection multiplexing (approve once, reuse for hours), and an optional per-user worker that orchestrates Slurm from inside the cluster. Duo is never automated; private keys are never shown or logged.
  • Broad workload coverage. Python (CPU/GPU), R, notebooks, GPU smoke tests, job arrays, multi-GPU training, vLLM serving, JupyterLab, conda environments, dependency pipelines, and long checkpoint/requeue runs.
  • 53 high-level tools across auth, discovery, projects/files, planning, submission, job management, logs, environments, expert controls, and the per-user worker.
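
The "approve once, reuse for hours" behaviour in the auth bullet relies on OpenSSH connection multiplexing. The sketch below shows one way an SSH config stanza for it could be rendered. It is an assumption-laden illustration: `SshProfile` and `renderSshConfig` are hypothetical names, the host name is a placeholder, and the output of the real `ris_generate_ssh_config` tool may differ. The `ControlMaster`, `ControlPath`, and `ControlPersist` directives themselves are standard OpenSSH options.

```typescript
// Hypothetical profile shape, for illustration only.
interface SshProfile {
  alias: string;        // short name used as `ssh <alias>`
  hostName: string;     // login node (placeholder in the usage below)
  user: string;
  identityFile: string; // private key path; the key itself is never read here
  persist: string;      // ControlPersist duration, e.g. "4h"
}

// Render an ssh_config stanza that keeps one authenticated master
// connection open so later commands reuse it without a new Duo prompt.
function renderSshConfig(p: SshProfile): string {
  const fields: Array<[string, string]> = [
    ["alias", p.alias],
    ["hostName", p.hostName],
    ["user", p.user],
    ["identityFile", p.identityFile],
    ["persist", p.persist],
  ];
  // Reject whitespace and newlines so no field can inject extra directives.
  for (const [name, value] of fields) {
    if (!/^[A-Za-z0-9._~\/@-]+$/.test(value)) {
      throw new Error(`invalid value for ${name}`);
    }
  }
  return [
    `Host ${p.alias}`,
    `  HostName ${p.hostName}`,
    `  User ${p.user}`,
    `  IdentityFile ${p.identityFile}`,
    `  IdentitiesOnly yes`,
    `  ControlMaster auto`,
    `  ControlPath ~/.ssh/cm-%C`,
    `  ControlPersist ${p.persist}`,
  ].join("\n") + "\n";
}
```

With `persist` set to `4h`, the master connection stays available for four hours after the last session closes, so the first connection is the only one that needs a Duo approval in that window.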

Example

You type intent; RISBridge does the rest:

  • "What's my RISBridge setup status?"
  • "Set up my SSH key." → installs the key, one Duo approval
  • "Discover my profile." → finds your Slurm account and storage workspace
  • "Run src/train.py on one H100 for 4 hours." → preview → confirm → job ID
  • "Why is job 1234567 pending?" · "Tail its logs." · "Right-size my last 5 runs."

The toolset

<details> <summary><b>All 53 tools, by category</b></summary>

Setup & authentication: ris_setup_wizard, ris_auth_status, ris_setup_ssh_key, ris_show_public_key, ris_generate_ssh_config, ris_write_ssh_config, ris_test_key_only_auth, ris_open_ssh_master, ris_check_ssh_master, ris_repair_stale_socket

Discovery & configuration: ris_discover_profile, ris_set_profile, ris_show_config, ris_validate_config, ris_list_partitions, ris_gpu_status

Projects, files & environments: ris_create_project, ris_upload_file, ris_list_project_files, ris_ensure_env, ris_list_modules, ris_list_conda_envs, ris_inspect_conda_env

Planning: ris_plan_run, ris_researcher_wizard, ris_estimate_resources

Submitting jobs: ris_run, ris_submit_python_job, ris_submit_r_job, ris_submit_notebook_job, ris_submit_gpu_smoke_test, ris_submit_array_job, ris_submit_multigpu_torch_job, ris_submit_jupyter_job, ris_submit_vllm_job, ris_submit_conda_env_job, ris_create_pipeline, ris_generate_sbatch_template

Monitoring & lifecycle: ris_list_my_jobs, ris_job_history, ris_explain_job, ris_get_job_logs, ris_tail_job_logs, ris_cancel_job, ris_hold_job, ris_release_job

Analysis & reproducibility: ris_analyze_efficiency, ris_compare_runs, ris_get_result_manifest

Per-user worker: ris_bootstrap_worker, ris_worker_enqueue, ris_worker_status, ris_worker_cancel

</details>
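
An MCP client invokes any of these tools with a JSON-RPC `tools/call` request. The sketch below builds such a request. The envelope follows the Model Context Protocol, but `toolCall` is a hypothetical helper, and the empty argument object for `ris_list_partitions` is an assumption, since the tools' input schemas are not listed here.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build a request for one of the ris_* tools.
function toolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  // Every tool listed above uses the ris_ prefix with lowercase
  // letters and underscores.
  if (!/^ris_[a-z_]+$/.test(name)) {
    throw new Error(`not a RISBridge tool name: ${name}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Assumed to take no arguments; check the tool's schema before relying on this.
const request = toolCall(1, "ris_list_partitions");
console.log(JSON.stringify(request));
```

In practice the MCP client (Claude Desktop or Claude Code) constructs these requests itself; the sketch only shows what travels over the wire.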

Safety & trust model

| Guarantee | How |
| --- | --- |
| No arbitrary shell | Every remote command is a fixed template built from validated tokens; no raw command tool exists. |
| Injection-resistant | Zod validation on every input; spawn with argument arrays; base64-encoded uploads. |
| Confined | Workspace-only paths (no traversal, no data on /home). |
| Confirmed | Submissions are dry-run by default and need an explicit confirm. |
| Accountable | Ownership-checked job control; redacted append-only audit log. |
| Private | Duo never automated; passwords never captured; private keys never shown or logged. |
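
As an illustration of the "Confined" guarantee, here is a minimal sketch of workspace path confinement. It assumes nothing about RISBridge's real implementation (which uses Zod for validation): `confine` is a hypothetical function, the workspace root in the usage is an invented example, and the rule is deliberately stricter than normalization in that any `..` segment is rejected outright.

```typescript
// Hypothetical helper: map a user-supplied relative path into the workspace,
// refusing anything that could escape it.
function confine(workspaceRoot: string, relative: string): string {
  if (relative.startsWith("/")) {
    throw new Error("absolute paths are not allowed");
  }
  // Allow a conservative character set only; this also rejects
  // empty strings, spaces, quotes, and shell metacharacters.
  if (!/^[A-Za-z0-9._\/-]+$/.test(relative)) {
    throw new Error("unsupported characters in path");
  }
  const parts: string[] = [];
  for (const segment of relative.split("/")) {
    if (segment === "" || segment === ".") continue; // collapse "//" and "./"
    if (segment === "..") throw new Error("path traversal is not allowed");
    parts.push(segment);
  }
  if (parts.length === 0) throw new Error("empty path");
  const root = workspaceRoot.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/${parts.join("/")}`;
}

// Example root is made up for illustration.
const root = "/storage2/fs1/examplelab/Active/alice";
console.log(confine(root, "src/train.py"));
```

Because absolute paths are refused before anything else, a request for a file under /home can never resolve, whatever the workspace root is.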

Requirements

  • A WashU RIS account with a Compute2 allocation.
  • The WashU network or VPN (login nodes are reachable only on-network) and Duo MFA.
  • An MCP client (Claude Desktop or Claude Code). For the source route: Node.js 18.18+ and OpenSSH.

Getting started

Use is governed by the license — the steps below are for authorized users.

One-click extension. Install the risbridge-mcp.mcpb Desktop Extension in Claude Desktop (Settings → Extensions → Install Extension…), enter your WashU username, and you're done — it bundles its own runtime.

From source.

npm install
npm test          # unit tests (no network)
npm run build     # → dist/
node dist/cli.js tools     # list the registered tools

Then register the server with your client (for example: claude mcp add risbridge -- node /abs/path/dist/server.js), connect to the WashU VPN, and ask RISBridge to set up your SSH key.

Cluster reference

<details> <summary><b>Partitions, storage & GPU modes</b></summary>

Partitions (there is no plain general partition):

| Partition | Use | GPU |
| --- | --- | --- |
| general-gpu | Full-throughput GPU work | Full H100 80 GB |
| general-preempt-gpu | Free, restartable GPU work | Untyped (preemptible) |
| general-cpu | Standard CPU work | |
| general-short | Quick tests / dev | MIG slice (~10 GB) |
| general-interactive | Interactive sessions | MIG slice |
| general-bigmem | Large-memory CPU | |

Storage: /home is small (code only); data, environments, outputs, and checkpoints live under your /storage2/fs1/<lab>/Active/... workspace.

GPU requests — typed --gres=gpu:H100:N by default; untyped --gres=gpu:N is opt-in and matches more nodes on general-gpu (and is required on preemptible GPU queues).
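
The typed versus untyped rule can be sketched as a small helper. `gresFlag` and `GpuMode` are hypothetical names, and silently forcing the untyped form on `general-preempt-gpu` is one possible design (an assumption, not confirmed behaviour); a stricter version could reject the request instead.

```typescript
type GpuMode = "typed" | "untyped";

// Hypothetical helper: choose the --gres flag for a GPU request.
function gresFlag(
  count: number,
  partition: string,
  mode: GpuMode = "typed", // typed H100 requests are the default
): string {
  if (!Number.isInteger(count) || count < 1) {
    throw new Error("GPU count must be a positive integer");
  }
  // Untyped requests are required on the preemptible GPU queue,
  // so the requested mode is overridden there.
  const effective: GpuMode =
    partition === "general-preempt-gpu" ? "untyped" : mode;
  return effective === "typed"
    ? `--gres=gpu:H100:${count}`
    : `--gres=gpu:${count}`;
}

console.log(gresFlag(1, "general-gpu"));
console.log(gresFlag(2, "general-gpu", "untyped"));
console.log(gresFlag(1, "general-preempt-gpu"));
```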

</details>

Project structure

src/
  tools/        53 MCP tools
  core/         sbatch builder (policy chokepoint), ssh/slurm clients, validation, planner
  templates/    12 job templates
  auth/         SSH key + Duo / multiplexing
  worker/       per-user worker daemon
  config.ts     environment-driven configuration
tests/          unit test suite
scripts/        installers + .mcpb packaging

License

Proprietary — All Rights Reserved. This software and its source code are proprietary and confidential. No license or permission is granted to use, copy, modify, or distribute it, in whole or in part, without the prior written permission of the copyright holder. Viewing this repository does not grant any such rights. See LICENSE.

To request permission, contact sourabh@wustl.edu.
