FAS Research Computing - Slurm - critical security patch – Incident details

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE


Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu | Email: rchelp@rc.fas.harvard.edu



Slurm - critical security patch

Resolved
Major outage
Started over 1 year ago
Lasted about 1 hour

Affected

Cannon Cluster

Major outage from 3:02 PM to 4:26 PM

SLURM Scheduler - Cannon

Major outage from 3:02 PM to 4:26 PM

Cannon Compute Cluster (Holyoke)

Major outage from 3:02 PM to 4:26 PM

Boston Compute Nodes

Major outage from 3:02 PM to 4:26 PM

GPU nodes (Holyoke)

Major outage from 3:02 PM to 4:26 PM

FASSE Cluster

Major outage from 3:02 PM to 4:26 PM

Updates
  • Resolved

    The security patch has been applied, and all clusters are accepting jobs at this time.

  • Investigating

    SchedMD (the maintainer of Slurm) has discovered a critical security flaw in Slurm. Due to the nature and severity of the issue, we are applying the corresponding security patch immediately.

    Cannon and FASSE schedulers will remain down for the duration of the patching. All running jobs will be paused, and new jobs will not be accepted until the scheduler is back up.

    The patching is expected to take approximately one hour.