FAS Research Computing - Notice history

Globus Data Transfer experiencing partial outage

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE | Academic


Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu | Email: rchelp@rc.fas.harvard.edu




SLURM Scheduler - Cannon - Operational

Cannon Compute Cluster (Holyoke) - Operational

Boston Compute Nodes - Operational

GPU nodes (Holyoke) - Operational

seas_compute - Operational


SLURM Scheduler - FASSE - Operational

FASSE Compute Cluster (Holyoke) - Operational


Kempner Cluster CPU - Operational

Kempner Cluster GPU - Operational


Login Nodes - Boston - Operational

Login Nodes - Holyoke - Operational

FASSE login nodes - Operational


Cannon Open OnDemand/VDI - Operational

FASSE Open OnDemand/VDI - Operational


Netscratch (Global Scratch) - Operational

Holyscratch01 (Pending Retirement) - Operational

Home Directory Storage - Boston - Operational

HolyLFS06 (Tier 0) - Operational

HolyLFS04 (Tier 0) - Operational

HolyLFS05 (Tier 0) - Operational

Holystore01 (Tier 0) - Operational

Holylabs - Operational

BosLFS02 (Tier 0) - Operational

Isilon Storage Boston (Tier 1) - Operational

Isilon Storage Holyoke (Tier 1) - Operational

CEPH Storage Boston (Tier 2) - Operational

Tape (Tier 3) - Operational

Boston Specialty Storage - Operational

Holyoke Specialty Storage - Operational

Samba Cluster - Operational

Globus Data Transfer - Partial outage

bosECS - Operational

holECS - Operational

Notice history

Aug 2024

Starfish upgrade
  • Completed
    August 27, 2024 at 12:00 PM

    Starfish is back up

  • Update
    August 26, 2024 at 2:35 PM

    Starfish maintenance is still ongoing; no ETA at this time.

  • In progress
    August 24, 2024 at 12:00 AM

    Maintenance is now in progress.

  • Planned
    August 24, 2024 at 12:00 AM

    The Starfish Zones Dashboard will be undergoing upgrades and maintenance this weekend, from Friday, August 23rd at 8 AM until Monday, August 26th at 8 AM. The dashboard will not be accessible during this time. Further details will be provided if needed. Please email rchelp@rc.fas.harvard.edu if you have any questions or concerns.

Jul 2024

Authentication issues - Related to global CrowdStrike incident
  • Resolved

    All CrowdStrike-related resources are back up and operational.

  • Update

    For FASRC resources affected by the CrowdStrike issue, most are back in full service. A few remaining issues involving the following may not be resolved until Monday:
    - waywiser2
    - proteomics2
    - tmsdb3
    - lic3

  • Update

    Please see HUIT Status (harvard.edu) for additional information on the global issue caused by the CrowdStrike security software that Harvard relies on. This is an ongoing issue university-wide.

    Few FASRC systems remain affected, but some Windows-based systems managed by or connected to FASRC may still be impacted.

  • Monitoring

    Authentication is back up and running. Windows machines are still in a bad state and will need remedial work to get them back in service.

  • Identified

    Authentication is back up and running. Windows machines are still in a bad state and will need remedial work to get them back in service.

  • Investigating

    Authentication is back up and running. Windows machines are still in a bad state and will need remedial work to get them back in service.

Jun 2024

FASRC websites unavailable
  • Resolved

    This incident has been resolved. Both sites are working normally.

  • Investigating

    https://www.rc.fas.harvard.edu/ and https://docs.rc.fas.harvard.edu/ are offline.

    We are currently investigating this issue.

FASRC websites - Unplanned maintenance (www.rc.fas.harvard.edu and docs.rc.fas.harvard.edu)
  • Completed
    June 26, 2024 at 4:29 AM

    This update was completed successfully.

  • Planned
    June 26, 2024 at 4:17 AM

    Unplanned maintenance on www.rc.fas.harvard.edu and docs.rc.fas.harvard.edu is required.

    ETA is approximately 1 hour. We apologize for any inconvenience.

MGHPCC Pod 8A Power Upgrade June 24 will idle some Cannon nodes
  • Completed
    June 25, 2024 at 4:00 AM

    Maintenance has completed successfully.

  • In progress
    June 24, 2024 at 4:01 PM

    Maintenance is now in progress.

  • Planned
    June 24, 2024 at 4:01 AM

    MGHPCC will be performing power upgrades on Pod 8A to increase density and allow more nodes to be added to that pod's rows. As with the May 13th work, this means we will be idling half of the nodes in 8A on two dates: June 17th and June 24th.

    These are all-day events, meaning the nodes in question will be unavailable for the full 24 hours of each date. This is being accomplished via reservations: no jobs will be canceled, but the nodes will be drained, and users may notice that their jobs pend longer than normal as the scheduler idles these nodes.

    Where possible, please use or include other partitions in your job scripts (see the example batch script after the partition list below) and plan accordingly for any new or long-running jobs during that period: https://docs.rc.fas.harvard.edu/kb/running-jobs/#Slurm_partitions

    This affects the Cannon cluster. FASSE is not affected.

    Impacted partitions are:

    arguelles_delgado_gpu
    bigmem_intermediate
    bigmem
    blackhole_gpu
    eddy
    enos
    gershman
    gpu
    hejazi
    hernquist_ice
    hoekstra
    hsph
    huce_ice
    iaifi_gpu
    iaifi_gpu_priority
    iaifi_gpu_requeue
    intermediate
    itc_gpu
    itc_gpu_requeue
    joonholee
    jshapiro
    jshapiro_priority
    jshapiro_sapphire
    kempner
    kempner_dev
    kempner_h100
    kempner_requeue
    kempner_reservation
    kovac
    kozinsky
    kozinsky_gpu
    kozinsky_priority
    kozinsky_requeue
    murphy_ice
    ortegahernandez_ice
    sapphire
    seas_compute
    seas_gpu
    siag
    siag_combo
    siag_gpu
    sur
    test
    yao
    yao_priority
    zhuang
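
    Example batch script, as referenced above. This is a sketch rather than an official FASRC template: the job name, resource requests, and executable are placeholders, and the partition pairing (sapphire from the impacted list plus shared as an example alternative) should be replaced with partitions your group actually has access to. Slurm accepts a comma-separated list for --partition and will run the job on whichever listed partition can start it first, which can reduce pending time while the Pod 8A nodes are idled:

    #!/bin/bash
    #SBATCH --job-name=example_job          # placeholder job name
    #SBATCH --partition=sapphire,shared     # impacted partition plus an alternative; Slurm uses whichever starts the job sooner
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=8G
    #SBATCH --time=08:00:00                 # keep requests within the limits of every listed partition
    #SBATCH --output=%x_%j.out              # log file named after job name and job ID

    # Replace with your actual application command
    srun ./my_program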
