Notice history

Under maintenance

Sep 2024

Resolved
September 25, 2024 at 2:18 PM
Resolving. The hypervisor and all but one VM, which has a separate issue, are operational.
Update
September 25, 2024 at 1:41 AM
FASSE Open OnDemand and FASSE login services should be operational now.
Monitoring
September 24, 2024 at 9:09 PM
FASSE OOD is back up
FASSE login nodes are still down
Identified
September 24, 2024 at 7:59 PM
One of the hypervisors managing virtual machines is down. We are working to bring it back up. This affects FASSE login and FASSE OOD nodes and may degrade OpenAuth (two-factor).
Affected hosts are:
HOST -- STATUS
dataverse-backup UNKNOWN
demo2-l3-fs UNKNOWN
enos-vote-l3-fs UNKNOWN
fasselogin01 UNKNOWN
fasselogin02 UNKNOWN
frontier-squid02 UNKNOWN
frontier-squid03 UNKNOWN
frontier-squid04 UNKNOWN
goel-adm24-l3-fs UNKNOWN
goel-blind-l3-fs UNKNOWN
goel-l3-fs UNKNOWN
h-dev-fasseooda-01 UNKNOWN
h-dev-fasseooda-lb01 UNKNOWN
h-dev-fasseoodb-lb11 UNKNOWN
h-fasseooda-01 UNKNOWN
h-fasseooda-lb02 UNKNOWN
h-fasseoodb-lb11 UNKNOWN
h-fasseoodb-lb12 UNKNOWN
h-fasseoodc-lb21 UNKNOWN
h-fasseoodc-lb22 UNKNOWN
h-qa-fasseooda-01 UNKNOWN
h-qa-fasseooda-lb02 UNKNOWN
holy-es-master01 UNKNOWN
holy-es-master02 UNKNOWN
holy-es-master03 UNKNOWN
holynagios UNKNOWN
kreindlerl3-fs UNKNOWN
martin-su-l3-fs UNKNOWN
mcconnell-l3-fs UNKNOWN
openauth02 jtriley UNKNOWN
shleifer-dsl3-fs UNKNOWN
stock-solar-l3-fs UNKNOWN
stopsack-l3-fs UNKNOWN
xcat UNKNOWN

Resolved
September 23, 2024 at 8:00 PM
The failover is complete.
Identified
September 23, 2024 at 7:54 PM
The object storage target OST2b on holyscratch01 is again causing degraded performance. We are failing it over to the backup. We're aware that this issue is a concern, but please know that an entire new scratch filesystem is forthcoming. Thanks for your understanding.

Resolved
September 22, 2024 at 1:00 AM
holyscratch01 was at times degraded over the weekend. The OST causing the issues was restarted and the filesystem should be back to normal.

Resolved
September 19, 2024 at 1:25 PM
holyscratch01 was found to be in a degraded state around 9:15 AM and returned to operation at 9:25 AM.

Update
September 19, 2024 at 1:45 PM
This incident has been resolved.
Resolved
September 19, 2024 at 1:38 PM
holyscratch01 and the affected nodes have been reopened.
Identified
September 19, 2024 at 9:18 AM
holyscratch01 is seeing degraded performance and many nodes are closed off.

Aug 2024

Resolved
August 29, 2024 at 4:24 PM
holylfs04 is back up
Investigating
August 29, 2024 at 4:01 PM
holylfs04 needs to be rebooted to address performance issues.

Resolved
August 27, 2024 at 7:19 PM
The power issue has been resolved. All nodes that were drained are now being re-opened.
Investigating
August 27, 2024 at 2:22 PM
Due to temporary power availability issues in MGHPCC, half the nodes in 8A have been set to drain. This affects all partitions. Jobs will take longer to schedule but will not be terminated.
We will reopen these nodes once the power issue has been resolved at the datacenter. There is no ETA at this time.

Resolved
August 26, 2024 at 2:29 PM
boslfs02 is back up
Investigating
August 26, 2024 at 2:20 PM
boslfs02 needs to be rebalanced. Access may be inconsistent.

Starfish upgrade

Completed
August 27, 2024 at 12:00 PM
Starfish is back up
Update
August 26, 2024 at 2:35 PM
Starfish maintenance is still ongoing. There is no ETA at this time.
In progress
August 24, 2024 at 12:00 AM
Maintenance is now in progress
Planned
August 24, 2024 at 12:00 AM
The Starfish Zones Dashboard will be undergoing a few upgrades and maintenance this weekend from Friday, August 23rd at 8AM until Monday, August 26th at 8AM. The dashboard will not be accessible during this time. Further details will be provided, if needed. Please email rchelp@rc.fas.harvard.edu if you have any questions or concerns.

Resolved
August 15, 2024 at 2:06 PM
This incident has been resolved.
Monitoring
August 15, 2024 at 1:54 PM
We are rebalancing boslfs02. There may be a short period of degraded performance.

Jul 2024

Coldfront maintenance 7/30/24 noon-12:30

Completed
July 30, 2024 at 4:58 PM
Maintenance completed.
Planned
July 30, 2024 at 4:00 PM
Coldfront will be unavailable Tuesday July 30th from 12:00-12:30 for crucial maintenance.

Resolved
July 23, 2024 at 3:11 PM
boslfs02 is back up
Identified
July 23, 2024 at 2:51 PM
boslfs02 is currently experiencing degraded performance. We are actively working on this issue.

Resolved
July 22, 2024 at 3:12 PM
All Crowdstrike-related resources are back up and operational.
Update
July 19, 2024 at 10:40 PM
Most FASRC resources affected by the Crowdstrike issue are back in full service. A few remaining issues involving the following systems may not be resolved until Monday:
- waywiser2
- proteomics2
- tmsdb3
- lic3
Update
July 19, 2024 at 4:13 PM
Please see HUIT Status (harvard.edu) for additional information on the global issue caused by Crowdstrike security, which Harvard relies on. This is an ongoing issue university-wide.
The systems that continue to be affected at FASRC are minimal, but some Windows-based systems managed by or connected to FASRC may still be affected.
Monitoring
July 19, 2024 at 12:19 PM
Authentication is back up and running. Windows machines are still in a bad state and will need remedial work to get them back in service.
Identified
July 19, 2024 at 12:10 PM
Authentication is back up and running. Windows machines are still in a bad state and will need remedial work to get them back in service.
Investigating
July 19, 2024 at 11:13 AM
Authentication is back up and running. Windows machines are still in a bad state and will need remedial work to get them back in service.

Resolved
July 18, 2024 at 3:17 PM
boslogin01 is rebooted and ready for use.
Identified
July 18, 2024 at 2:09 PM
boslogin01 will be rebooted at 11:05AM. Due to stuck mounts it is currently not working as expected. Please save any necessary work before the reboot.

Update
July 17, 2024 at 2:55 PM
This incident has been resolved.
Resolved
July 17, 2024 at 2:47 PM
Infiniband has been restored
Identified
July 17, 2024 at 2:33 PM
The Infiniband fabric in MGHPCC that connects nodes with high-speed fiber is down. We are investigating. This _will_ cause performance issues and jobs may be stalled. Updates as we learn more.

Jul 2024 to Sep 2024

FAS Research Computing - Notice history
