Last updated on November 4, 2009 at 9:11AM
Summary:
Complications during a planned hardware upgrade of a CIFS gateway led
to a service outage outside the established maintenance windows and
degraded some web services.
Details:
At 7:00AM on Wednesday, November 4, one of the six CIFS gateway servers
became stuck in a bad state while attempting a graceful shutdown.
This state caused a steady increase in load on the PASS nodes
attempting to access the filesystems being touched by the bad CIFS
gateway. As a result, each of the CIFS gateways gracefully removed
itself from the load balancer after crossing the maximum load
threshold, resulting in the overall service outage.
In addition, www.personal.psu.edu and php.scripts.psu.edu are known
to have been affected. Both experienced service degradation because of
the state of the filesystems being touched by the PASS node in a bad state.
All services were restored to full functionality at approximately
7:50AM. Debug data has been gathered and we are working with the vendor
to permanently resolve the root cause of this issue.
For more information, please contact the ITS Help Desk (helpdesk@psu.edu).