By Chris Bulfinch ’18
In an initial email to the Trinity community on the morning of Oct. 26, Trinity’s Director of Infrastructure and Assistant CIO Frederick Kass cited a “power failure” that “disrupted almost all services.” Later in the day, after “most major services [had] been restored,” Kass emailed the College community again, explaining that “we take this outage very seriously,” and that his department was “working hard to fully understand and resolve the root cause.”
Weeks later, Kass and his department have a significantly better sense of what caused the outage, why the situation came about, and how to prevent it in the future. Kass explained to The Tripod that “the issue is not [particularly] technical in nature.”
He elaborated that the WiFi network’s “primary machine room has both a backup generator and a battery uninterruptible power supply (UPS), to keep [the network] running in the event of a power outage.” The UPS is designed to turn off if the fire system is activated in the room; apparently during a routine test of the fire system, a “safety contractor” activated the fire system in the UPS room by mistake.
Kass explained, “it takes a long time to recover from a full unexpected power outage” because Trinity’s network and server configurations “need to be manually restarted and verified by administrators in a specific order to fully restore service.”
Trinity’s WiFi network is a “campus enterprise IP based switch/route network,” a setup that, while rhetorically incomprehensible, is “similar to most campuses of [Trinity’s] size,” according to Kass. Outages such as the one experienced on Oct. 26 are rare for such networks, though they do occur “every few years.”
College WiFi networks play a game of update leapfrog, according to Kass. “Colleges typically have to replace their networking hardware every five to seven years,” he elaborated, “and Trinity is just beginning our refresh phase.”
This “refresh phase” has caused Trinity’s network performance to “lag behind many peer institutions.” Once Trinity’s refresh is completed, “we will lead for a bit.” The relatively outdated nature of Trinity’s network hardware did not help the restoration of network functionality.
Trinity is beginning a number of projects to “design a faster network… prioritizing resilience and redundancy into future network design.” Specific steps towards this goal include “working on the academic buildings’ wireless, the campus routing core, improving internet speeds, and individual building electronics,” as well as “locating equipment in multiple locations around campus” to mitigate the effects of power failures. Decentralizing network hardware seemed to be a central thrust of the updating process.
Trinity’s updating process and day-to-day network maintenance costs are funded by a combination of the College’s maintenance reserves and a “generous grant” from the National Science Foundation.
To prevent further outages until the hardware refresh is complete, Kass and Trinity’s network team are “working with Facilities and outside contractors to make sure procedures are correct and followed exactly.”
Trinity IT Explains Cause of TrinAir Network Outage