IT experienced network failure


The data center of the Butler University Information Technology department experienced a partial network failure last week. 

Chad Miller, senior director of technology development, said IT uses redundant switches, which give each server more than one network connection, to connect servers on campus and provide a backup in case of a crash.

“We have about 90 percent of our servers connected to these boxes,” he said. “So if one fails, they’re still connected and don’t have any problems.” 

The outage instead affected the servers with only one connection. These included servers that were scheduled to be replaced, as well as a small percentage of authentication servers and the server that powers the Google search engine on Butler’s website.

“One of those boxes failed,” Miller said. “When one of them failed we had a small amount of servers that also lost their network connection. We had spares on site. We identified the problem. We replaced parts of the gear, and we were able to bring that network switch back up that morning.”

Few students reported problems. Among the exceptions was Luke Zygmunt, who was trying to take an exam on Moodle.

“Later on, I had to close out of it, and when I tried to re-open it, I couldn’t get back on,” he said. “It just locked me out.”

Zygmunt emailed his professor about the problem. After the network was fixed, he was able to get back onto Moodle.

“I would have been done with the exam maybe 15 to 20 minutes earlier if I hadn’t run into that issue,” he said. “I was trying to find a way around it.”

When calls started coming in to the help desk, IT sent a department email to alert everyone to the problem and dispatched a team to find and fix it.

The redundant switches limited the impact of the failure. Because most connections had backups, most servers kept running. Even though the majority of servers already had redundant connections, Miller said IT is looking into ways to increase redundancy to prevent future failures.

“Because of that failure, we are looking at how to get closer to 100 percent with server redundancy,” he said. “Things do fail, but we’re looking to bring up that percentage to as high as we can so, if we have servers that lose their connectivity, we won’t have another problem like this.”