A Lesson Learned

Wednesday, August 23, 2006

Ken Frazier is serving as interim Director of DoIT and campus Chief Information Officer following the July retirement of Annie Stunden. He was appointed to the posts in May by Provost Patrick Farrell. Frazier is also Director of UW's General Library System.

On a sultry Thursday in late July, a thunderstorm rolled into Madison, dumping pea-sized hail and about five inches of rain in little more than an hour. Water filled streets and basements, floated cars, and caused flooding in more than 70 campus buildings.

Of great concern was the Computer Sciences Building, which houses DoIT's operations center in its basement. The operations department supports many of the major computing applications on campus and provides the computing platform for University Hospital and many UW System applications.

If you've never seen this space, it might be worth a field trip. It has a raised floor that supports computers and people over a massive network of wiring. There's a steady breeze powered by a muscular air-conditioning system straining to deal with heat generated by stacks of servers. The place vibrates with energy, both human and electronic.

Once it became clear that water was seeping into the wiring space under the floor, DoIT assembled a response team led by Klara Jelinkova. The team deserves high points for calm, creativity, esprit, flair, and elan under pressure. It was the kind of performance that would make any organization proud.

They did everything right, generating a priority list of systems to be brought down, and in what order. These decisions are crucial, because they must anticipate the need to bring the systems smoothly back online.

Campus users surely noticed when they lost their email and calendaring services, but it was no disaster. All applications were up and running again within a few hours.

Things would have been different if the Comp Sci basement had filled with five feet of dirty water, as happened in other buildings. Everything would have come down. The space would have been temporarily evacuated, and many computers destroyed. Most of the University's enterprise systems would have been out of business.

How long would it take in such a scenario to restore critical IT applications? The best guess of people most familiar with these systems is: weeks. Plural.

Many people outside of DoIT were surprised when I reported this fact. Some believed that we had a hot backup system that would provide rapid restoration of essential IT applications. In this respect, the flood was incredibly useful in stimulating a vitally important conversation about campus priorities for business continuity.

It should also cause us to ask how our disaster recovery systems compare with those of our peer institutions. We don't know the answer to this question (yet), but if I had to guess, I would borrow a phrase used by a favorite professor when she wanted to gently criticize the work of a graduate student. She would say that it was “not great.”

-- Ken Frazier