Data Centers and Hair Driers

Posted October 26th, 2007 by

OK, this is my first guest appearance on rybolov's blog. I have worked with rybolov in the past and have also spoken with him at Potomac Forum events. The topic today is Disaster Recovery/COOP/Contingency Planning, depending on what language you speak.

About three weeks ago we had an “incident” at our office: our server room lost cooling around midnight on a weeknight. We thought we had processes and procedures in place that would notify building security, facilities, and the proper IT staff in the event of an emergency. Oops, we were wrong! It was not until around 5 am the following morning, when the first IT staff came in, that anyone noticed the server room was around 90 degrees. During the night, one of the chillers had gone down due to a misconfiguration, and the backup was not able to keep up with demand. Once someone noticed the problem, the AC vendor was notified and sent someone out ASAP to fix it. In the meantime, non-mission-essential machines were brought down to reduce load, doors were propped open, and makeshift fans were placed in the room to increase airflow.

The next day, one of the security guys (me) decided to investigate the “incident” further to find out what really went wrong. We had breakdowns at every level. For starters, did anyone notice the temperature alarms going off in the building? Well, yes, but we will get to that later. The guard desk was notified about an hour after the unit went down, and a phone call was made. The problem is that the person who is on call 24/7 to address facilities issues was unavailable. The guard left a voice mail. Was anyone else called and notified? Nope, there was no call tree or contact list to follow up with. Issue #1: Create a call tree. It does not do any good to call just one person and give up, especially at 2 am when nobody really wants to answer the phone.
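For anyone who wants to see what Issue #1 looks like written down, here is a minimal sketch of a call tree as an ordered escalation list. Everything in it is hypothetical: the names, roles, phone numbers, and the `try_contact` callback (which stands in for however the guard desk actually reaches people) are placeholders, not our real roster or process.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Contact:
    name: str
    role: str
    phone: str


def escalate(
    call_tree: Sequence[Contact],
    try_contact: Callable[[Contact], bool],
) -> Tuple[Optional[Contact], List[Contact]]:
    """Call each contact in order until one acknowledges.

    Returns the contact who acknowledged (or None if nobody did)
    and the list of contacts who were tried without an answer.
    """
    unanswered: List[Contact] = []
    for contact in call_tree:
        if try_contact(contact):
            return contact, unanswered
        unanswered.append(contact)
    return None, unanswered


# Hypothetical roster: the order of the list is the escalation order.
CALL_TREE = [
    Contact("Facilities on-call", "facilities", "555-0100"),
    Contact("IT on-call", "it", "555-0101"),
    Contact("Facilities manager", "management", "555-0102"),
]

if __name__ == "__main__":
    # Simulate a night when the facilities on-call does not pick up.
    answered_by, missed = escalate(CALL_TREE, lambda c: c.role != "facilities")
    print("Acknowledged by:", answered_by.name if answered_by else "nobody")
    print("No answer from:", [c.name for c in missed])
```

The point of returning the unanswered list is that the guard desk ends up with a record of who was tried, which is exactly the information I could not get the morning after.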

The next item I addressed was who called the guard desk to tell them about the temperature issue. I went down and spoke to the guards, and they had not taken the name or number of whoever called them in the middle of the night. All they knew was that it was some kind of monitoring service. But wait, since when do we have an alarm monitoring service? I asked numerous people in facilities, IT, Finance, and Accounting if they knew about any monitoring contracts in place. Everyone was clueless. I called the vendor listed on the thermostat in the room. They had no record of an account with us or our parent company. So after about a week of fishing for this information, what should I do? My “creativity” kicked in and I decided to set off the alarm again, and this time ask the guard desk to write down the contact information when the monitoring service called. By this point I knew the temperature alarm was not tied into the fire alarm, so I was not worried about fire trucks showing up or sprinklers going off.

This is the funny part. I went over to a co-worker and asked her if she had an extra hair dryer I could borrow the next day. She looked at me and laughed. One of my nicknames is mini-me because I am bald and look like Mini-Me from Austin Powers. The next day she brought one in, and I walked around the building with a hair dryer getting all sorts of dirty looks (also wearing a Hawaiian shirt for added effect). After I heated the thermostat to around 85 degrees for about 30 minutes, BINGO! The monitoring service called. It was the same vendor I had contacted a few days before, but a different office in another part of the country. After I spoke with them for a few minutes later that day, we found out we had been getting free monitoring since February, but that is another long story. Issue #2: Document what you have so you can make educated decisions in the future.

The biggest issue out of all this was the airflow in the room itself. I asked our department if anyone had documentation on the BTU load in the room. Nope. OK, I then spent a few hours documenting everything in the room and came up with a rough number. We had originally thought we had about 50% additional cooling available for a contingency situation. Wrong again! We are actually right at maximum capacity with very little room for growth. After raising this issue, we found out that smaller AC units had been installed than originally planned. A basic airflow study of the room was conducted, and it determined that the wrong floor tiles had been installed, which was causing local hot spots and preventing the correct flow of air to cool everything. Issue #3: Have someone look at the physical layout of the room. Security overlaps many boundaries, so make sure to tap many resources for different points of view.
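If you want to come up with your own rough number, the arithmetic is simple. The sketch below uses two standard conversions (1 watt is about 3.412 BTU/hr, and 1 ton of cooling is 12,000 BTU/hr). The equipment list and the cooling capacity are made-up placeholders, not our actual figures, and it counts only electrical load, ignoring people, lighting, and heat coming through the walls.

```python
from typing import Mapping

BTU_PER_HR_PER_WATT = 3.412   # 1 W of electrical load is roughly 3.412 BTU/hr of heat
BTU_PER_HR_PER_TON = 12_000   # 1 ton of cooling removes 12,000 BTU/hr


def heat_load_btu_per_hr(equipment_watts: Mapping[str, float]) -> float:
    """Estimate heat load by assuming all electrical power ends up as heat."""
    return sum(equipment_watts.values()) * BTU_PER_HR_PER_WATT


def headroom_fraction(load_btu_per_hr: float, capacity_btu_per_hr: float) -> float:
    """Fraction of cooling capacity still unused (negative means overloaded)."""
    return (capacity_btu_per_hr - load_btu_per_hr) / capacity_btu_per_hr


if __name__ == "__main__":
    # Hypothetical inventory in watts, e.g. read off nameplates or a UPS display.
    inventory = {
        "rack A servers": 6000,
        "rack B storage": 3500,
        "network gear": 1200,
        "UPS losses": 800,
    }
    # Hypothetical installed cooling: 3.5 tons.
    capacity = 3.5 * BTU_PER_HR_PER_TON

    load = heat_load_btu_per_hr(inventory)
    print(f"Estimated load: {load:,.0f} BTU/hr")
    print(f"Cooling capacity: {capacity:,.0f} BTU/hr")
    print(f"Headroom: {headroom_fraction(load, capacity):.1%}")
```

Nameplate wattage usually overstates real draw, so treat the result as an upper-bound estimate and check it against measured power if you can.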

What is the point of all of this? Spend some time on contingency planning to put yourself in a proactive mode instead of waiting until it is too late. Create reasonable contingency plans and actually test them on a periodic basis. Conduct tabletop exercises with management and incorporate various “what if” scenarios. My favorite is the time the toilet on the second floor overflowed one weekend, causing water to pour into the first-floor electrical closet and completely bring down a large building for an entire week (this actually happened). Maybe think about actually pulling one of your backup tapes and attempting to restore it in real time. Check out the various resources on the web, such as www.DRII.org, to get ideas. Most of all, don’t wait until something bad happens. A little planning can go a long way.



Posted in FISMA, Odds-n-Sods, What Doesn't Work | 3 Comments »

3 Responses

  1.  rybolov Says:

    Heh. The sight of my short, bald friend looking for a blowdryer still gets me in stitches.

  2.  Vlad the Impaler Says:

    Ya know… I can SOOO see our friend doing the hair dryer routine, but what got me was the Hawaiian shirt.

    Most excellent. And I have yet to write my guest Blog entry…

    VTI

  3.  It’s a Blogiversary | The Guerilla CISO Says:

    […] Mini-Me, he’s short, he’s bald, and he guest-blogs from time to time about needing a hairdryer. […]
