Monday, July 21, 2008

Disaster Recovery: Planning for failure

At the time of our fire we had a very shaky DR plan. It consisted of tape backups, external USB-connected hard drives, and a couple of hastily jotted-down lists of the most critical things that needed to happen. Overall it's probably the same in most SMBs... if they have anything at all.

It is unconscionable that the previous IT Manager left with absolutely, positively no disaster plan at all. We weren't even storing backup tapes offsite. Hell, we were doing nightly backups of changed data, then every Saturday a full backup... on the same freaking tapes, week after week, month after month, for at least two years. Our SQL Server was backing up to the same external RAID array that held the production data, and the entire backup directory was saved to tape weekly. Cleaning up old backups was a manual process undertaken when the drive was close to running out of space.
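For illustration only, here is a minimal sketch of how that manual cleanup could be automated. It is not what was running on our servers; the function name `backups_to_prune`, the 14-day window, and the "always keep the three newest" rule are all assumptions made up for the example. The idea is to expire backups by age while never deleting the most recent few, so a run of failed backup jobs can't leave you with nothing.

```python
from datetime import datetime, timedelta


def backups_to_prune(backups, now, max_age_days=14, keep_min=3):
    """Return names of backups that are safe to delete, oldest first.

    backups: list of (name, taken_at) tuples, taken_at being a datetime.
    max_age_days and keep_min are hypothetical policy values.
    """
    newest_first = sorted(backups, key=lambda b: b[1], reverse=True)
    cutoff = now - timedelta(days=max_age_days)

    # Always protect the keep_min newest backups, whatever their age.
    candidates = newest_first[keep_min:]

    # Of the rest, only backups older than the cutoff are expired.
    expired = [name for name, taken_at in candidates if taken_at < cutoff]

    # Flip to oldest-first so deletion starts with the oldest backup.
    return list(reversed(expired))
```

The function only decides what to delete; actually removing files (and checking that an offsite copy exists first) is left to the caller, which keeps the policy easy to test on its own.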

My boss started in December 2006, and I was the first person he hired, in May 2007. He hired the network administrator in June 2007. It took until August 2007 for us to finally get a solid backup strategy, one that still includes the CFO taking the weekly backup tape home every Monday morning. It's not a good solution, but it's better than what we had. All in all, though, our disaster recovery plan was actually a plan for utter failure.

The last week has been a blur, but a common topic of conversation is how to plan better so that a disaster such as this is mostly an IT non-event. Now that we have about 100% of our services back online, we're digging into what this means. It's a given that some things are going to have to be replaced; the idea is to create as resilient and survivable an infrastructure as possible, balanced against the cost of the solution and the business's risk tolerance.

So we have moved beyond the previous failures in planning and are now planning for failure. The options are nearly limitless, the questions overwhelming and difficult to navigate. We know we want a virtualized infrastructure, and we want a blade solution with a SAN and possibly a NAS. We have narrowed down the vendors to Dell and HP, with IBM sometimes mentioned but not being seriously considered. I'll get into that discussion later.

Luckily I have gone through a similar process previously. At my last job we spent a year going through the process of selecting what ended up being an IBM BladeCenter and DS4300 SAN, then another four months implementing it. The difference here is we have 90 days to hand over our current equipment to the insurance company since it is being written off. This is going to be fast and furious.
