No Single Point of Failure

Thanks to The Mad Peacock’s Is It Safe? post for my inspiration today.

Small companies tend not to focus much on redundancy and reliability, in either personnel or infrastructure. How many people have worked for a company where the corporate e-mail system has gone down for an entire day, or more? I have. Has anyone had a failure in their source code control library that lost a day or more of work across an entire engineering organization? Yup, been there too. Have you ever had someone quit on the spur of the moment, leaving someone else to figure out what they were working on and what it did? Oh yeah.

Companies are running lean these days, and that means every penny is scrutinized. Unfortunately, decisions are often made that can lead to catastrophic consequences. Consider this scenario: for budgetary reasons you determine that an automated $40,000 backup and restore system is too expensive, so you have IT run scripts to back up critical data on a weekly basis. Near the end of a week's worth of work, a critical server goes down. That means every bit of work your 20-person engineering team introduced has to be mentally recalled, reimplemented, and retested. If your average loaded labor rate is $100k per year (very low), your average weekly cost for those engineers is ~$38,000. A single outage would cost nearly as much as the entire system, a system that could protect not only engineering data but other corporate data as well. You may argue that a server failure is unlikely, but do you really want to play Russian roulette with your information, especially if it is the intellectual property that makes your company valuable?
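To make the arithmetic in that scenario explicit, here is a rough sketch in Python. The figures come straight from the scenario; the function names are made up for illustration, and the calculation assumes an outage costs exactly one week of the team's loaded labor and nothing more (no schedule slip, no lost customers).

```python
# Figures from the scenario above; swap in your own numbers.
ENGINEERS = 20
LOADED_ANNUAL_RATE = 100_000   # dollars per engineer per year
BACKUP_SYSTEM_COST = 40_000    # dollars, one-time
WEEKS_PER_YEAR = 52


def weekly_team_cost(engineers, annual_rate, weeks_per_year=WEEKS_PER_YEAR):
    """Loaded labor cost of the whole team for one week."""
    return engineers * annual_rate / weeks_per_year


def outages_to_break_even(system_cost, cost_per_outage):
    """How many lost weeks it takes to equal the price of the system."""
    return system_cost / cost_per_outage


if __name__ == "__main__":
    weekly = weekly_team_cost(ENGINEERS, LOADED_ANNUAL_RATE)
    ratio = outages_to_break_even(BACKUP_SYSTEM_COST, weekly)
    print(f"One lost week costs about ${weekly:,.0f}")
    print(f"Outages needed to match the system price: {ratio:.2f}")
```

With these numbers a lost week comes to roughly $38,500, so a little over one outage matches the price of the backup system. A higher (and more realistic) labor rate tips the balance even further.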

There are certain areas that should never be single points of failure.

  • Product Development Repositories
  • Gold Masters for Released Products
  • Financial Records
  • Customer Contracts
  • Customer Support Issues
  • Expertise in Crucial Systems Development

You may be thinking that I am focused on redundant data. That is key, but I believe the last item in the list is the most important. It is expensive to have two people know exactly the same things, and most of the time it just isn’t feasible. However, if you carefully spread knowledge through design and code reviews (and the associated documentation), whoever picks up the work will be able to piece together developer intent and methodologies much faster than someone who comes in completely cold.

Remember to consider the opportunity cost of NOT protecting your resources, not just the outlay for the processes and tools to do it.
