System Down! A look at disaster preparedness and recovery.
Disaster can strike your business and website at any time, and from unexpected directions. Whether it’s a broken hard drive, a failed power backup, or the phone company cutting your line with a backhoe, at some point your website will go down.
If your website is important to your business (and we hope it is), it’s critical to have a disaster recovery plan in place. When things like this happen, people are distracted and panicked. They think about what they need to do to get the site back up, and often miss both the little things and the big picture, which in some ways are even more important. A solid plan makes recovery easier and turns a disaster into a manageable problem.
There are three parts to a complete disaster recovery plan:
- A backup plan. This describes what happens every day to mitigate the impact of a disaster on your business.
- A business continuity and public communication plan. This describes what your company does during a disaster, and what you say to visitors to your site.
- A restoration plan. This describes what your company does to get back on its feet.
The Backup Plan
You should periodically save your site and its data off the main server. Where and how often you do this varies widely and depends on your site’s needs: you could store the copies on a separate local machine or remotely (or in several remote repositories). Whatever you choose, there should be something in place. Document it and follow it!
Sometimes, you can rely on the development team to provide a working copy of your website. When we build websites, we have three copies of the site: (i) the production site, (ii) the staging site and (iii) the development site (in Subversion).
But this isn’t enough. The production version should also be backed up nightly to a remote server. In case of a severe disaster, this will ensure that an uncorrupted copy of the production site exists and can be reloaded if necessary.
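A nightly backup along these lines could be sketched in Python as follows. This is only an illustration under stated assumptions: the site lives in a single directory, and the function names, the `site-` file prefix and the seven-archive retention are hypothetical choices. A database would need its own dump step, and copying the archive to the remote server is left to whatever transfer tool you already use.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def create_backup(site_dir, backup_dir, now=None):
    """Write a timestamped .tar.gz of site_dir into backup_dir; return its path."""
    site_dir, backup_dir = Path(site_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # A sortable timestamp keeps archive names in chronological order.
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%S")
    archive = backup_dir / f"site-{stamp}.tar.gz"  # hypothetical naming scheme
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(site_dir, arcname=site_dir.name)
    return archive


def sha256_of(path):
    """Checksum to compare once the archive has been copied to the remote server."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prune_backups(backup_dir, keep=7):
    """Delete all but the newest `keep` archives; return the deleted paths."""
    archives = sorted(Path(backup_dir).glob("site-*.tar.gz"))
    stale = archives[:-keep] if keep > 0 else archives
    for path in stale:
        path.unlink()
    return stale
```

Run from a nightly scheduled job, `create_backup` produces the archive, `sha256_of` gives a value to verify after the transfer, and `prune_backups` keeps the backup directory from growing without bound.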
The Business Continuity and Public Communication Plan
When disaster strikes, what happens next? If your website is absolutely critical to your operation, you should have a live backup that can immediately be switched on (with a duplicate database, etc.). This is overkill for most companies, which are well served by a temporary “we’re down” page.
Like the live backup, the “we’re down” page should be served from a different server, ideally in a different location. If your server goes down and your “we’re down” page is on it, it’s also down. If you lose power at your facility, your “we’re down” page won’t work unless it’s somewhere far away.
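As a rough illustration, a standalone “we’re down” page can be served by a few lines of Python running on that separate machine. This is a sketch, not a production setup: the page text, the `DownPageHandler` and `make_server` names, the port and the one-hour `Retry-After` value are all assumptions. It answers every request with HTTP 503 (Service Unavailable), the standard status for a temporary outage.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder wording; replace with what you actually know about the outage.
DOWN_PAGE = """<!doctype html>
<html>
  <head><meta charset="utf-8"><title>We're down</title></head>
  <body>
    <h1>Our site is temporarily unavailable</h1>
    <p>We are working on a fix and will be back as soon as we can.</p>
  </body>
</html>
""".encode("utf-8")


class DownPageHandler(BaseHTTPRequestHandler):
    """Answer every GET with the same 503 page, whatever the path."""

    retry_after_seconds = 3600  # assumed estimate of when to try again

    def do_GET(self):
        self.send_response(503)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(DOWN_PAGE)))
        self.send_header("Retry-After", str(self.retry_after_seconds))
        self.end_headers()
        self.wfile.write(DOWN_PAGE)

    def log_message(self, format, *args):
        """Silence the default per-request logging."""


def make_server(host="127.0.0.1", port=8080):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer((host, port), DownPageHandler)
```

Calling `make_server("0.0.0.0", 8080).serve_forever()` on the standby machine would start it. In practice most teams would use a static page behind an existing web server or hosting service instead; the point is only that the page must not depend on the server that failed.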
During an outage, it’s important to tell the public (your customers and visitors) what’s going on. As long as you give as much information as you have, most people are willing to wait and root for you. The more information you give them, the more you bring them onto your side.
We recommend that your “we’re down” page clearly state what has happened (within reason), tell your visitors that you’re attempting to fix it, and let them know when you expect the site to be back up. You can even invite them to enter their e-mail address to be notified (assuring them, of course, that you won’t use it for any other purpose).
Once the site is live again, apologize clearly to your users and describe what happened and how it was fixed. The easiest way to do this is through a letter from the CEO or another executive. The important thing is to be frank, clear and apologetic.
The Restoration Plan
During all this, your engineers should be working on restoring your website. If a server crashed, put a new one in place. If data was lost, restore it from the backup. Whatever it takes, do it methodically and slowly. During a disaster, most people react hastily, often making the problem worse.
As soon as disaster strikes, implement your Business Continuity plan. Switch over to your backup “we’re down” page or live backup and breathe a little. The site may be down, but because of your other plans, there’s no panicking. Instead, you’ll carefully do what’s needed to restore the site and get back on your feet.
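One way to decide when to switch over is a simple health check that waits for several failed checks in a row, so a single slow response does not trigger the plan. The sketch below is an illustration only: `site_is_up`, `FailoverMonitor` and the threshold of three are hypothetical, and the actual switch (a DNS change, a load balancer setting, or similar) depends entirely on your hosting setup.

```python
import http.client
import urllib.error
import urllib.request


def site_is_up(url, timeout=5.0):
    """Return True if the URL answers with a 2xx or 3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False


class FailoverMonitor:
    """Count consecutive failed checks and report when to switch over."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold  # assumed value
        self.consecutive_failures = 0

    def record(self, is_up):
        """Record one check result; return True once the threshold is reached."""
        if is_up:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold
```

A scheduled job on a machine outside your facility could call `site_is_up` every minute, pass the result to `record`, and alert the team or trigger the switch when `record` returns True.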
You may not ever need to use your disaster plan, but if you do, you and your customers will be glad you had one in place.
Hey! This wasn't written by a murder of crows! It was written by Josh Orum, who does awesome work at Loud Dog, a digital branding firm in San Francisco that helps businesses express themselves authentically via identities, websites, and marketing collateral.