One of the most important things that a business can learn is to be prepared for anything. A business should be prepared for any emergency that may arise, whether it’s a product missing its deadline, an employee going rogue on the corporate social media accounts, or even Twitter hitting the fail whale. Backups need to be made, accounts need to have safety measures, and emergency plans need to be ready for action. Anything truly unexpected that arises should be treated as a learning experience and folded into future disaster recovery plans.

Today, we at The Herald received our first truly rude awakening in this area. Our host, HostGator, went down during a round of routine maintenance at its Provo, Utah data center, which also hosted servers for BlueHost and HostMonster. Unfortunately, our site was hosted at the Provo center. So, in the blink of an eye (quite literally: we were checking our analytics when we saw the live visitor count drop from 25 to zero), Anime Herald vanished into the aether. Thousands of other sites were unreachable as well. Customer service lines were jammed, and countless customers were understandably incensed: the product they had paid for wasn’t being delivered. It took us nearly an hour to get answers about what was going on. By that point, many readers had been turned away from The Herald by server time-outs.

Without a way to address readers through the normal channels, we took to social media. We spread the word through our accounts on Facebook and Twitter, posting an update each time the site came back up and each time it crashed again. We did what we could to minimize the amount of time everybody was left in the dark. By the time everything was finally up, it was nearly ten hours after the first loss of service.

So the outage became our own case study in emergency preparedness. We learned a number of things from the ordeal, including the following:

  • Server downtime is something that can happen at any time. Even if a host promises 99.9% uptime, there will always be the 0.1% that can cause your operations to grind to a halt.
  • While you can’t prevent downtime entirely, find ways to keep the engines running, even when the server is acting up.
  • Always have outlets to tell your readers what’s happening, and keep the updates coming.
  • Leaving people in the dark will destroy relationships with your clients, along with the customer trust that you’ve built up over weeks, months, or even years. Conversely, a bit of transparency and an “I’m sorry” can lead to goodwill from the market.
  • Customers who feel cheated by events have the potential to become passionate detractors. They are the most difficult to win back, as they feel that they have been abandoned by the very people they placed their trust in.
  • Never assume you’re invincible. Or, in our case, never assume that the measures you’re taking are enough. Instead, assume the opposite: something will happen, so prepare for it. It’s better to have a solution and not need it than to need it and not have it.
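
To put the 99.9% figure from the first point in perspective, here is a rough back-of-the-envelope calculation. The function name and the 365-day year are our own assumptions for illustration, not anything taken from a host’s actual terms of service.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime an uptime promise still permits over the period."""
    return period_hours * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    budget = allowed_downtime_hours(99.9)
    print(f"99.9% uptime still allows about {budget:.2f} hours of downtime per year")
```

Under those assumptions the allowance works out to a little under nine hours a year, so a single outage approaching ten hours uses up the whole year’s budget in one go.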

Since the incident, we’ve taken a number of steps that will hopefully keep such a disaster from happening again. We’ve signed up with a CDN that offers mirroring, so that we can keep serving a cached version of the site even when our server is down. We’ve also installed several enhancements that had been on the back burner, and rolled out fixes for known issues that we had been testing.
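
For the curious, the idea behind serving a cached copy can be sketched in a few lines. This is only an illustration of the general “fall back to the last good copy” pattern, not how our CDN is actually implemented; `serve_page`, `fetch_origin`, and the in-memory `cache` dictionary are all hypothetical names.

```python
from typing import Callable, Dict


def serve_page(path: str,
               fetch_origin: Callable[[str], str],
               cache: Dict[str, str]) -> str:
    """Return the page from the origin, falling back to the last cached copy."""
    try:
        body = fetch_origin(path)
    except OSError:
        # Origin is unreachable (connection errors and timeouts are OSError
        # subclasses): serve the stale copy if one exists.
        if path in cache:
            return cache[path]
        raise  # nothing cached for this page, so the failure still surfaces
    cache[path] = body  # refresh the mirror on every successful fetch
    return body
```

The catch is that a page only survives an outage if it was fetched successfully at least once beforehand, which is why the mirror has to be in place before anything goes wrong.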

I want to offer my personal apology for this downtime. While we certainly hadn’t planned on it, we did fail you by not being prepared for the worst. I hope that, with the improvements we’ve made, the impact of any potential server outages going forward will be minimized. However, it was my fault that these weren’t in place earlier. I apologize, and I hope that I can retain both your readership and your trust going forward.