As a nation, we have certainly faced our fair share of disasters lately: flooding in Queensland and Victoria, cyclones in Queensland and massive bushfires in Western Australia, all just months after devastating earthquakes in Christchurch.
Our hearts certainly go out to all of the people affected by these disasters, but I personally feel the pain of all the IT professionals who are, or will be, working tirelessly to bring IT systems back online in order to maintain some form of business continuity in the affected areas.
For me, these events bring back vivid memories of the ML 5.6 earthquake that hit Newcastle in 1989. At the time I was the IT Manager at Norman Ross Discounts. Shortly after we felt the tremor in our Bankstown head office, the news broke that the epicentre was located below Newcastle.
Being responsible for the IT function across all of our stores, you would think my first thought would be for the state of the infrastructure, but it wasn’t: my first thought was for the safety of our store people in Newcastle.
At the time I thought I had a well-established disaster recovery plan (DRP, as they were called back then). The DRP catered for a disaster at the head office data centre. The plan, however, did not consider a catastrophic event at a store. As the IT team tried to check the status of the Newcastle network, I, along with every other senior manager at head office, desperately tried to contact our store employees.
What we encountered was basically a communication blackout. The comms networks were either congested or not operational. Somehow, the IT team were able to establish a connection with the Newcastle back office server. The store had sustained considerable damage; luckily, our staff were successfully evacuated without injury.
We also found the NCR back office server was humming (well, fizzling) away while water trickled through its cabinet. Kudos to NCR; they built their servers tough back then. It wasn’t long before the server succumbed and failed. Luckily, our DRP had catered for a regular replication of data back to head office, so when we opened another store a few weeks later, we were able to pick up where we left off.
The situation could have been a lot worse, but it taught me some valuable lessons, which I have incorporated into every disaster recovery document I’ve developed since.
Here are some tips:
1. Whenever there is an event that jeopardises the safety of people, their safety should always be your first consideration. I doubt there are companies today that don’t have an OH&S policy. Ensure your IT team are fully aware of their responsibilities in the event of an evacuation.
2. Gone are the days of the disaster recovery plan where the head of IT only needed to worry about the recovery of IT services. Businesses today are typically complex and need a comprehensive business continuity plan (BCP). As the name suggests, a BCP is a set of instructions and procedures to maintain some form of business continuity in the event of a disaster. The key word here is ‘business’ and it’s imperative that the ‘business’ takes ownership of the processes. In my experience, however, this still seems to fall on the shoulders of the CIO, so be prepared and step up to the plate. Seize the opportunity; it’s another avenue to demonstrate the value of IT to the business.
3. When preparing a BCP, establish whether it is possible to execute it with just a handful of people, or whether it’s more appropriate to have designated teams specialising in different parts of the recovery. The approach you take depends on the size and complexity of your organisation. I have always found a crisis management team (CMT) a good idea. The CMT should be charged with co-ordinating the processes defined within the BCP and should be a single point of contact for the whole organisation. Publish, and regularly update, the names of the CMT members to everybody within the organisation.
4. Do a risk evaluation on every aspect of your business and decide which services are critical and which can take a little longer to recover. Understanding the timelines will assist when designing your BCP. Ensure you have agreement and buy-in from the executive management team and CEO.
5. Once you have established your BCP, test it on a regular basis. Not just the IT portion — involve the whole business. Your skill at ‘encouraging’ people to do this will certainly be tested. Ensure any external service provider is part of the testing.
6. Don’t rely on public infrastructure, as a widespread event could render it inoperable. Have a backup plan to communicate with your employees, remote offices and any service providers.
7. Regularly communicate your BCP to everyone in the organisation (and external parties where appropriate). Let them know what is expected of them if a catastrophic event occurs.
8. “You don’t know what you don’t know”, so be prepared for the unexpected. The CMT and/or recovery teams need to be agile and prepared to make decisions on the spot. Empower the CMT and ensure you have selected people who can make tough decisions under pressure.
9. Finally, let’s hope you never actually have to activate your BCP. In more than 25 years, I have only needed to activate it twice and each time Murphy was sitting in the background to prove that we’d missed something.
Allan Davies is the CIO of logistics firm Dematic.