Best Practices For IT Availability

Technology decisions play a vital role in supporting your overall strategy

Forrester often gets inquiries such as, "What requirements should we keep in mind while developing our disaster recovery plans and documents?" and, "Which strategies work best for managing our disaster recovery program once it's in place?"

Technology supports disaster recovery preparedness, but it doesn't constitute a strategy or plan. You need to have a framework in place to manage disaster recovery preparedness as a continuous process, not a one-time event.

Processes have to be in place to ensure that disaster recovery plans are updated continuously as part of change and configuration management, and that they are tested regularly. In addition, it's important to periodically update the business impact analysis (BIA) and risk assessments (RAs), which provide the key inputs into the development of your disaster recovery strategy and specific plans.

By taking a proactive approach to disaster recovery, rather than being unprepared when a disaster occurs, you will save your company substantial money in the long run. Organizations that take this more proactive, holistic approach often use the term "IT service continuity" rather than "disaster recovery."

However, as companies become increasingly dependent on IT for day-to-day business operations, business owners demand greater levels of IT availability, sometimes 99.95% or better. This has forced IT operations teams to revisit their strategies for both local high availability and IT service continuity. So technology decisions play a vital role in supporting your overall strategy.
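To make a target such as 99.95% concrete, it helps to translate it into a downtime budget. The Python sketch below does that arithmetic; the 365-day year and 30-day month are simplifying assumptions, and the function names are illustrative only.

```python
# Convert an availability target into the maximum downtime it permits.
# Assumption: a 365-day year and a 30-day month; change these to match your SLA period.
HOURS_PER_YEAR = 365 * 24          # 8,760 hours
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes


def _unavailable_fraction(availability_pct: float) -> float:
    """Fraction of the period the service may be down, e.g. 99.95 -> 0.0005."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return 1 - availability_pct / 100


def allowed_downtime_hours_per_year(availability_pct: float) -> float:
    return HOURS_PER_YEAR * _unavailable_fraction(availability_pct)


def allowed_downtime_minutes_per_month(availability_pct: float) -> float:
    return MINUTES_PER_MONTH * _unavailable_fraction(availability_pct)


for target in (99.0, 99.9, 99.95, 99.99):
    print(f"{target}%: {allowed_downtime_hours_per_year(target):.2f} hours/year, "
          f"{allowed_downtime_minutes_per_month(target):.1f} minutes/month")
```

At 99.95%, the budget works out to about 4.38 hours per year, or roughly 21.6 minutes per 30-day month, and that budget has to absorb planned maintenance as well as unplanned outages if the objective counts both.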

Forrester sees Infrastructure & Operations (I&O) professionals evaluating technologies and services such as:

1. Local and long-distance clustering for zero downtime.
2. Server virtualization high availability and fault-tolerant technology for near-zero downtime at the primary site as well as rapid restart of virtual machines at the recovery site.
3. Local snapshots and remote replication technology for near-zero data loss.

The "how to" of IT availability and service continuity is not the only challenge. If money were no object, I&O professionals could implement solutions that would enable zero downtime and zero data loss for all their IT systems.

But the pressure to maintain or reduce IT costs means that they must justify the investment in availability technologies by categorizing IT systems according to their criticality, and then implement the most cost-effective solutions that achieve agreed-upon recovery objectives or service-level agreements (SLAs). Determining the criticality of IT systems and writing meaningful, achievable objectives or SLAs with business owners are often far more challenging than implementing the technology itself.

In recent research for its Infrastructure & Operations Council, Forrester uncovered four best practices:

1) Classify systems for criticality. Whether you are developing a strategy for operational high availability or IT service continuity, determining criticality requires that you perform a BIA. For each business process, you must map dependent IT systems, calculate the cost of downtime, and determine availability rates and recovery objectives. You must also determine the probability of various types of risk, from IT failures to human error.
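The BIA steps above (map processes to dependent systems, estimate the cost of downtime, weigh the likelihood of a disruption) can be sketched in a few lines of Python. The process names, dollar figures, and criticality thresholds here are hypothetical placeholders, not Forrester guidance, and the expected-loss function uses the common annualized-loss simplification (likelihood times duration times hourly cost) as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessProcess:
    name: str
    downtime_cost_per_hour: float                   # estimated during the BIA
    dependent_systems: list = field(default_factory=list)


def system_downtime_cost(processes):
    """Hourly cost of an outage of each IT system.

    Assumes an outage of a shared system halts every process that depends on it,
    so the costs of those processes are summed.
    """
    costs = {}
    for process in processes:
        for system in process.dependent_systems:
            costs[system] = costs.get(system, 0.0) + process.downtime_cost_per_hour
    return costs


def expected_annual_loss(cost_per_hour, incidents_per_year, hours_per_incident):
    """Annualized loss estimate: likelihood x duration x hourly impact."""
    return cost_per_hour * incidents_per_year * hours_per_incident


# Hypothetical thresholds (cost per hour of downtime), checked from highest to lowest.
THRESHOLDS = ((100_000, "critical"), (10_000, "important"))


def classify(cost_per_hour, thresholds=THRESHOLDS):
    """Return the first label whose threshold the hourly cost meets, else 'standard'."""
    for limit, label in thresholds:
        if cost_per_hour >= limit:
            return label
    return "standard"


processes = [
    BusinessProcess("customer care", 50_000, ["crm", "telephony"]),
    BusinessProcess("order entry", 80_000, ["erp", "crm"]),
    BusinessProcess("payroll", 5_000, ["hr"]),
]
for system, cost in sorted(system_downtime_cost(processes).items()):
    print(f"{system}: ${cost:,.0f}/hour -> {classify(cost)}")
```

In this made-up example the CRM system ends up critical even though neither process it supports crosses the threshold on its own; that is exactly the kind of shared dependency the mapping step is meant to surface.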

Selling management on business metrics such as, "The business demands that we provide less than 4-hour recovery of our customer care system with less than a minute loss in transactional data," is much more compelling to an executive than, "We need $3.2 million for hardware and $300,000 per year in telecommunications expenses for a data replication solution." This is why conducting the BIA is so important and why IT can't just start with technology.

2) Develop tiers of service for both availability and IT service continuity. To reach the next level of maturity, IT professionals must shift their thinking from disaster recovery to IT service continuity. IT service continuity is less a reactive response to catastrophic events and more a focus on the nearly continuous availability of IT services. Once your range of recovery objectives is determined, it often helps to develop an IT availability and service continuity catalog. The catalog is a range of service tiers. Each service tier has an associated availability rate, recovery objectives, technology prerequisites, and cost of delivery. This catalog helps you simplify your strategy, quickly assign new IT systems to a service tier, and communicate with the business.
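A service catalog is essentially a small lookup table. The Python sketch below models one and assigns a system to the cheapest tier that meets its requirements. The tier names, targets, technologies, and costs are hypothetical, loosely echoing the technologies and the 4-hour/1-minute example mentioned earlier in this article, and the sketch treats each requirement as an inclusive upper or lower bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceTier:
    name: str
    availability_pct: float   # committed availability rate
    rto_hours: float          # recovery time objective
    rpo_minutes: float        # recovery point objective (maximum data loss)
    technology: str           # prerequisites needed to deliver the tier
    annual_cost: float        # cost to deliver, per system per year (hypothetical)


# Hypothetical catalog; real values come out of the BIA and negotiation with the business.
CATALOG = (
    ServiceTier("Tier 1", 99.99, 1, 0, "clustering + synchronous replication", 500_000),
    ServiceTier("Tier 2", 99.9, 4, 1, "virtualization HA + asynchronous replication", 200_000),
    ServiceTier("Tier 3", 99.5, 24, 24 * 60, "nightly backups restored at recovery site", 20_000),
)


def pick_tier(catalog, max_rto_hours, max_rpo_minutes,
              min_availability_pct) -> Optional[ServiceTier]:
    """Cheapest tier meeting every requirement, or None if no tier qualifies."""
    candidates = [
        tier for tier in catalog
        if tier.rto_hours <= max_rto_hours
        and tier.rpo_minutes <= max_rpo_minutes
        and tier.availability_pct >= min_availability_pct
    ]
    return min(candidates, key=lambda tier: tier.annual_cost, default=None)


# Customer care: recover within 4 hours, lose at most 1 minute of transactions.
tier = pick_tier(CATALOG, max_rto_hours=4, max_rpo_minutes=1, min_availability_pct=99.9)
print(tier.name if tier else "no tier fits; extend the catalog")
```

Returning None rather than silently falling back to the top tier makes a gap in the catalog visible, which is itself a useful conversation to have with the business owner.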

3) Measure availability from the end-user perspective. Well-written objectives must measure both unplanned and planned downtime. They must take into account the timing of the downtime (e.g., end of month, quarterly close, and peak sales periods), and they must measure downtime from the perspective of the user. This means that you must measure the availability of the end-to-end IT service, not just the individual infrastructure components such as clients, servers, storage, and networks.
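Two small calculations illustrate why component-level metrics can mislead. First, if a service needs every component in its path and failures are independent (a simplifying assumption), its availability is the product of the component availabilities. Second, measuring from the user's side means pooling every user-visible outage, planned or unplanned, from any component, and merging overlaps so concurrent failures are not double-counted. The Python sketch below shows both; the dates and durations are made up for illustration, and it does not weight outages by business timing such as month-end.

```python
from datetime import datetime, timedelta
from math import prod


def series_availability(component_availabilities):
    """Availability of a chain of components that must all be up (independent failures assumed)."""
    return prod(component_availabilities)


def merge_outages(outages):
    """Merge overlapping (start, end) intervals so concurrent outages count once."""
    merged = []
    for start, end in sorted(outages):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def end_to_end_availability(window_start, window_end, outages):
    """Percentage of the window during which users could use the service.

    `outages` holds (start, end) datetimes for every user-visible outage,
    planned or unplanned, from any component; each is clipped to the window.
    """
    clipped = [
        (max(start, window_start), min(end, window_end))
        for start, end in outages
        if end > window_start and start < window_end
    ]
    downtime = sum((end - start for start, end in merge_outages(clipped)), timedelta())
    return 100 * (1 - downtime / (window_end - window_start))


# Four components at 99.9% each yield only about 99.6% end to end.
print(f"{series_availability([0.999] * 4):.4%}")

june = (datetime(2024, 6, 1), datetime(2024, 7, 1))
outages = [
    (datetime(2024, 6, 10, 2, 0), datetime(2024, 6, 10, 3, 0)),   # network, unplanned
    (datetime(2024, 6, 10, 2, 30), datetime(2024, 6, 10, 4, 0)),  # storage, overlaps network
    (datetime(2024, 6, 20, 0, 0), datetime(2024, 6, 20, 1, 0)),   # planned maintenance
]
print(f"{end_to_end_availability(*june, outages):.3f}%")
```

Here the overlapping network and storage failures count as two hours of downtime rather than two and a half, and the planned maintenance window still counts against the service because users could not reach it during that hour.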

4) Include availability and continuity considerations in application development and testing. Too often, availability and continuity are considered after an application has already been deployed. At this point, the choice of server, storage, and network infrastructure and the application processing and logic will limit certain availability and continuity options. Resiliency has to be a part of application development, infrastructure selection, and acceptance testing.

The cardinal mistake when developing IT service continuity strategies and justifying investments is to lead with technology. It might seem burdensome and complicated to conduct a business impact analysis and risk assessment with a cross-functional team of business owners, risk management professionals, facilities, and IT, but it's critical. With the results, you can identify business requirements, risks, and impacts, create quantitative justifications for investment, and get the entire business on board.

Stephanie Balaouras is a Principal Analyst at Forrester Research, where she works closely with its Infrastructure & Operations Council, which is part of the Forrester Leadership Boards. For more information and to download related research, please visit recovery.
