Resilience Through Redundancy

Profit-oriented companies can hardly keep massive amounts of capacity idle and just waiting to be used, but some forms of redundancy are used by all businesses.

Shadow Flights

Every veteran business traveller knows the frustration of a delayed or cancelled flight. But some airplane operators face especially large consequences for disrupted flights. If the corporate motto is "When it absolutely, positively has to get there overnight", then a cancelled flight is not an option. Even a delay of a few hours may mean that someone's package will miss a crucial deadline, disrupting a customer's business and possibly leading to customer losses well beyond the price of shipping an overnight package.

To avoid such problems, FedEx invests in an unusual type of redundant capacity. Every night, two completely empty planes take off - one from an East Coast airport and one from a West Coast airport - and fly to Memphis. In the wee hours of the morning, these planes make a similarly lonely return journey. In the event of a problem anywhere in the United States, these planes can swoop down, land, and pick up packages from a grounded aircraft. About forty more planes are flown deliberately half-empty for the same purpose. Finally, 14 planes (10 in the United States and 4 overseas) serve as spares on the ground that could be pulled into emergency service. Although these planes could not mitigate a major disaster, this fractional redundant capacity does alleviate many potential disruptions.

Inspecting Your Own Faults, Quickly

Located on the quake-prone West Coast, Intel's Oregon chip-making plant is vulnerable to an earthquake. The cost of downtime for this one plant is measured in hundreds of thousands of dollars per hour, putting a premium on fast recovery. The structure itself is built to the highest standards and can withstand most earthquakes, but the plant cannot open for business following an earthquake before inspectors check for hidden damage and dangerous structural faults. Worse, in the aftermath of an earthquake these inspectors would be terribly overworked. With thousands of buildings to inspect, and priority given to facilities like hospitals and schools, Intel knows it would face a three-day wait before being allowed back in its own buildings.

To save itself millions of dollars from the added downtime, Intel trained its own dedicated building inspectors for post-earthquake duty. Rather than rely on and wait for government-provided inspectors, the company can immediately inspect and certify its own buildings to reduce the delay. As a side benefit to the community, Intel plans to let these inspectors help the government examine and certify other buildings after Intel's are inspected. For Intel, the cost trade-off of the redundant training versus the severe costs of downtime creates an obvious business case for employing the inspectors.
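The arithmetic behind that business case can be sketched roughly as follows. The three-day wait and the "hundreds of thousands of dollars per hour" figure come from the text above; the specific hourly rates, the assumption that the plant would otherwise run around the clock, and the function name are illustrative assumptions only, not Intel's actual figures.

```python
def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Return the total cost of a plant being idle for hours_down hours."""
    return hours_down * cost_per_hour


# Three-day wait for a government inspector (from the article),
# assuming the plant would otherwise operate 24 hours a day.
WAIT_HOURS = 3 * 24

# Hypothetical sample rates within "hundreds of thousands of dollars per hour".
SAMPLE_RATES = (100_000, 300_000, 500_000)

for rate in SAMPLE_RATES:
    cost = downtime_cost(WAIT_HOURS, rate)
    print(f"${rate:,} per hour over {WAIT_HOURS} hours: ${cost:,.0f}")
```

Even at the lowest assumed rate, the avoidable wait costs several million dollars, which is why the cost of training in-house inspectors looks small by comparison.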

Redundant IT Systems

Before September 11, 2001, the area in and around the World Trade Centre served as an information nexus for many financial services firms. Companies like Merrill Lynch, Smith Barney, Morgan Stanley, and Deutsche Bank all had major trading installations supported by a massive information technology infrastructure in and around the WTC. After the terrorist attacks on the towers, and the towers' subsequent collapse, some 20 million square feet [1.9 million square metres] of offices were destroyed or rendered unusable and the entire local information technology infrastructure lay in ruins.

When the south tower of the World Trade Centre collapsed on Deutsche Bank's New York facility, the German banking giant lost a major connection to the US markets. Despite the loss, COO Hermann-Josef Lamberti said: "We were able, on the very same day, to clear more than $US300 billion with the Fed." Redundant IT systems in Ireland took over when the New York systems were destroyed.

Other firms, such as Merrill Lynch, also quickly shifted operations to backup centres and redundant trading floors near New York City. According to Paul Honey, Merrill Lynch's director of global contingency planning, "within just a few minutes of the evacuation, Merrill Lynch was able to switch its critical management functions to their command centre in New Jersey". Moreover, everyone in the company knew of the redundant facility and was trained to call in or transfer their work to that location.

In businesses that transact billions of dollars a day electronically, building full redundancy is not a difficult decision. But the same is true for most modern corporations: The loss of their information systems means loss of the business. As compared with other redundancies, keeping redundant databases with shadow transactions and redundant application systems is relatively inexpensive given the potential damage from loss of data or the information technology infrastructure.

Redundancy As a Resilience Strategy

Redundancy of any kind helps companies continue serving their customers while rebuilding after a disruption. Indeed, most companies are accustomed to protecting themselves against small fluctuations, mostly in the demand for their products, by keeping spare inventory.

But over the last two decades, many companies have worked diligently to cut costs by reducing exactly this type of inventory, resulting in tightly connected supply chains and higher quality of products and services. Thus, when creating a special safety stock for protection against high-impact/low-probability events, companies should take care not to reverse the gains of such "lean" supply chain operations. In fact, some companies may decide to keep a lean supply chain with little inventory and a single supplier, even for critical parts. Their rationale is that, on balance, the full cost of coordinating several suppliers and keeping safety stock may be judged to be too high. That was Toyota's consideration following the Aisin fire described in chapter 13.

Yet safety stocks are a part of most resilience and business continuity plans. Even a relatively small amount of inventory can provide a disrupted company with time to prepare its response. A high level of redundancy, however, may be too expensive. Only when the stakes are especially high and the costs of extra capacity are relatively low, as in the case of information technology, should companies keep complete redundant capacity.

Service companies, in particular, typically keep extra capacity because the costs of service failures are high - service is what these companies sell. Furthermore, service companies cannot keep an inventory of their product (eg, if a package was not delivered on time, that service cannot be recovered and offered to the customer later). Consequently, a disruption will lead to an immediate service failure unless there is extra capacity or some other redundancy or flexibility in the system ready to kick in when the service is about to fail.


Stories by Yossi Sheffi
