CIO

Resilience Through Redundancy

What happens to a company when the unimaginable occurs? When an earthquake hits its primary contract manufacturer? When labour strikes shut down an entire port? When terrorists cripple a transportation system?

Yossi Sheffi, Professor of Engineering at MIT and Director of the MIT Centre for Transportation and Logistics, argues that a company's survival and prosperity depend more on what it does before such a disruption occurs than on the actions it takes as the event unfolds. In The Resilient Enterprise: Overcoming Vulnerability for Competitive Advantage, Sheffi explores high-impact/low-probability disruptions, focusing not only on security but on corporate resilience - the ability to bounce back from such disruptions - and how resilience investments can be turned into competitive advantage. This is an excerpt from Sheffi's book.

Between September 18 and October 9, 2001, a series of letters were deposited at a mailbox in New Jersey. Poison-pen letters in a very literal sense, the envelopes contained a fine powder of deadly anthrax spores along with a short handwritten anti-American missive. Addressed to a variety of US government offices and American media companies, the lethal letters created a scare, killed five people, and infected 19 others.

As these letters made their way to their fateful destinations, they left behind a deadly residue. Sent to addresses in New York, Washington, DC, and Florida, the letters entered the United States Postal Service's (USPS) massive network. The fine anthrax dust leaked from the envelopes to contaminate the Brentwood Processing and Distribution Centre in Washington, DC, the Trenton Processing and Distribution Centre in New Jersey, and a host of minor mail handling facilities in New Jersey, New York, Washington, and Florida. The Brentwood facility is an imposing 633,000 square-foot [58,808 square-metre] brick building. Inside, 2500 workers work 24 hours a day, seven days a week to handle much of the torrential flow of letters coming to and from the nation's capital. Some three and a half million items pass through Brentwood every day.

On October 21, 2001, two workers at the Brentwood facility were hospitalized with suspected (later confirmed) cases of anthrax. The USPS immediately shut down the facility for a thorough inspection. To their horror, inspectors found anthrax spores on the mail-sorting equipment. The two sickened postal workers died the following day. It took two years to decontaminate and refit the cavernous facility.

In the meantime, the government's mail had to get through; the USPS had to find an alternative to Brentwood's lost capacity. "Neither sleet nor rain nor anthrax will keep these carriers from their appointed rounds," promised Sue Brennan, a spokeswoman for the US Postal Service. The USPS quickly rerouted Brentwood-bound flows to two other distribution centres in Capitol Heights and Gaithersburg in Maryland. By most accounts, mail delivery the day after the closure was normal.

The USPS survived the closure of the 633,000 square-foot [58,808 square-metre] Brentwood facility, the 300,000 square-foot [27,871 square-metre] Trenton facility, and other smaller facilities because of the massive overcapacity built into its system. Such redundant capacity was not the result of planning for disaster. Instead, it was the consequence of the reduction in the volume of mail resulting from the increasing use of the Internet to pay bills, write letters, and send greeting cards. Since USPS workers are subject to civil-service employment laws, the USPS cannot adjust its operations quickly for the falling business volume, resulting in massive overcapacity.

Profit-oriented companies can hardly keep massive amounts of capacity idle and just waiting to be used, but some forms of redundancy are used by all businesses.

Inventory for Redundancy

The basic form of redundancy used by all businesses is safety stock. Although the extra inventory of parts and raw material on the one hand and finished product on the other can protect a company against small changes in the demand and supply patterns, it is expensive. Keeping extra supplies of parts and products not only ties up capital but also requires managing this inventory, including warehousing it, maintaining it, and preventing damage or pilferage. In addition, many products can become obsolete while they are stored in inventory, as new, better, and less expensive products are introduced into the market.

Extra inventory is also often the culprit in hidden manufacturing problems. With extra inventory, it is all too easy for production managers to tap the parts inventory in order to replace a defective part, or to fulfil a customer order from the inventory of finished goods, rather than to investigate the source of the problem. But with little or no extra inventory, each problem causes an unfilled customer order or a stoppage of the production line, requiring immediate management attention leading to a corrective action. As Toyota Motor Corporation has proved, reducing inventory (and using just-in-time discipline) leads to improved quality.

Thus the dilemma: Although inventory can be used to protect against disruptions, it is expensive; more important, it can lead to relaxed manufacturing, procurement, and logistics disciplines at the expense of quality products and delivery.

SOSO Inventory

As a major provider of medical supplies, Johnson & Johnson serves many hospitals and pharmacies. Because the demand for its products ebbs and flows with the flu, hay fever, and cold seasons, as well as outbreaks of various diseases, J&J keeps safety stock in several warehouses for use when demand for any of its products exceeds the forecast. One of J&J's customers is the Pentagon. Normally, the Pentagon buys medical supplies in predictable patterns that J&J can supply from its manufacturing plants and warehouses. In case of a war or a major disaster, however, the Pentagon knows that it will need huge amounts of medical supplies very quickly.

For that reason, J&J is under contract with the US government to stockpile certain quantities of medical supplies. (Indeed, the US government has increased funding for the strategic stockpiling of vaccinations and medications from $US41 million in 2001 to $US400 million in 2005.)

To meet this contractual obligation, J&J has two major challenges: how to keep the extra inventory fresh and up-to-date, and how to ensure that the extra inventory will not infect its processes with sloppiness, leading to expensive quality problems. J&J solves the problem with a "sell one stock one" (SOSO) inventory discipline.

Under the SOSO strategy, J&J does not let the Pentagon's inventory moulder in a dedicated warehouse; instead, the inventory is commingled with the rest of J&J stock. To keep its commitment, J&J defines a "red line" for each product; when the inventory for a particular product falls to the red line, J&J computers signal the ordering hospital or pharmacy that J&J is out of stock. Because going below the red line requires Pentagon approval, this inventory cannot be used to compensate for day-to-day variations. Consequently, J&J's everyday processes have to operate as if such inventory does not exist, thereby reducing the danger of sloppiness.

Using a SOSO strategy can mitigate some of the costs of keeping extra inventory. Commingling ensures that the stock is fresh, and requiring high-level approval (CEO, board, or the Pentagon in the case of J&J) for tapping into the SOSO inventory ensures that the inventory will not undermine the company's quality processes.
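The red-line rule described above can be sketched in a few lines of Python. This is only an illustration of the discipline as the text describes it, not J&J's actual system: the class name `SosoStock`, its methods, and all quantities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SosoStock:
    """Commingled stock with a reserved 'red line' quantity (illustrative only)."""

    on_hand: int   # total units in the warehouse, reserve included
    red_line: int  # units held back for the contract customer

    def available_for_regular_orders(self) -> int:
        # Everyday processes must behave as if the reserve did not exist.
        return max(self.on_hand - self.red_line, 0)

    def fill_regular_order(self, qty: int) -> bool:
        """Ship a normal order only if stock stays at or above the red line."""
        if qty > self.available_for_regular_orders():
            return False  # the customer is told the product is out of stock
        self.on_hand -= qty
        return True

    def release_reserve(self, qty: int, approved: bool) -> bool:
        """Dip below the red line only with explicit high-level approval."""
        if not approved or qty > self.on_hand:
            return False
        self.on_hand -= qty
        return True


stock = SosoStock(on_hand=120, red_line=100)
stock.fill_regular_order(15)                # succeeds: 105 units remain
stock.fill_regular_order(10)                # refused: would cross the red line
stock.release_reserve(50, approved=True)    # allowed only with approval
```

Because regular orders and reserve releases draw on the same `on_hand` pool, the stock rotates as the text describes, while the approval flag keeps the reserve out of reach of day-to-day operations.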

In many regards, the Pentagon is using J&J's SOSO stock in the same way that the Department of Energy uses the Strategic Petroleum Reserve. The reserve was not established to mitigate price fluctuations; rather, it serves as backup inventory of a critical material in case of a national crisis.

Naturally, when companies are aware of potential disruptions, they can accumulate inventory to cushion the effect. This can be in anticipation of either a one-time phenomenon or a continuing situation. For example, prior to the West Coast port lockout in October 2002, Wal-Mart stockpiled some three to five weeks' worth of inventory to prevent the disruption from affecting holiday sales, and NUMMI accumulated several days' worth of parts supply. And when companies enter into an agreement with a supplier whose deliveries are less than predictable (because of distance, location, or process peculiarities), those companies can change their policies to increase inventory by using a higher "reorder point" in their inventory management. Unilever, for example, increased its North American safety stocks of Q-Tips by 10 percent as part of contracting all the production to a Puerto Rico plant.
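To make the "reorder point" idea concrete, here is a minimal sketch using a common textbook formula: expected demand over the lead time plus a safety stock that grows with demand and lead-time variability. The formula is a standard approximation rather than anything taken from the book, and the function name, the service factor `z`, and all figures are assumptions chosen for illustration.

```python
import math


def reorder_point(daily_demand, demand_sd, lead_time_days, lead_time_sd, z=1.65):
    """Textbook reorder point: expected lead-time demand plus safety stock.

    z=1.65 corresponds to roughly a 95 percent service level if demand
    and lead time are assumed to be normally distributed.
    """
    safety_stock = z * math.sqrt(
        lead_time_days * demand_sd ** 2 + (daily_demand * lead_time_sd) ** 2
    )
    return daily_demand * lead_time_days + safety_stock


# Hypothetical suppliers with the same average lead time (10 days) but
# different delivery predictability (1 day vs. 4 days of standard deviation).
steady = reorder_point(100, 20, 10, lead_time_sd=1)
erratic = reorder_point(100, 20, 10, lead_time_sd=4)
print(f"steady supplier:  reorder at {steady:.0f} units")
print(f"erratic supplier: reorder at {erratic:.0f} units")
```

With average demand and lead time held equal, the less predictable supplier produces the higher reorder point, which is the policy change the paragraph describes.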

Redundant Capacity

Instead of using inventory for redundancy, some enterprises use redundant capacity for mission-critical business units. Boston Scientific manufactures an array of high-tech medical devices such as drug-coated stents that prop open the arteries of heart patients and help keep them blockage-free. For these specialized products, the company uses an array of sophisticated manufacturing systems to laser-cut nickel-titanium tubes into the delicate yet strong meshes that then receive the company's patented coatings. The nature of the product, and FDA regulations, specify meticulously clean and controlled production conditions. Each lot of stents must be traceable, requiring some 40 pages of paperwork to certify when, where, and how the devices in the lot were manufactured.

Were Boston Scientific to suffer from a disruption of its manufacturing facilities (eg, a fire, industrial accident, or contamination), the company knows that the time to fix and recertify a disrupted facility could leave the company without a major portion of its revenues and profits and allow its aggressive competitors to take Boston Scientific's market share.

After assessing its vulnerabilities, Boston Scientific built redundant production lines for some of its most important products. These alternative manufacturing facilities are kept FDA-certified and ready to go in the event of a disruption. The company also has personnel who maintain the skill levels needed to operate those redundant lines. Although such redundancy is not inexpensive, the company realized that failing to maintain redundancy risks the entire company and decided to protect itself against that risk. Other companies aim for less than 100 percent capacity utilization rates on their existing production lines, reasoning that the unused capacity acts as a cushion to absorb unanticipated large orders.

Yet other companies, such as Helix, a maker of high-performance vacuum pumps, rely on their suppliers to provide extra capacity. Helix used Demand Flow Technology, in part, to segment its manufacturing processes into short, easily taught steps that in an emergency can be transferred to others. Having analyzed the capacities and capabilities of suppliers, Helix knows that it could quickly teach certain suppliers to make its products.

Shadow Flights

Every veteran business traveller knows the frustration of a delayed or cancelled flight. But some airplane operators face especially large consequences for disrupted flights. If the corporate motto is "When it absolutely, positively has to get there overnight", then a cancelled flight is not an option. Even a delay of a few hours may mean that someone's package will miss a crucial deadline, disrupting a customer's business and possibly leading to customer losses well beyond the price of shipping an overnight package.

To avoid such problems, FedEx invests in an unusual type of redundant capacity. Every night, two completely empty planes take off - one from an East Coast airport and one from a West Coast airport - and fly to Memphis. In the wee hours of the morning, these planes make a similarly lonely return journey. In the event of a problem anywhere in the United States, these planes can swoop down, land, and pick up packages from a grounded aircraft. About forty more planes are flown deliberately half-empty for the same purpose. Finally, 14 planes (10 in the United States and 4 overseas) serve as spares on the ground that could be pulled into emergency service. Although these planes could not mitigate a major disaster, this fractional redundant capacity does alleviate many potential disruptions.

Inspecting Your Own Faults, Quickly

Located on the quake-prone West Coast, Intel's Oregon chip-making plant is vulnerable to an earthquake. The cost of downtime for this one plant is measured in hundreds of thousands of dollars per hour, putting a premium on fast recovery. The structure itself is built to the highest standards and can withstand most earthquakes, but the plant cannot open for business following an earthquake before inspectors check for hidden damage and dangerous structural faults. Worse, in the aftermath of an earthquake these inspectors would be terribly overworked. With thousands of buildings to inspect, and priority given to facilities like hospitals and schools, Intel knows it would face a three-day wait before being allowed back in its own buildings.

To save itself millions of dollars from the added downtime, Intel trained its own dedicated building inspectors for post-earthquake duty. Rather than rely on and wait for government-provided inspectors, the company can immediately inspect and certify its own buildings to reduce the delay. As a side benefit to the community, Intel plans to let these inspectors help the government examine and certify other buildings after Intel's are inspected. For Intel, the cost trade-off of the redundant training versus the severe costs of downtime creates an obvious business case for employing the inspectors.
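A back-of-the-envelope calculation shows why the business case is obvious. The three-day wait comes from the text; the hourly figure of $100,000 is an assumed low-end reading of "hundreds of thousands of dollars per hour", and the function name is hypothetical.

```python
HOURS_PER_DAY = 24


def waiting_cost(wait_days, cost_per_hour):
    """Downtime cost of idling while waiting for an outside inspector."""
    return wait_days * HOURS_PER_DAY * cost_per_hour


# 3 days is from the text; $100,000 per hour is an assumption.
print(f"${waiting_cost(3, 100_000):,.0f}")
```

Even at that assumed low-end rate, the wait alone would cost about $7.2 million, so any training budget well below that figure pays for itself after a single earthquake.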

Redundant IT Systems

Before September 11, 2001, the area in and around the World Trade Centre served as an information nexus for many financial services firms. Companies like Merrill Lynch, Smith Barney, Morgan Stanley, and Deutsche Bank all had major trading installations supported by a massive information technology infrastructure in and around the WTC. After the terrorist attacks on the towers, and the towers' subsequent collapse, some 20 million square feet [1.9 million square metres] of offices were destroyed or rendered unusable and the entire local information technology infrastructure lay in ruins.

When the south tower of the World Trade Centre collapsed on Deutsche Bank's New York facility, the German banking giant lost a major connection to the US markets. Despite the loss, COO Hermann-Josef Lamberti said: "We were able, on the very same day, to clear more than $US300 billion with the Fed." Redundant IT systems in Ireland took over when the New York systems were destroyed.

Other firms, such as Merrill Lynch, also quickly shifted operations to backup centres and redundant trading floors near New York City. According to Paul Honey, Merrill Lynch's director of global contingency planning, "within just a few minutes of the evacuation, Merrill Lynch was able to switch its critical management functions to their command centre in New Jersey". Moreover, everyone in the company knew of the redundant facility and was trained to call in or transfer their work to that location.

In businesses that transact billions of dollars a day electronically, building full redundancy is not a difficult decision. But the same is true for most modern corporations: The loss of their information systems means loss of the business. As compared with other redundancies, keeping redundant databases with shadow transactions and redundant application systems is relatively inexpensive given the potential damage from loss of data or the information technology infrastructure.

Redundancy As a Resilience Strategy

Redundancy of any kind helps companies continue serving their customers while rebuilding after a disruption. Indeed, most companies are accustomed to protecting themselves against small fluctuations, mostly in the demand for their products, by keeping spare inventory.

But over the last two decades, many companies have worked diligently to cut costs by reducing exactly this type of inventory, resulting in tightly connected supply chains and higher quality of products and services. Thus, when creating a special safety stock for protection against high-impact/low-probability events, companies should take care not to reverse the gains of such "lean" supply chain operations. In fact, some companies may decide to keep a lean supply chain with little inventory and a single supplier, even for critical parts. Their rationale is that, on balance, the full cost of coordinating several suppliers and keeping safety stock may be judged to be too high. That was Toyota's consideration following the Aisin fire described in chapter 13.

Yet safety stocks are a part of most resilience and business continuity plans. Even a relatively small amount of inventory can provide a disrupted company with time to prepare its response. A high level of redundancy, however, may be too expensive. Only when the stakes are especially high and the costs of extra capacity are relatively low, as in the case of information technology, should companies keep complete redundant capacity.

Service companies, in particular, typically keep extra capacity because the costs of service failures are high - service is what these companies sell. Furthermore, service companies cannot keep an inventory of their product (eg, if a package was not delivered on time, that service cannot be recovered and offered to the customer later). Consequently, a disruption will lead to an immediate service failure unless there is extra capacity or some other redundancy or flexibility in the system ready to kick in when the service is about to fail.

Resilience through Flexibility

Regardless of how it is used, redundancy entails additional costs to any enterprise. It reduces efficiency and therefore is not congruent with management's goals and objectives. With quality processes, continuous improvements, and "six sigma" programs all aimed at reducing waste and redundancy, it is difficult to argue for more redundancy. At best, redundancy can be looked upon as a "necessary evil", an insurance against risk. But with managers motivated by competitive pressures and by short-term Wall Street expectations, the result is likely to be insufficient reserves.

Operational flexibility, on the other hand, can also increase resilience, allowing a company to respond quickly to disruptions. Such capability is more difficult to develop than simply keeping extra inventory, having more suppliers, or keeping extra capacity, since it typically involves fundamental changes to the entire company as well as its supply chain relationships. It involves close partnerships with suppliers, who can be called upon to help; flexible contracts, allowing for changes in quantities and delivery schedule; flexible manufacturing facilities that can be used to produce multiple products; a multi-skilled work force with empowered employees who can move quickly from one task to another; and strong customer relationships ensuring continuity in troubled times. The rest of this book discusses these and other aspects of such corporate flexibility.

Reprinted by permission of The MIT Press. Excerpted from The Resilient Enterprise: Overcoming Vulnerability for Competitive Advantage. Copyright 2005, by Yossi Sheffi; All Rights Reserved.