Patch and Pray

Early one Saturday morning in January, from a computer definitely located somewhere within the seven continents, or possibly on the four oceans, someone sent 376 bytes of code inside a single data packet to a Microsoft SQL Server. That packet — which would come to be known as the Slammer worm — infected the server by sneaking in through UDP port 1434. From there it generated a set of random IP addresses and scanned them. When it found a vulnerable host, Slammer infected it, and from its new host generated more random addresses and hungrily scanned for still more vulnerable hosts.

Slammer was a nasty bugger. In the first minute of its life, it doubled the number of machines it infected every 8.5 seconds. (Just to put that in perspective, back in July 2001, the Code Red virus concerned experts because it doubled its infections every 37 minutes. Slammer peaked in just three minutes, at which point it was scanning 55 million targets per second.)
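The arithmetic behind those doubling times is worth seeing. Here's a back-of-the-envelope sketch comparing the two worms' quoted rates; it assumes pure, unchecked exponential growth from a single host, which is exactly the assumption Slammer's own traffic jam eventually broke.

```python
# Back-of-the-envelope growth from the doubling times quoted above.
# Assumes pure exponential spread from one infected host, ignoring the
# bandwidth saturation that in reality slowed Slammer within minutes.

def infected_after(seconds, doubling_time, initial=1):
    """Number of infected hosts after `seconds` of unchecked doubling."""
    return initial * 2 ** (seconds / doubling_time)

slammer = infected_after(60, 8.5)        # one minute at Slammer's rate
code_red = infected_after(60, 37 * 60)   # one minute at Code Red's rate

print(f"Slammer, after 1 minute:  ~{slammer:,.0f} hosts")
print(f"Code Red, after 1 minute: ~{code_red:,.0f} hosts")
```

One minute of Slammer multiplies the infected population more than a hundredfold; one minute of Code Red barely registers. That gap is why a 37-minute doubling time merely "concerned" experts while an 8.5-second one ended the race before most defenders woke up.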

Then, almost as quickly, Slammer started to decelerate, a victim of its own startling efficiency as it bumped into its own scanning traffic. Still, by the 10-minute mark, 90 per cent of all vulnerable machines on the planet were infected. But when Slammer subsided, talk focused on how much worse it would have been had Slammer hit on a weekday or, worse, carried a destructive payload.

Talk focused on patching. True, Slammer was the fastest spreading worm in history, but its maniacal binge occurred a full six months after Microsoft had released a patch to prevent it. Those looking to cast blame — and there were many — cried a familiar refrain: If everyone had just patched his system in the first place, Slammer wouldn't have happened.

But that's not true. And therein lies our story.

Slammer was unstoppable. Which points to a bigger issue: Patching no longer works. Partly, it's a volume problem. There are simply too many vulnerabilities requiring too many combinations of patches coming too fast. Picture Lucy and Ethel in the chocolate factory — just take out the humour.

But perhaps more important and less well understood, it's a process problem. The current manufacturing process for patches — from disclosure of a vulnerability to the creation and distribution of the updated code — makes patching untenable. At the same time, the only way to fix insecure post-release software (in other words, all software) is with patches.

This impossible reality has sent patching and the newly minted discipline associated with it — patch management — into the realm of the absurd. More than a necessary evil, it has become a mandatory fool's errand.

Hardly surprising, then, that philosophies on what to do next have bifurcated. Depending on whom you ask, it's either time to patch less — replacing the process with vigorous best practices and a little bit of risk analysis — or it's time to patch more — by automating the process with, yes, more software.

“We're between a rock and a hard place,” says Bob Wynn, CISO of the state of Georgia. “No one can manage this effectively. I can't just automatically deploy a patch. And because the time it takes for a virus to spread is so compressed now, I don't have time to test them before I patch either.”

With patching, the only certainty is that CISOs will bear the costs of bringing order to the intractable. In this penny-pinching era, other C-level executives are bound to ask the CISO why this is necessary, at which point someone's gonna have some 'splaining to do.

The Learned Art

Patching is, by most accounts, as old as software itself. Unique among engineered artefacts, software is not beholden to the laws of physics in that it can endure fundamental change relatively easily even after it's been “built.” Automobile engines don't take to piston redesigns post-manufacture nearly so well.

This unique element of software has contributed to (though is not solely responsible for) the software engineering culture, which generally regards quality and security as obstacles. An adage among programmers suggests that when it comes to software, you can pick only two of three: speed to market, number of features, level of quality. Programmers' egos are wrapped up in the first two; rarely do they pick the third (since, of course, software is so easily repaired later, by someone else).

Such an approach has never been more feckless. Software today is massive (Windows XP contains 45 million lines of code) and the rate of sloppy coding (10 to 20 errors per 1000 lines of code) has led to thousands of vulnerabilities. CERT published 4200 new vulnerabilities last year — that's 3000 more than it published three years ago. Meanwhile, software continues to find itself running ever more critical business functions, where its failure carries profound implications. In other words, right when quality should be getting better, it's getting exponentially worse.

Stitching patches into these complex systems, which sit within labyrinthine networks of similarly complex systems, makes it impossible to know if a patch will solve the problem it's meant to without creating unintended consequences. One patch, for example, worked fine for everyone — except the unlucky users who happened to have a certain Compaq system connected to a certain RAID array without certain updated drivers. In which case the patch knocked out the storage array.

Tim Rice, network systems analyst at Duke University, was one of the unlucky ones. “If you just jump in and apply patches, you get nailed,” he says. “Patching is a learned art. You can set up six different systems the same way, apply the same patch to each, and get one system behaving differently.”

Raleigh Burns, security administrator at St. Elizabeth's Medical Center, agrees. “Executives think this stuff has a Mickey Mouse GUI, but even chintzy patches are complicated.”

The conventional wisdom is that when you implement a patch, you improve things. But Wynn isn't convinced. “We've all applied patches that put us out of service. Plenty of patches actually create more problems — they just shift you from one vulnerability cycle to another,” he says. “It's still consumer beware.”

Yet for many who haven't dealt directly with patches, there's a sense that patches are simply click-and-fix. In reality, they're often patch-and-pray. At the very least, they require testing. Some financial institutions, says Shawn Hernan, team leader for vulnerability handling in the CERT Coordination Center at the Software Engineering Institute (SEI), mandate six weeks of regression testing before a patch goes live. Third-party vendors often take months after a patch is released to certify that it won't break their applications.

All of which makes the post-outbreak admonition to "Patch more vigilantly" farcical and, probably to some, offensive. It's the complexity and fragility, not some inherent laziness or sloppy management, that explains why Slammer could wreak such havoc 185 days after Microsoft released a patch for it.

“We get hot fixes every day, and we're loath to put them in,” says Frank Clark, senior vice president and CIO of Covenant Health Care, whose six-hospital network was knocked out when Slammer hit, causing doctors to revert to paper-based care. “We believe it's safer to wait until the vendor certifies the hot fixes in a service pack.”

On the other hand, if Clark had deployed every patch he was supposed to, nothing would have been different. He would have been knocked out just the same.

Process Horribilis

Slammer neatly demonstrates everything that's wrong with manufacturing software patches. It begins with disclosure of the vulnerability, which happened in the case of Slammer in July 2002, when Microsoft issued patch MS02-039. The patch steeled a file called ssnetlib.dll against buffer overflows.

“Disclosure basically gives hackers an attack map,” says Gary McGraw, CTO of Cigital and the author of Building Secure Software. “Suddenly they know exactly where to go. If it's true that people don't patch — and they don't — disclosure helps mostly the hackers.”

Essentially, disclosure's a starter's gun. Once it goes off, it's a footrace between hackers (who now know what file to exploit) and everyone else (who must all patch their systems successfully). The good guys never win this race. Someone probably started working on a worm targeting ssnetlib.dll when Microsoft released MS02-039, or shortly thereafter.

In the case of Slammer, Microsoft built three more patches in 2002 — MS02-043 in August, MS02-056 in early October and MS02-061 in mid-October — for related SQL Server vulnerabilities. MS02-056 updated ssnetlib.dll to a newer version; otherwise, all of the patches played together nicely.

Then, on October 30, Microsoft released Q317748, a nonsecurity hot fix for SQL Server. Q317748 repaired a performance-degrading memory leak. But the team that built it had used an old, vulnerable version of ssnetlib.dll. When Q317748 was installed, it could overwrite the secure version of the file and thus make that server as vulnerable to a worm like Slammer as one that had never been patched.
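The Q317748 failure mode is simple enough to sketch. The file name below is real; the version numbers and install helpers are hypothetical stand-ins, meant only to show why overwriting a file without a version check silently rolls back a security patch.

```python
# Sketch of the Q317748 failure mode: a hot fix built from old source
# overwrites a patched file with a vulnerable one. The file name is real;
# the version tuples and install helpers are hypothetical illustrations.

installed = {"ssnetlib.dll": (2000, 80, 534)}   # state after the security patches

def naive_install(fix_files):
    """What the hot fix effectively did: overwrite unconditionally."""
    installed.update(fix_files)

def safe_install(fix_files):
    """Overwrite only when the fix ships a strictly newer file version."""
    for name, version in fix_files.items():
        if version > installed.get(name, (0, 0, 0)):
            installed[name] = version

hotfix = {"ssnetlib.dll": (2000, 80, 194)}      # built from pre-patch code

naive_install(hotfix)
print(installed["ssnetlib.dll"])   # (2000, 80, 194): the server is vulnerable again

installed["ssnetlib.dll"] = (2000, 80, 534)     # re-patch
safe_install(hotfix)
print(installed["ssnetlib.dll"])   # (2000, 80, 534): the older file is refused
```

A patch pipeline with even this one-line version comparison would have refused to regress ssnetlib.dll; the point of the anecdote is that the maintenance teams weren't set up to make it.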

“As bad as software can be, at least when a company develops a product, it looks at it holistically,” says SEI's Hernan. “It's given the attention of senior developers and architects, and if quality metrics exist, that's when they're used.”

And then there are patches.

Patch writing is relegated to entry-level maintenance programmers, says Hernan. They fix problems where they're found. They have no authority to look for recurrences or to audit code. And the patch coders face severe time constraints — remember there's a footrace on. They don't have time to communicate with other groups writing other patches that might conflict with theirs. (Not that they're set up to communicate. Russ Cooper, who manages NTBugtraq, the Windows vulnerability mailing list, says companies often divide maintenance by product group and let them develop their own tools and strategies for patching.) There's little, if any, testing of patches by the vendors that create them.

Ironically, maintenance programmers write patches using the same software development methodologies employed to create the insecure, buggy code they ostensibly set out to fix. Imagine that 10 people are taught to swim improperly, and one guy goes in the water and starts to drown. Do you want to rely on the other nine to jump in and save him?

From this patch factory comes a poorly written product that can break as much as it fixes. For example, an esoteric flaw found last summer in an encryption program — one so arcane it might never have been exploited — was patched. The patch itself had a gaping buffer overflow written into it, and that was quickly exploited, says Hernan. In another case last April, Microsoft released patch MS03-013 to fix a serious vulnerability in Windows XP. On some systems, it also degraded performance, by roughly 90 per cent. The performance degradation required another patch, which wasn't released for a month.

Slammer feasted on such methodological deficiencies. It infected both servers made vulnerable by conflicting patches and servers that were never patched at all because the SQL patching scheme was kludgy. These particular patches required scripting, file moves, and registry and permission changes to install. (After the Slammer outbreak, even Microsoft engineers struggled with the patches.) Many avoided the patch because they feared breaking SQL Server, one of their critical platforms. It was as if their car had been recalled and the automaker mailed them a transmission with installation instructions.

Confusion Abounds

The initial reaction to Slammer was confusion on a Keystone Kops scale. “It was difficult to know just what patch applied to what and where,” says NTBugtraq's Cooper, who's also the “surgeon general” at vendor TruSecure.

Slammer hit at a particularly dynamic moment: Microsoft had released Service Pack 3 for SQL Server days earlier. It wasn't immediately clear if SP3 would need to be patched (it wouldn't), and Microsoft early on told customers to upgrade their SQL Server to SP3 to escape the mess.

Meanwhile, those trying to use MS02-061 were struggling mightily with its kludginess, and those who had patched — but got infected and watched their bandwidth sucked down to nothing — were baffled. At the same time, a derivative SQL application called MSDE (Microsoft Desktop Engine) was causing significant consternation. MSDE runs in client apps and connects them back to the SQL Server. Experts assumed MSDE would be vulnerable to Slammer since all of the patches had applied to both SQL and MSDE users.

That turned out to be true, and Cooper remembers a sense of dread as he realised MSDE could be found in about 130 third-party applications. It runs in the background; many corporate administrators wouldn't even know it's there. Cooper found it in half of TruSecure's clients. In fact, at Beth Israel Deaconess Hospital in Boston, MSDE had caused an infestation although the network SQL Servers had been patched. But that's another story for another time.

When customers arrived at work on Monday and booted up their clients, which in turn loaded MSDE, Cooper worried that Slammer would start a re-infestation, or maybe it would spawn a variant. No one knew what would happen. And while patching thousands of SQL Servers is one thing, finding and patching millions of clients with MSDE running is another entirely. Still, Microsoft insisted, if you installed SQL Server SP3, your MSDE applications would be protected.

It seemed like reasonable advice.

Then again, companies typically take more than a week to roll a service pack into a network. After all, single patches require regression testing, and service packs are hundreds of security patches, quality fixes and feature upgrades rolled together. In a crisis, installing a service pack that was only days old wasn't reasonable. Cooper soon learned that Best Software's MAS 500 accounting software wouldn't run with Service Pack 3. MAS 500 users who installed SP3 to defend against Slammer had their applications fall over; they would have to start over and reformat their machines. All the while, everyone was racing to beat Slammer to the workweek, when millions of machines worldwide would be turned on or otherwise exposed to a worm that, over the weekend, had remained blissfully dormant.

“By late Sunday afternoon, Microsoft had two rooms set up on campus,” says Cooper. “Services guys are in one room figuring out what to say to customers. A security response team is in the other room trying to figure out how to repackage the patches and do technical damage control.

“I'm on a cell phone, and there's a guy there running me between the two rooms.” Cooper laughs at the thought of it.

Repeat Mistakes

As the volume and complexity of software increases, so does the volume and complexity of patches. The problem with this, says SEI's Hernan, is that there's nothing standard about the patch infrastructure or managing the onslaught of patches.

There are no standard naming conventions for patches; vulnerability disclosure comes from whatever competitive vendor can get the news out there first (which creates another issue around whether vendors are hyping minor vulnerabilities in order to associate themselves with the discovery of a vulnerability — yet another story for another day). Distribution might be automated or manual, and installation could be a double-click .exe file or a manual process.

Microsoft alone uses a hierarchy of eight different patching mechanisms (the company says it wants to reduce that number). But that only adds to customer confusion.

“How do I know when I need to reapply a security roll-up patch? Do I then need to reapply Win2K Service Pack 2? Do I need to re-install hot fixes after more recent SPs?” Similar questions were posed to a third-party services company in a security newsletter. The answer was a page-and-a-half long.

There's also markedly little record-keeping or archiving around patches, leaving vendors to make the same mistakes over and over without building up knowledge about when and where vulnerabilities arise and how to avoid them. For example, Apple's Safari Web browser contained a significant security flaw in the way it validated certificates using SSL encryption, which required a patch. Every browser ever built before Safari, Hernan says, had contained the same flaw.

“I'd like to think there's a way to improve the process here,” says Mykolas Rambus, CIO of financial services company WP Carey. “It would take an industry body — a nonprofit consortium-type setup — to create standard naming conventions, to production test an insane number of these things, and to keep a database of knowledge on the patches so I could look up what other companies like mine did with their patching and what happened.”

Rambus doesn't sound hopeful.

There won't be a formal announcement of the fact, and no one really planned it this way, but Slammer has become something of a turning point. The fury of its 10-minute conflagration and the ensuing comedy of a gaggle of firefighters untangling their hoses, rushing to the scene and finding that the building had burnt down left enough of an impression to convince many that patching, as currently practised, really doesn't work.

“Something has to happen,” says Rambus. “There's going to be a backlash if it doesn't improve. I'd suggest that this patching problem is the responsibility of the vendors, and the costs are being taken on by the customers.”

There's good news and bad news for Rambus. The good news is that vendors are motivated to try to fix the patch process, and they're earnest — one might even say religious — about their competing approaches. The fervent search for a cure has intensified markedly since Slammer.

The bad news is that it's not clear either approach will work. And even if one does, none of what's happening changes the economics of patching. Customers still pay.

More or Less

There are two emerging and opposite patch philosophies: Either patch more, or patch less.

Vendors in the Patch More school have, almost overnight, created an entirely new class of software called patch management software. The term means different things to different people (already one vendor has concocted a spinoff, “virtual patch management”), but in general, PM automates the process of finding, downloading and applying patches. Patch More adherents believe patching isn't the problem; manual patching is. Regular checks for updates, automated deployment, checks for conflicts and rollback capabilities (in case there is a conflict) will, under this school of thought, fix patching. PM software can keep machines as up to date as possible without the possibility of human error.
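The Patch More pitch reduces to a loop: inventory, deploy, detect conflicts, roll back. A minimal sketch of that loop, with every type, patch name and conflict check here a hypothetical stand-in for what commercial PM products automate:

```python
# Minimal sketch of the Patch More workflow: deploy, detect a conflict,
# roll back. Host names, patch IDs and the conflict check are all
# hypothetical illustrations, not any vendor's actual product.

from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    applied: set = field(default_factory=set)

def deploy(host, patch, conflicts_with=frozenset()):
    """Apply `patch` to `host`; roll it back if a known conflict is present."""
    host.applied.add(patch)
    if host.applied & conflicts_with:
        host.applied.discard(patch)        # automated rollback
        return f"{patch} rolled back on {host.name}: conflict detected"
    return f"{patch} deployed on {host.name}"

server = Host("sql01", applied={"MS02-056"})
print(deploy(server, "Q317748", conflicts_with={"MS02-056"}))
```

The catch, as the sceptics below point out, is the `conflicts_with` set: the whole scheme works only if someone already knows which combinations break, and it was precisely that knowledge that was missing when Q317748 met MS02-056.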

The CISO at a major convenience store retail chain says it's already working. “Patching was spiralling out of control until recently,” he says. “Before, we knew we had a problem because of the sheer volume of patches. We knew we were exposed in a handful of places. The update services coming now from Microsoft, though, have made the situation an order of magnitude better.”

Duke University's Rice tested patch management software on 550 machines. When the application told him he needed 10,000 patches, he wasn't sure if that was a good thing. “Obviously, it's powerful, but automation leaves you open to automatically putting in buggy patches.” Rice might be thinking of the patch that crashed his storage array on a Compaq server. “I need automation to deploy patches,” he says. “I do not want automated patch management.”

The Patch Less constituency is best represented by Peter Tippett, vice chairman and CTO of TruSecure. Tippett is fanatical about patching's failure. Based on 12 years of actuarial data, he says that only about 2 per cent of vulnerabilities result in attacks. Therefore, most patches aren't worth applying. In risk management terms, they're at best superfluous and, at worst, a significant additional risk.

Instead, Tippett says, improve your security policy — lock down ports such as 1434 that really had no reason to be open — and pay third parties to figure out which patches are necessary and which ones you can ignore. “More than half of Microsoft's 72 major vulnerabilities last year will never affect anyone ever,” says Tippett. “With patching, we're picking the worst possible risk-reduction model there is.”
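Tippett's case is easiest to see as an expected-cost comparison. The 2 per cent exploitation rate is his figure from above; every other number below is a made-up assumption for illustration, which is precisely the weakness Burns attacks.

```python
# Tippett's Patch Less argument in expected-cost terms. Only the 2 per
# cent exploitation rate comes from the article; the costs and the
# patch-failure probability are hypothetical illustrations.

p_exploited    = 0.02      # share of vulnerabilities ever attacked (Tippett)
cost_of_breach = 100_000   # assumed loss if the attack actually lands
cost_to_patch  = 2_000     # assumed testing, deployment and downtime cost
p_patch_breaks = 0.05      # assumed chance the patch itself causes an outage
cost_of_outage = 20_000

expected_cost_no_patch = p_exploited * cost_of_breach
expected_cost_patch    = cost_to_patch + p_patch_breaks * cost_of_outage

print(f"skip the patch:  expected cost ${expected_cost_no_patch:,.0f}")
print(f"apply the patch: expected cost ${expected_cost_patch:,.0f}")
```

Under these particular numbers, skipping the patch is the cheaper bet. But the calculation is only as good as its probabilities, and Burns's objection below is exactly that: for blended threats, nobody actually knows them.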

Tippett is at once professorial and constantly selling his own company's ability to provide the services that make the Patch Less approach viable. But many thoughtful security leaders think Tippett's approach is as flawed and dangerous as automated patch management.

“There's no place for that kind of thinking, to patch less,” says St. Elizabeth's Burns. “As soon as an exploit takes advantage of an unknown vulnerability — and one will — those guys will be scratching their heads. He's using old-school risk analysis. How can you come up with an accurate probability matrix on blended threat viruses using 12 years of data when they've only been around for two years?”

Add to this a sort of emotional inability not to patch — like forgetting to put on your watch and feeling naked all day. Several CISOs described an illogical pull to patch, even when the risk equation determined that less patching would be equally or even more effective.

There's also an emerging hybrid approach, which combines patch management software with expertise and policy management. It also combines the costs: customers pay both for smart people to know their risks and for new software.

“There's a huge push right when P&L captains are telling CISOs to keep costs down,” says Hernan. That might explain why the executive security ranks are far less enamoured of the Patch Less/Patch More philosophies. The polar approaches haven't yet spurred CISOs to take sides so much as they've flummoxed them. Ambivalent confusion reigns.

Hernan says, “I can understand the frustration that can lead to the attitude of, 'Forget it, I can't patch everything,' but that person's taking a big chance. On the other hand, he's also taking a big chance applying a patch.”

“I don't have much faith in automated patching schemes,” says Rambus. “But I could be convinced.”

Georgia's Wynn is ambivalent too. “If you think patch management is a cure, you're mistaken. Think of it as an incremental improvement. I have to take a theory of the middle range,” he says vaguely.

Postscript

On Monday after Slammer hit, Microsoft re-released MS02-061, this time fixing both the memory leak and ssnetlib.dll, and it was much easier to install. Of course, by then, Slammer was already pandemic. Microsoft itself was infected badly, prompting a moment of schadenfreude for many. ISP networks had collapsed; several root DNS servers were overwhelmed; airlines had cancelled flights; ATMs refused to hand out money. In Canada, a national election was delayed.

And after all that, the patches had, at best, a minuscule mitigating effect against Slammer. What ended up preventing Slammer from worming its way into the workweek and causing even more damage, it turns out, was a rare gesture of cooperation by ISPs. That same Monday, they agreed to cooperatively block Internet traffic on UDP port 1434, the one Slammer used to propagate itself. “That's what allowed us to survive,” says Cooper.

And surely, with ISPs blocking the door, companies would seize the opportunity to update, test and deploy the new patches. Or, if they felt up to it, they could upgrade to Service Pack 3. They could use the time to locate and patch all of their MSDE clients and, once and for all, kill Slammer dead.

Ten days later, when ISPs opened port 1434 again, sure enough, there was a spike in Slammer infections of SQL Servers. Six months later, in mid-July, as this story went to press, the Wormwatch.org listener service showed Slammer remained the most prevalent worm in the wild, twice as common as any other worm. It was still trolling for, and finding, unpatched systems to infect.

Stories by Scott Berinato