Ever since California passed SB1386, organization after organization has disclosed that critical data banks have been compromised by hackers, couriers or consultants. The causes range from lost backup tapes to lost laptops to network hacks. What most of these cases have in common is the lack of strong technical measures to protect data that is by its nature highly sensitive.
From these and other cases we've learned that many companies seem to believe they can adequately protect their information with a combination of locked doors, firewalls and access controls. The problem with this approach is that attackers can frequently bypass such protective mechanisms by slipping raw commands written in the Structured Query Language (SQL) through a Web form or other application input, so that the application passes them along to your database server. This is called an SQL injection attack. For example, if you have a table called "Customers," the attacker might be able to send through the SQL command "Select * from Customers" and receive your entire customer database as the reply.
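To make the attack concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table layout, the sample rows and the function names are all invented for illustration. The first lookup pastes user input into the SQL text, which is the mistake that makes injection possible; the second passes the input as a bound parameter, the usual application-level fix (and not the subject of this column).

```python
import sqlite3

def build_demo_db():
    # Hypothetical two-row customer table held in memory.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (name TEXT, ssn TEXT)")
    conn.executemany(
        "INSERT INTO Customers VALUES (?, ?)",
        [("Alice", "000-00-0001"), ("Bob", "000-00-0002")],
    )
    return conn

def lookup_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL statement itself.
    query = "SELECT name, ssn FROM Customers WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_safe(conn, name):
    # Parameterized: the input is treated as data and never parsed as SQL.
    return conn.execute(
        "SELECT name, ssn FROM Customers WHERE name = ?", (name,)
    ).fetchall()

conn = build_demo_db()
attack = "x' OR '1'='1"
print(lookup_unsafe(conn, attack))  # every row in the table comes back
print(lookup_safe(conn, attack))    # no rows: nobody has that literal name
```

Even the safe version still leaves every SSN sitting in the table for anyone who gets in another way, which is why the options below focus on the data itself.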
Although there are numerous proposals for ending these kinds of attacks--including fancy intrusion detection systems and governors that limit the amount of data that can be sent to a Web browser in response to an HTTP request--this column is about a variety of techniques that have been largely ignored but that show great promise.
All of the following approaches protect the data in the database against both outside attackers and malicious insiders. That's because these tactics work by either eliminating or scrambling sensitive information so that it no longer poses a security risk.
Option 1: Don't Collect Sensitive Data
The very best way to provide for the security of a database is to eliminate the large-scale collection of sensitive information in the first place. This is apparently less obvious than it should be. For example, many organizations still routinely collect Social Security numbers (SSNs)--or even worse, they use SSNs as their own employee or student identification numbers. Instead of using an identifier that has such a high potential for credit fraud and identity theft, it's far better for organizations to create their own randomly assigned 10- or 11-digit identification number. (And, indeed, any organization that deals with the public needs to have a provision for randomly generated numbers in any event, because not everybody has an SSN.)
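As a rough sketch of what "randomly assigned" might look like in practice, the Python below draws a 10-digit identifier from the operating system's secure random source and retries on the rare collision. The function name and the in-memory set standing in for the ID table are my own assumptions, not a reference to any particular product.

```python
import secrets

def new_member_id(existing_ids, digits=10):
    """Return a random numeric ID that is not already in existing_ids."""
    while True:
        candidate = "".join(secrets.choice("0123456789") for _ in range(digits))
        if candidate not in existing_ids:
            existing_ids.add(candidate)  # reserve it so it is never reissued
            return candidate

issued = set()
print(new_member_id(issued))
```

Because the number is generated rather than derived from anything about the person, a stolen list of these IDs is useless for opening credit accounts.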
Option 2: Get Rid of Sensitive Information Fast
For those who really must store sensitive data, make sure that the information is erased as soon as possible. For example, in many cases it is simply unnecessary to retain a customer's credit card number (CCN) after a transaction has been committed--perhaps you can just keep the last four digits after 90 days. Those who need CCNs for auditing purposes may be able to move those numbers to a secondary database server not connected to the Internet.
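A scheduled clean-up job along these lines could enforce such a policy. This is only a sketch: the record layout (a dictionary with "ccn" and "committed" fields) and both function names are hypothetical, and a real system would run the equivalent update against its own tables and backups.

```python
from datetime import date, timedelta

def last_four(ccn):
    # Keep only the final four digits, ignoring spaces and dashes.
    digits = "".join(ch for ch in ccn if ch.isdigit())
    return digits[-4:]

def purge_old_card_numbers(records, today, retention_days=90):
    """Truncate card numbers on transactions older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    for record in records:
        if record["committed"] < cutoff:
            record["ccn"] = last_four(record["ccn"])
    return records

orders = [
    {"ccn": "4111 1111 1111 1111", "committed": date(2024, 1, 1)},
    {"ccn": "4111 1111 1111 1111", "committed": date(2024, 5, 1)},
]
purge_old_card_numbers(orders, today=date(2024, 6, 1))
print(orders[0]["ccn"], "|", orders[1]["ccn"])
```

Running the job twice is harmless, since a number that has already been cut down to four digits stays at four digits.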
Option 3: Split It and/or Scramble It
Secret sharing, also known as secret splitting, is a clever technique that can be used to split a piece of confidential information between two or more parties so that it cannot be reassembled until a minimum number of those parties participate. With secret splitting you can divide CCNs among four databases and require that data be retrieved from at least three of them in order to recreate the CCNs. In the simplest implementation, a secret is simply split between two databases; both databases must be consulted to recreate the secret.
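For readers who want to see the idea in code, here is a small Python sketch of threshold secret sharing using polynomial interpolation over a prime field, the construction usually attributed to Shamir. The function names and the choice of prime are mine, the card number is a standard test value, and the code assumes Python 3.8 or later for the modular inverse. It is meant to illustrate the three-of-four example above, not to serve as production cryptography.

```python
import secrets

PRIME = 2**127 - 1  # a prime comfortably larger than any 16-digit number

def make_shares(secret, threshold, count):
    """Split an integer secret into `count` shares; any `threshold` recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

ccn = 4111111111111111
shares = make_shares(ccn, threshold=3, count=4)  # one share per database
print(recover_secret(shares[:3]) == ccn)         # any three shares suffice
```

Calling make_shares with a threshold and count of 2 gives the simple two-database arrangement, in which neither database alone reveals anything about the number.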
Although secret sharing was invented in 1979 by cryptographer Adi Shamir (the "S" in the RSA cryptography algorithm), the system was largely an academic curiosity until recently. With the rise in database break-ins and mandatory notifications, secret sharing may be looking more attractive for some applications.
Back in 2003 RSA Security introduced a technology called Nightingale that is supposed to make it dramatically easier for businesses to integrate secret sharing into already-existing applications. With Nightingale, a special server holds half of the secret and the organization's existing database holds the second half. Secrets such as credit card numbers or cryptographic keys are only recombined when they are actually needed for use; in other words, call center reps won't be able to browse through the data on a coffee break.
In some very special applications it is even possible to use a secret without putting it back together! This is called split-key cryptography, and Nightingale supports a version of it as well. Split-key cryptography is useful in applications where you absolutely, positively do not wish to have a chance of someone running off with your encryption key. Instead of reassembling the key to use it, part of the cryptographic calculation gets run on one computer with part of the key, then the document gets moved to a second computer where the second half of the calculation gets done with the second part of the key. This is pretty complicated stuff, but it's appealing in certain specialized applications (such as for organizations that want to run a high-value certification authority).
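One textbook way to get this effect with RSA is to split the private exponent into two factors, so that each computer applies only its own factor in turn. The Python sketch below uses deliberately tiny toy primes, and every name in it is made up for illustration; I am not claiming this is how Nightingale does it. It assumes Python 3.8 or later for the modular inverse.

```python
import math
import secrets

# Toy RSA parameters; real keys use primes hundreds of digits long.
P, Q = 61, 53
N = P * Q
PHI = (P - 1) * (Q - 1)
E = 17
D = pow(E, -1, PHI)  # the full private exponent, which is never stored whole

def split_private_exponent(d, phi):
    """Return (d1, d2) with d1 * d2 congruent to d modulo phi."""
    while True:
        d1 = secrets.randbelow(phi - 2) + 2
        if math.gcd(d1, phi) == 1:  # d1 must be invertible modulo phi
            d2 = (d * pow(d1, -1, phi)) % phi
            return d1, d2

def partial_sign(value, key_part, n):
    # Each machine runs only this step, using only its own part of the key.
    return pow(value, key_part, n)

d1, d2 = split_private_exponent(D, PHI)
message = 65
halfway = partial_sign(message, d1, N)    # computed on the first machine
signature = partial_sign(halfway, d2, N)  # finished on the second machine
print(pow(signature, E, N) == message)    # verifies under the public key
```

The result is the same signature the whole key would have produced, yet no single machine ever holds enough to sign on its own.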
In many cases information can be hashed by a one-way function before it's stored in a database. Hashing data enables it to be used for certain purposes but effectively makes it impossible to get the data back out.
For example, the Unix password system uses hashed passwords to increase the operating system's overall security. Here's how it works: Instead of storing the passwords themselves in the user database, Unix systems store each user name alongside the result of running that user's password through a one-way hash function (MD5-based schemes were long a common choice, although MD5 is no longer considered strong enough for the job). When a person tries to log in to the Unix system, the operating system takes the password that was typed, hashes it and compares the result to the value stored in the database. If they match, the user is allowed to log in. But if an attacker breaks into the system and accesses the database directly, all the attacker gets is the hashes, not the actual passwords.
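The same store-the-hash, compare-the-hash pattern can be sketched in a few lines of Python. This is not the actual Unix implementation: I have assumed PBKDF2 with SHA-256 from the standard hashlib module, a random per-user salt and an arbitrary iteration count, and the function names are my own.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware

def hash_password(password, salt=None):
    """Return (salt, digest); only these two values go into the database."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def check_password(password, salt, stored_digest):
    # Re-hash the login attempt and compare in constant time.
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))
print(check_password("guess", salt, stored))
```

The salt keeps two users with the same password from ending up with the same stored value, and the iteration count makes each guess expensive for an attacker who steals the table.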
A few years ago Peter Wayner, an independent consultant and author who specializes in cryptographic applications, came up with a method for using one-way hash functions to protect other kinds of information stored in a database. For example, a database that includes the hash of a person's SSN still allows SSNs typed on Web forms to be validated, but such a database makes it virtually impossible for the database operator (or hacker) to browse the database and download a list of names and SSNs. That's because the simple SQL statement "Select * from Customers" would no longer return the customer SSNs--it would just return the hashes of those SSNs.
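Here is a rough sketch of that idea in Python with the standard sqlite3, hmac and hashlib modules. The table, the names and the sample SSN are invented, and this is my own illustration rather than code from Wayner's book. One caveat: there are only about a billion possible SSNs, so a plain hash could be reversed by trying them all. For that reason the sketch assumes a keyed hash (HMAC) whose key is kept somewhere other than the database.

```python
import hashlib
import hmac
import sqlite3

# Assumption: this key lives outside the database, e.g. in the application tier.
SITE_KEY = b"replace-with-a-secret-kept-outside-the-database"

def ssn_token(ssn):
    # Normalize to digits only, then apply a keyed one-way hash.
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return hmac.new(SITE_KEY, digits.encode("ascii"), hashlib.sha256).hexdigest()

def ssn_matches(conn, name, typed_ssn):
    """Validate an SSN typed on a form without ever storing the SSN itself."""
    row = conn.execute(
        "SELECT ssn_hash FROM Customers WHERE name = ?", (name,)
    ).fetchone()
    return row is not None and hmac.compare_digest(row[0], ssn_token(typed_ssn))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT, ssn_hash TEXT)")
conn.execute(
    "INSERT INTO Customers VALUES (?, ?)", ("Alice", ssn_token("000-00-0001"))
)

print(ssn_matches(conn, "Alice", "000-00-0001"))            # form check works
print(conn.execute("SELECT * FROM Customers").fetchall())   # dump shows hashes
```

A dump of the table yields names and 64-character hex strings, and nothing in it can be read back as an SSN.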
Wayner calls his approach "translucent databases," and it's good for a lot more than just storing SSNs. For example, you can use a translucent database to eliminate phone numbers, e-mail addresses, names, addresses and other kinds of sensitive information--while still giving people the ability to look up and use records that contain this information. In his book, Wayner shows how to use the translucent database technology to build a baby-sitter matchmaking application. Even though this database somehow contains a list of young teenage girls who are spending the evening in expensive houses with otherwise unguarded small children, the translucent database technology makes it essentially impossible to dump out that highly sensitive information. Even the data bank's own operators can't make it reveal its secrets. Public-key cryptography can be layered on top of these databases so that the sitter's cell phone number can be decoded by Mom and Dad but not by Uncle Ernie.
Option 4: Blow It Up
Probably my favorite system for protecting data in a database against browsing or large-scale downloading is a system called Vast that was developed at the Georgia Institute of Technology by David Dagon, Wenke Lee and Richard Lipton. Vast uses cryptographic techniques to dramatically increase the size of a database. A 5- or 10-gigabyte database can be inflated so that it takes 10 or 20 terabytes to store. Individual records can still be accessed relatively quickly, but any attacker attempting to read all of the data immediately runs into scalability issues. And downloading random slices of the database won't reveal anything useful, because, as Vast's creators put it, "a secret is broken into shares over a large file, so that no single portion of the field holds recoverable information."
The researchers described Vast in a paper called "Protecting Secret Data from Insider Attacks" presented at the 2005 Financial Cryptography conference. But when I spoke with Dagon, he said that he was having a hard time finding anybody who was interested in commercializing the research because the whole idea of storing gigabytes of data on terabytes of hard drives seemed so wasteful! People just couldn't seem to understand that the point of Vast is that the cost of a few dozen hard drives is almost inconsequential compared to the protection that they can provide against a very common attack.
Option 5: Encrypt Just Part of It
Organizations that are looking for something that's made it out of the research lab and into the marketplace would do well to look at some of the emerging column-level encryption solutions, in which some information in the database gets encrypted while other information is left in the clear. Column-level solutions are now available for IBM DB2, Oracle, Microsoft SQL Server and even MySQL. These systems generally rely on either code within the application or a fancy proxy to encrypt data as it is written into the database and decrypt it when it is read back out. Column-level encryption isn't as secure as the other approaches described in this column because the decryption key is usually embedded somewhere within the application program or database. But it's certainly better than having no encryption at all.
Simson Garfinkel, PhD, CISSP, is spending the year at Harvard University researching computer forensics and human thought. He can be reached at email@example.com.