Jonathan Penn, Director; Jan Sundgren, Industry Analyst
By now, most of us are all too familiar with spam. Officially defined as unsolicited bulk e-mail, spam is the electronic counterpart of the junk mail we receive by postal mail. Because the economics of e-mail are different, however, spam is a problem of entirely new proportions. Unlike postal mail, e-mail costs so little to send that spam senders ("spammers") have an incentive to launch vast quantities of e-mail solicitations. In recent years, the amount of spam has increased explosively, and some observers are now wondering if spam is going to overwhelm the Internet. Already, by some accounts, the proportion of all e-mail purported to be spam is approaching 50 per cent, and the volume is doubling every 12 to 18 months.
The expense of this deluge includes the costs of processing the spam, the potential though untested legal liabilities associated with it, and the distraction and sheer irritation it causes recipients. Spam clogs up bandwidth and storage for enterprises as well as ISPs. It reduces employee productivity as users are forced to sift through dozens of unwanted messages, and it can result in staff accidentally deleting legitimate business e-mail. The issue of legal liability centres mainly on the problem of pornographic spam and its contribution to a hostile work environment. So far, there are no cases of organisations being held liable for failing to block pornographic spam, and this threat may be a hobgoblin created by vendors, but it may be only a matter of time before such a lawsuit comes along. Legal experts note that liability may hinge on organisations failing to act in response to complaints, and complaints are mounting.
The problem has become so serious that industry groups and governments are taking action. AOL, Yahoo! and Microsoft have announced a partnership to develop guidelines for fighting spam and technical standards for enforcing those guidelines. Several marketing and technology organisations have efforts under way as well. Meanwhile, thirty or more US states have passed antispam laws of one kind or another. At the federal level, proposals have circulated for years, but the issue has heated up tremendously in the last several months, spawning several new bills.
Regardless of what antispam legislation or industry self-regulation ultimately emerges, however, it will not eliminate the spam problem. Without technological improvements that enable the definitive identification of spammers, forging of e-mail addresses will continue, thereby allowing spammers to ignore prohibitions. Moreover, spammers can always move to jurisdictions where the laws don't apply. The upshot is that, for the near term at least, a major portion of the battle against spam is going to be fought at the level of individual organisations, through the use of various products and services as well as basic policies that keep spam out of their networks.
On the policy front, measures include limiting the exposure of company e-mail addresses to harvesting by spammers: not putting employee e-mail addresses on Web sites in ways that can be scanned by search engines, and protecting e-mail servers from address harvesting "dictionary attacks". Organisations should instruct employees to be careful about divulging their e-mail addresses in Web discussion groups and other places where they may be harvested, and to avoid responding to spam unless there is reasonable assurance that the sender is a legitimate business that would honour opt-out requests. Since these measures do not solve the problem, however, the mainstay of an antispam strategy is likely to be one of the growing number of antispam products and services available.
These products use a bewildering array of techniques for fighting spam, some better than others. One of the oldest and simplest techniques is to block e-mail coming from an e-mail server that is used by spammers. Lists of spammers are maintained by both commercial and non-profit organisations, and many antispam filters hook into these lists, such as MAPS RBL, ORBS or SPEWS, to name a few of the hundreds of such services. However, this technique has proven to be heavy-handed and controversial. The accuracy and timely maintenance of the lists have been severely criticised and even legally attacked. In addition, non-spamming organisations are usually placed on these lists if they have open mail relays, which spammers frequently use to send their messages. There is even a special list service for open mail relays, the Open Relay Database, or ORDB.
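As a rough illustration of how a filter might consult such a list, the sketch below assumes the list is published over DNS, which is a common convention: the octets of the sending server's IPv4 address are reversed and prefixed to the list's zone, and any successful answer means "listed". The zone name blocklist.example and both function names are hypothetical, and the resolver is passed in as a parameter so the logic can be exercised without network access.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a DNS-based block list about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the block list answers for this address.

    `resolve` defaults to a real DNS lookup; a stub can be supplied for testing.
    A failed lookup (socket.gaierror) is treated as "not listed".
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Hypothetical zone name, used only to show the shape of the query.
print(dnsbl_query_name("192.0.2.99", "blocklist.example"))
```

Note that the check says nothing about why a server was listed, which is exactly where the complaints about accuracy and open relays arise.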
Another basic technique is content filtering, which scans for certain key words in e-mail. An antispam product using this technique will often come with templates of words that are typically found in different categories of spam. A more sophisticated variant of this approach is lexical analysis, in which the context of words is considered as well. Both of these approaches are prone to blocking a large number of legitimate messages ("false positives") and to letting spam through the filters ("false negatives"). As spam moves increasingly to HTML e-mail, with hot links and graphics, these methods will become ineffective.
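A minimal sketch of key-word content filtering follows. The category templates and their word lists are invented for illustration and are far smaller than anything a real product would ship; the point is only to show how easily a legitimate message can trip the same key words.

```python
import re

# Hypothetical templates: each spam category maps to a few typical key words.
SPAM_TEMPLATES = {
    "finance": {"refinance", "mortgage", "debt"},
    "pharmacy": {"pills", "prescription"},
}


def matched_categories(message: str, templates=SPAM_TEMPLATES) -> list:
    """Return the spam categories whose key words appear in the message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return sorted(cat for cat, keywords in templates.items() if words & keywords)


print(matched_categories("Refinance your mortgage today!"))
# A legitimate business message can match the same template (a false positive).
print(matched_categories("Please review the mortgage paperwork for our client"))
print(matched_categories("Lunch at noon?"))
```

Because the filter looks only at isolated words, it cannot tell the solicitation from the business message, and a spammer who places the sales pitch in an image gives it nothing to match at all.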
A more accurate technique is spam fingerprinting, whereby specific spam e-mails are identified and a unique "fingerprint" is developed to allow scanners to find and block those messages. Fingerprinting has the advantage of yielding very few false positives, but its success at stopping spam hinges on the comprehensiveness of its fingerprint library and the timeliness of updates. New spam must be identified quickly, and corresponding fingerprints must be distributed to customers, so the challenge is similar to that faced by the pattern-matching approach for viruses. Brightmail is the leading vendor in this area, focusing almost exclusively on this technique by maintaining a wide network of "honeypot" mailboxes which attract spam. Other vendors, such as Cloudmark, SurfControl, CipherTrust and Trend Micro, use fingerprinting as one element in a multi-layered approach, relying on the technique as a way to delete a large amount of high-certainty spam without any further processing.
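The idea can be sketched as follows, on the assumption that a fingerprint is simply a SHA-256 hash of the message body after lower-casing and whitespace normalisation. Commercial products presumably use more tolerant, proprietary signatures; the class and method names here are hypothetical.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Hash a message body after lower-casing it and collapsing whitespace."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class FingerprintLibrary:
    """Holds fingerprints of messages already identified as spam."""

    def __init__(self):
        self._known = set()

    def add_from_honeypot(self, body: str) -> None:
        # In this sketch, anything arriving at a honeypot mailbox counts as spam.
        self._known.add(fingerprint(body))

    def is_known_spam(self, body: str) -> bool:
        return fingerprint(body) in self._known


library = FingerprintLibrary()
library.add_from_honeypot("Buy   NOW\ncheap watches")
print(library.is_known_spam("buy now cheap watches"))    # same message, reflowed
print(library.is_known_spam("Buy now cheap watches!!"))  # slightly altered variant
```

An exact match almost never hits a legitimate message, which is why false positives are rare, but even a trivially altered variant goes unrecognised until its own fingerprint has been added and distributed.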
Another common technique is the use of heuristic analysis. Heuristics relies on a large number of rules which consider various aspects of an e-mail's header and content, typically assigning a score corresponding to the likelihood that the message is spam. Organisations can then establish thresholds for processing e-mail in different ways, such as deleting it immediately, quarantining it for inspection by users, or forwarding it with a warning. The rules can be continuously updated to stay ahead of spammers, but the effectiveness of heuristics depends on how good those rules are. The success of heuristics at blocking spam seems to come at the price of false positives. However, the sensitivity of heuristic filters can be adjusted by the customer to find an acceptable balance between false negatives and false positives. Vendors such as Tumbleweed, Trend Micro, SurfControl, Symantec, Network Associates, and ActiveState incorporate heuristics.
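A toy version of heuristic scoring with thresholds might look like the sketch below. The four rules, their weights and the three threshold values are all invented for illustration; a real product would carry a far larger rule set that is updated continuously.

```python
# Each rule: (name, test applied to the message, weight added to the score).
# Rules and weights are hypothetical examples, not taken from any product.
RULES = [
    ("subject_all_caps", lambda m: m["subject"].isupper(), 2.0),
    ("many_exclamations", lambda m: m["subject"].count("!") >= 3, 1.5),
    ("missing_from", lambda m: not m.get("from"), 3.0),
    ("mentions_free_offer", lambda m: "free offer" in m["body"].lower(), 2.5),
]


def score(message: dict) -> float:
    """Sum the weights of every rule that fires on the message."""
    return sum(weight for _, test, weight in RULES if test(message))


def disposition(points: float, delete_at=7.0, quarantine_at=4.0, warn_at=2.0) -> str:
    """Map a score to an action; the thresholds are the customer's tuning knobs."""
    if points >= delete_at:
        return "delete"
    if points >= quarantine_at:
        return "quarantine"
    if points >= warn_at:
        return "deliver with warning"
    return "deliver"


suspect = {"subject": "ACT NOW!!!", "from": "", "body": "Claim your FREE OFFER today"}
routine = {"subject": "Meeting notes", "from": "colleague@example.com", "body": "See attached."}
print(score(suspect), disposition(score(suspect)))
print(score(routine), disposition(score(routine)))
```

Lowering the thresholds catches more spam at the cost of more false positives, and raising them does the reverse, which is the balance each customer has to find.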
Bayesian filtering is a new technique gaining attention in spam fighting circles. This approach compares the words in messages against their statistical probability of appearing in spam vs. appearing in legitimate mail. The overall spam probability rating for an entire message is determined by combining the probability scores of several of the most "extreme" words: those which have a very high or very low probability rating. Bayesian filtering factors in not just the bad "spam-like" characteristics when making its assessment, but also the good "un-spam-like" characteristics. On the other hand, it does require training, both an initial effort and ongoing training, to teach it to detect the evolving spam that it misses.
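The sketch below shows one way the calculation could be carried out. It assumes a simple add-one smoothing of word counts and the standard naive Bayes product formula for combining the per-word probabilities; actual products may estimate and combine them differently. The class name, the tiny training set and the choice of five "extreme" words are illustrative only.

```python
import re
from collections import Counter
from math import prod


def tokens(text: str) -> set:
    """Distinct lower-cased words in a message."""
    return set(re.findall(r"[a-z$']+", text.lower()))


class BayesFilter:
    def __init__(self):
        self.spam_words, self.ham_words = Counter(), Counter()
        self.n_spam = self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Record which words appeared in a message of known type."""
        if is_spam:
            self.spam_words.update(tokens(text))
            self.n_spam += 1
        else:
            self.ham_words.update(tokens(text))
            self.n_ham += 1

    def word_probability(self, word: str) -> float:
        """Probability that a message containing this word is spam (smoothed)."""
        in_spam = (self.spam_words[word] + 1) / (self.n_spam + 2)
        in_ham = (self.ham_words[word] + 1) / (self.n_ham + 2)
        return in_spam / (in_spam + in_ham)

    def spam_probability(self, text: str, top_n: int = 5) -> float:
        """Combine the top_n words whose probabilities lie furthest from 0.5."""
        probs = [self.word_probability(w) for w in tokens(text)]
        extreme = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
        spam_like = prod(extreme)
        ham_like = prod(1 - p for p in extreme)
        return spam_like / (spam_like + ham_like)


f = BayesFilter()
for text in ("cheap pills online", "cheap loans now"):
    f.train(text, is_spam=True)
for text in ("meeting agenda attached", "lunch meeting tomorrow"):
    f.train(text, is_spam=False)
print(round(f.spam_probability("cheap pills"), 3))
print(round(f.spam_probability("meeting tomorrow"), 3))
```

Words seen mostly in legitimate mail pull the result towards zero just as strongly as spam words push it towards one, and the `train` method is where both the initial and the ongoing training effort would be spent.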
These techniques for fighting spam can be supplemented with several additional measures. White lists allow e-mail from certain addresses to bypass or override spam filters altogether, reducing false positives and processing loads. Corporate-defined black lists (as opposed to black list services) also serve a complementary role. Traffic pattern analysis helps identify a mass spamming event in the network, and reverse domain name system (DNS) lookups can help expose spammers that are spoofing e-mail addresses. Finally, techniques for preventing e-mail address harvesting are available in some solutions.
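The way white lists and corporate black lists sit in front of a filter can be sketched as follows. The function name, the addresses and the stand-in filter are hypothetical; the point is only the order of the checks, with the costlier filter consulted last.

```python
def make_gate(white_list, black_list, spam_filter):
    """Wrap a spam filter with white-list and black-list checks on the sender."""
    white = {address.lower() for address in white_list}
    black = {address.lower() for address in black_list}

    def gate(sender: str, body: str) -> str:
        address = sender.lower()
        if address in white:
            return "deliver"   # bypasses the filter entirely
        if address in black:
            return "block"     # no need to inspect the content
        return "block" if spam_filter(body) else "deliver"

    return gate


calls = []


def toy_filter(body: str) -> bool:
    """Stand-in for a real filter; records each call so the bypass is visible."""
    calls.append(body)
    return "free" in body.lower()


gate = make_gate(["partner@example.com"], ["bulk@example.net"], toy_filter)
print(gate("Partner@example.com", "Free samples enclosed"), len(calls))
print(gate("stranger@example.org", "Free money"), len(calls))
```

The white-listed partner's message is delivered even though the toy filter would have blocked it, and the filter is never invoked for it, which is where the reduction in both false positives and processing load comes from.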
An alternative approach is to use a challenge-response system, in which e-mail senders must prove that they are legitimate before their e-mail is delivered. This usually involves checking whether senders are actual people by having them respond to a request that only a person could understand, since spammers programmatically send their messages using short-lived e-mail accounts. Senders responding correctly are added to a white list. While this approach essentially obviates the need for other types of antispam measures, it generates a lot of extra traffic and obviously places a burden on a new e-mail sender. It is far better suited to consumers, who communicate in well-defined and fairly static communities, than to businesses.
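The flow of such a system can be modelled in a few lines. In the sketch below the challenge is reduced to a single question with one expected answer, and messages are held in memory; the class name, the question and the token scheme are assumptions made for illustration, and no mail is actually sent.

```python
import secrets


class ChallengeResponseGate:
    """Holds mail from unknown senders until they answer a challenge correctly."""

    def __init__(self, question: str, expected_answer: str):
        self.question = question
        self.expected = expected_answer.strip().lower()
        self.white_list = set()
        self.pending = {}  # token -> (sender, message) awaiting a response
        self.inbox = []

    def receive(self, sender: str, message: str):
        """Deliver mail from white-listed senders; otherwise hold it and
        return a token that would accompany the challenge sent back."""
        if sender in self.white_list:
            self.inbox.append((sender, message))
            return None
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        return token

    def respond(self, token: str, answer: str) -> bool:
        """Release the held message and white-list the sender on a correct answer."""
        held = self.pending.get(token)
        if held is None or answer.strip().lower() != self.expected:
            return False
        del self.pending[token]
        sender, message = held
        self.white_list.add(sender)
        self.inbox.append((sender, message))
        return True


gate = ChallengeResponseGate("What colour is grass?", "green")
token = gate.receive("new.sender@example.com", "Hello, are you free on Friday?")
print(gate.inbox)                      # nothing delivered yet
print(gate.respond(token, " Green "))  # correct answer releases the message
print(gate.receive("new.sender@example.com", "Following up"))  # now white-listed
```

Every first contact costs an extra round trip and a manual step for the sender, which is the traffic and burden noted above.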
None of these techniques alone will be sufficient to reduce spam to manageable proportions. Corporate measures must be multi-pronged, and the available solutions reflect this consensus. Does a judicious combination of approaches yield an adequately effective solution? Many vendors claim impressive results, but the answer is unclear. Furthermore, spam is continually evolving to evade filters, so approaches that work today may not work so well next year. Given the relative immaturity of the market and the changing nature of spam, the effectiveness of different solutions is going to vary and customers will need to pay close attention to the techniques deployed by each product.