Punk Rocker or Terrorist? Don't Ask Soundex to Decide

The ageing technology used by Australian Immigration and airlines can't tell the surnames of punk rocker John Lydon ("Johnny Rotten" of the Sex Pistols) and Osama bin Laden apart, and Australia is more vulnerable as a result.

As the government pledges a $200 million airport security plan, encompassing the appointment of new airport police commanders and the integration of federal and state policing at all major airports, one company is warning of another pressing need: the adoption of knowledge-based technology for name recognition.

Jack Hermansen, CEO of Language Analysis Systems (LAS), who was in Australia last week to talk to government representatives, says a name query run through the Soundex key-based database search system used by government agencies and many airlines matches the surname of the Sex Pistols' lead singer to that of the terrorist mastermind.

The failings shouldn't be too surprising: Soundex is a product of the punch-card era. That era began with the Hollerith tabulating machine, introduced for the 1890 US census, which "read" punched cards by passing them through electrical contacts. The approach proved so useful in statistical work that it paved the way for the development of the digital computer. Soundex itself was first patented in 1918. It keeps the first letter of a name, strips out the vowels (along with h, w and y), assigns the same digit to groups of somewhat-similar-sounding consonants, such as "c" and "z", and trims or pads the result to one letter and three digits, with sometimes freakish results. "Laden" (as in bin Laden) and "Lydon", for example, share the Soundex code L350. So do Hencke and Hamza, at H520.
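To make those mechanics concrete, here is a minimal Python sketch of the classic American Soundex rules. It illustrates the published algorithm only; it is not the code any government or airline system actually runs, and the names `soundex` and `_CODES` are our own.

```python
# Digit codes for consonant groups in classic American Soundex.
# Vowels (a, e, i, o, u, y) and h, w receive no digit.
_CODES = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a single name."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own code suppresses an identical code right after it.
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are ignored and do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so equal codes on either side both count
    # Keep the first letter, then pad with zeros or truncate to three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


for name in ("Lydon", "Laden", "Hencke", "Hamza"):
    print(name, soundex(name))
```

Running it prints L350 for both Lydon and Laden and H520 for both Hencke and Hamza: once the vowels are gone, the key cannot tell these names apart.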

The software is also blind to cultural differences in naming. It treats short, three-syllable Asian names the same way it treats Arabic or Hispanic names of as many as eight syllables, ignores the numerous variations of a name, and cannot recognize names that originate in another script, such as Arabic or the Asian scripts, which have multiple valid spellings in the Roman alphabet. For instance, there are more than 200 ways to spell Mohammed and over 300 variants of the name Moammar Gaddafi, but Soundex wouldn't know. Not surprisingly, the software produces worrying numbers of false positives.
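Because Soundex copies the first letter verbatim, a transliteration difference at the start of a name is enough to separate spellings of the same person. The Python sketch below applies the classic Soundex rules to four common romanisations of Gaddafi (an illustrative sample we chose, not an exhaustive list) and groups them by code; the helper names are our own.

```python
from collections import defaultdict

# Classic American Soundex digit codes; vowels, h, w and y get no digit.
_CODES = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a single name."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    digits, prev = [], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w neither code nor separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


# Romanised spellings of one Arabic name (illustrative sample only).
variants = ["Gaddafi", "Gadhafi", "Qaddafi", "Kadhafi"]

buckets = defaultdict(list)
for spelling in variants:
    buckets[soundex(spelling)].append(spelling)

for code, spellings in sorted(buckets.items()):
    print(code, spellings)
```

Only Gaddafi and Gadhafi share a key (G310); Qaddafi (Q310) and Kadhafi (K310) land in separate buckets even though the digits are identical, so a Soundex lookup on one spelling silently misses the others.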

Hermansen says the failings suggest that Australian national security, our front line of protection, currently has big holes that terrorists and criminals keen to get into the country and do harm could exploit.

To overcome those difficulties, LAS has developed name-recognition technology that is used by Australian Customs as well as by the US intelligence community and law enforcement agencies to protect the United States.

"We've been doing this mainly inside the federal government, in sole-source classified projects, for about eighteen years," he says.

"The use of knowledge-based technology is much more effective in matching the variations of names that come from other cultures. Soundex and key-based technologies are what you might think of as a one-size-fits-all technology. You have one algorithm, no matter whether it's a Chinese name, Hispanic, Arabic. And our position is that you have to know how the use varies within each culture if we are to do an effective job of matching those varying names."

Australian Customs has become the latest buyer of LAS's Name Reference Library (NRL), an interactive encyclopedia of culture-specific information about names, their use, their meanings and their patterns of spelling variation, compiled from a detailed study of almost a billion names from every country in the world. Intended as a training and reference resource for users who must cope with the confusing diversity of personal names, the NRL also provides an XML-based API that allows its name-analysis functions to be integrated directly into Web-enabled applications.

Stories by Sue Bushell

Latest Videos

More videos

Blog Posts

Market Place