AusCERT 2013: Cloud-based scanner identifies new malware by its ancestry

Polymorphic malware may be good at evading signature-based scanning engines, but the application of advanced algorithms to terabytes’ worth of malware dumps is enabling one Deakin University PhD student to detect even new strains of malware by assessing their similarity to existing, known malicious code.

The technique departs from traditional signature-based antivirus, which is easily defeated by the large volume of malware that modifies its structure or behaviour to avoid detection. By feeding a scanning engine with massive volumes of new malware and collating the results in the cloud-based Simseer service, security researcher Silvio Cesare, who presented his research at AusCERT 2013 today, has been able to identify new malware strains by their heritage.

“Traditional antivirus is good at something, but signatures aren’t very good at detecting entire families of malware because if you change the malware a little bit, it changes the byte-level content of that malware,” Cesare told CSO Australia.

Rather than focusing on scanning at the byte level, Cesare’s technique looks at small ‘structures’ woven through the malware code, which are common to each family of malware. “Even moderate changes to the malware don’t change the structures very much,” he said. “Using structures, you can detect approximate matches of malware, and it’s possible to pick an entire family of malware pretty easily with just one structure.”

Cesare’s analysis translates these structures into a series of control-flow graphs with vectors that can be easily compared to the profile of other malware. To facilitate those comparisons, Cesare has built Simseer to track and categorise the heritage of what has become more than 50,000 strains of malware.
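As a rough illustration of comparing structural profiles rather than bytes, the sketch below counts hypothetical structure labels per sample and scores their overlap. The label strings, the `feature_vector` and `similarity` helpers, and the overlap metric are all assumptions made for the example; the article does not say how Simseer encodes or compares its vectors.

```python
from collections import Counter

def feature_vector(structures):
    """Count how often each structural feature occurs in one sample."""
    return Counter(structures)

def similarity(a, b):
    """Multiset overlap in [0, 1]: shared feature count over the larger total.
    This metric is an assumption, not Simseer's documented measure."""
    shared = sum((a & b).values())          # Counter & keeps the minimum counts
    largest = max(sum(a.values()), sum(b.values()))
    return shared / largest if largest else 1.0

# Hypothetical labels standing in for control-flow structures.
known = feature_vector(["loop-if", "call-seq", "switch-3", "loop-if"])
variant = feature_vector(["loop-if", "call-seq", "switch-3", "loop-if", "junk-pad"])
unrelated = feature_vector(["straight-line", "call-seq"])

print(similarity(known, variant))    # 0.8
print(similarity(known, unrelated))  # 0.25
```

The variant adds one padding structure yet still scores 0.8 against the known sample, which is the kind of approximate match a byte-level signature would miss.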

“The visualisation aspect takes a similarity matrix and draws you an evolutionary tree showing the relationships between existing and new code,” he explained. “The servers group samples into clusters, then scan an unknown sample. If it belongs to a known cluster, it’s probably evolved from that cluster.”
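The cluster-then-match step Cesare describes might look something like the following sketch. It groups samples with simple single-linkage clustering over a pairwise similarity table, then assigns an unknown sample to the cluster of its closest match. The sample names, scores, the 0.8 threshold and the choice of single-linkage are illustrative assumptions, not details of Simseer.

```python
def cluster(names, sim, threshold):
    """Single-linkage grouping: link any two samples scoring >= threshold."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if sim[frozenset((a, b))] >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

def assign(unknown_scores, clusters, threshold):
    """Return the cluster of the best-matching known sample, or None."""
    best = max(unknown_scores, key=unknown_scores.get)
    if unknown_scores[best] < threshold:
        return None
    return next(c for c in clusters if best in c)

# Made-up similarity matrix for four known samples.
names = ["a1", "a2", "b1", "b2"]
sim = {
    frozenset(("a1", "a2")): 0.93, frozenset(("a1", "b1")): 0.20,
    frozenset(("a1", "b2")): 0.15, frozenset(("a2", "b1")): 0.22,
    frozenset(("a2", "b2")): 0.18, frozenset(("b1", "b2")): 0.88,
}
clusters = cluster(names, sim, threshold=0.8)
print(clusters)

# Scores of one unknown sample against each known sample (also made up).
unknown = {"a1": 0.30, "a2": 0.20, "b1": 0.91, "b2": 0.85}
print(assign(unknown, clusters, threshold=0.8))
```

An evolutionary tree could be drawn from the same similarity table with a hierarchical clustering routine, but that step is left out here.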

To maintain Simseer’s database, Cesare downloads raw malware code from the likes of open malware-sharing network VirusShare and other sources, with between 600MB and 16GB of data fed into his algorithms every night. The service runs on an Amazon EC2 cluster with around a dozen virtual servers; many submitted samples are stored online, and the analysis of new code is load-balanced between the systems.

This data is combed over to scan for similarities with known malware families and to identify new ones; new strains are only catalogued separately if they are less than 98% similar to an existing strain. This threshold has proven effective in furthering Cesare’s goal of grouping malware according to the similarities in their code.
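The 98% rule can be read as a simple gate on the catalogue, sketched below. Only the threshold comes from the article; the `catalogue` helper, the strain names and the Jaccard similarity used as a stand-in are hypothetical.

```python
CATALOGUE_THRESHOLD = 0.98  # figure quoted in the article

def jaccard(a, b):
    """Stand-in similarity (an assumption): shared features over all features."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def catalogue(strains, name, features, similarity=jaccard):
    """File a sample under an existing strain if it is at least 98% similar;
    otherwise record it as a new strain. Returns the strain name used."""
    for existing, existing_features in strains.items():
        if similarity(features, existing_features) >= CATALOGUE_THRESHOLD:
            return existing
    strains[name] = features
    return name

# Feature sets are represented here as sets of integer IDs for brevity.
strains = {"strain-1": frozenset(range(100))}
print(catalogue(strains, "sample-x", frozenset(range(99))))  # 99% similar
print(catalogue(strains, "sample-y", frozenset(range(90))))  # 90% similar
print(sorted(strains))
```

In this toy run the 99%-similar sample is folded into `strain-1`, while the 90%-similar sample gets its own entry.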

The service is still experimental; it will continue to be refined and may be commercialised in future. For now, however, Cesare hopes it will give malware researchers and potential victims an important tool to help them catch, analyse and combat new malware.

“I’m still letting people experiment with it to see how it helps them,” he explained. “But the idea is that if you give it a sample, and if it can find you 10 samples that are 90% similar and eight of them belong to a particular family, it likely belongs to the same family. And, if you know the family that it belongs to, it’s much easier to understand the capability of that new malware.”
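The rule of thumb in that quote amounts to a nearest-neighbour vote, which could be sketched as follows. The numbers (10 matches, 90% similarity, eight votes) come from the quote; the `likely_family` function, its parameters and the family names are hypothetical.

```python
from collections import Counter

def likely_family(matches, min_similarity=0.90, top_n=10, min_votes=8):
    """matches: list of (similarity, family) pairs for one query sample.
    Returns the family holding at least `min_votes` of the closest
    `top_n` matches, or None if no family reaches that count."""
    close = sorted(
        (m for m in matches if m[0] >= min_similarity), reverse=True
    )[:top_n]
    if not close:
        return None
    family, votes = Counter(f for _, f in close).most_common(1)[0]
    return family if votes >= min_votes else None

# Made-up results: eight close matches from one family, two from another.
matches = [(0.95, "family-A")] * 8 + [(0.91, "family-B")] * 2 + [(0.50, "family-C")]
print(likely_family(matches))
```

A vote like this only suggests a family; as the quote implies, the result is a starting point for understanding the sample rather than a verdict.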


Stories by David Braue
