How to define cyber criminals’ effective patterns of attack with machine learning

Research is under way to make it easier to find cyber criminals’ patterns of attack and to share the results with the wider community. This means getting into the details of what a pattern is, and asking questions such as:

  • What does an effective pattern of attack look like?
  • How do we find them?
  • What makes them effective?

The central quality of a pattern is its level of abstraction – a pattern is neither completely specific nor completely universal. The best patterns will be general enough to verifiably identify related attacks, regardless of differences in their easily changed details, without incorrectly matching legitimate activity.

For example, we know that simply tracking the hashes of known bad files is not sufficient to provide a durable defensive advantage, because changing a single byte of a file changes its hash. We can use patterns to widen our scope and consider context: Notepad itself isn’t scary, but when it executes PowerShell, that’s cause for concern.

Generalising this insight further, we might recognise when any humble document-oriented application has initiated a process that carries capabilities beyond what should be necessary. We could go further still and alert whenever PowerShell is run by any application at all, but that’s likely to create a lot of harmful noise.
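As a rough illustration of these three levels of generality, the sketch below writes each one as a small predicate over process-start events. The event fields, the application lists and the sample events are all hypothetical, chosen only for illustration; they are not a Cb Response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessStart:
    """Hypothetical event record: one process starting another."""
    parent: str  # executable name of the parent process
    child: str   # executable name of the process being started


# Assumed, hand-picked sets for illustration only.
DOCUMENT_APPS = {"notepad.exe", "winword.exe", "acrord32.exe"}
SCRIPT_HOSTS = {"powershell.exe", "cmd.exe", "wscript.exe"}


def matches_specific(event):
    """Most specific: Notepad starts PowerShell."""
    return (event.parent.lower() == "notepad.exe"
            and event.child.lower() == "powershell.exe")


def matches_generalised(event):
    """Generalised: any document-oriented app starts a script host."""
    return (event.parent.lower() in DOCUMENT_APPS
            and event.child.lower() in SCRIPT_HOSTS)


def matches_too_broad(event):
    """Too broad: anything at all starts PowerShell."""
    return event.child.lower() == "powershell.exe"


# Toy events, not real telemetry.
events = [
    ProcessStart("notepad.exe", "powershell.exe"),
    ProcessStart("WINWORD.EXE", "cmd.exe"),
    ProcessStart("explorer.exe", "powershell.exe"),  # e.g. an admin opening a shell
    ProcessStart("explorer.exe", "notepad.exe"),
]

for pattern in (matches_specific, matches_generalised, matches_too_broad):
    hits = [e for e in events if pattern(e)]
    print(f"{pattern.__name__}: {len(hits)} match(es)")
```

In this toy data the generalised predicate also catches the Word-to-cmd case that the specific one misses, while the broad predicate additionally flags an ordinary shell launched from Explorer.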

The patterns need to be general in the right way, too. For example, bad domains come and go, and blocking them isn’t very useful in the long term. Pattern B, on the other hand, is both more general and more specific than this approach: it doesn’t consider what address Notepad is connecting to. The fact that Notepad is trying to access the Internet at all is cause for concern. Once again, we may be able to generalise this insight to apply to entire classes of applications (C).

It is also easy to go wrong by being too general. For example, some malware creates a backdoor binary that calls home, and alerting on binaries that follow this pattern (D) might seem like a good idea. But Adobe Acrobat and its link-enabled documents could match this pattern and generate a fountain of false positives.
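One way to make "too specific" and "too general" concrete is to count what each candidate pattern catches and what it wrongly flags on labelled data. The sketch below does this for four network-connection patterns loosely modelled on the cases discussed above. The record type, the predicate names, the domains and the labels are all made up for illustration.

```python
from typing import NamedTuple


class NetConn(NamedTuple):
    """Hypothetical record of one outbound connection."""
    process: str
    domain: str
    malicious: bool  # toy ground-truth label


BLOCKLIST = {"bad.example"}                  # assumed known-bad domains
TEXT_EDITORS = {"notepad.exe", "wordpad.exe"}  # assumed application class


def blocklisted_domain(c):
    return c.domain in BLOCKLIST


def notepad_online(c):
    return c.process == "notepad.exe"


def editor_online(c):
    return c.process in TEXT_EDITORS


def any_process_online(c):
    return True


def score(pattern, conns):
    """Return (true positives, false positives, false negatives)."""
    tp = sum(1 for c in conns if pattern(c) and c.malicious)
    fp = sum(1 for c in conns if pattern(c) and not c.malicious)
    fn = sum(1 for c in conns if not pattern(c) and c.malicious)
    return tp, fp, fn


# Toy labelled connections, not real telemetry.
conns = [
    NetConn("notepad.exe", "bad.example", True),
    NetConn("notepad.exe", "new-bad.example", True),
    NetConn("wordpad.exe", "other-bad.example", True),
    NetConn("acrord32.exe", "docs.example", False),
    NetConn("chrome.exe", "news.example", False),
]

for pattern in (blocklisted_domain, notepad_online,
                editor_online, any_process_online):
    tp, fp, fn = score(pattern, conns)
    print(f"{pattern.__name__}: TP={tp} FP={fp} FN={fn}")
```

On this toy sample the blocklist misses the attacks that use fresh domains, while the catch-all pattern finds every attack but also flags both legitimate connections.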

Cb Response provides a wealth of data, and a skilled analyst will have little trouble tracing and describing the kill chain of a discovered attack. Turning this knowledge into accurate, precise and generally applicable patterns at scale is the opportunity in view.

Machine-driven pattern finding

Fortunately, there are countless machine-driven methods for deriving patterns, and many are specifically designed to find the best divisions between classes of behaviour. The Carbon Black R&D team is currently researching this; some promising general approaches include:

  • Sequence learning methods, which consider a set of events and find sequences that are highly correlated with particular outcomes (in our case, attacks).
  • Taint propagation algorithms, which can be used to learn about processes, binaries, and domains based on their association with known bad entities. When multiple sources of taint combine to make a given event worthy of suspicion, this represents a highly generalised pattern that can guide future investigations.
  • Graph-based methods, which can abstract behaviours, allowing them to be compared in terms of their overall nature, rather than their specifics.
  • Clustering, which can be used to find common classes of applications, broadening the applicability of known patterns.
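To give a feel for the taint propagation idea, here is a minimal sketch of one simple variant. It is an assumption made for illustration, not a description of the Carbon Black team's method: taint starts at 1.0 on known bad entities, decays with each hop through an association graph, and contributions from several sources are combined as if they were independent. The entity names, the decay factor and the hop limit are all hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_distances(graph, source, max_hops):
    """Breadth-first search: hops from source to each node within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def propagate_taint(graph, seeds, decay=0.5, max_hops=2):
    """Each seed contributes decay**hops; sources combine as 1 - prod(1 - t)."""
    clean = {}  # node -> product of (1 - contribution) over all seeds
    for seed in seeds:
        for node, hops in hop_distances(graph, seed, max_hops).items():
            clean[node] = clean.get(node, 1.0) * (1.0 - decay ** hops)
    return {node: 1.0 - c for node, c in clean.items()}


# Toy associations between processes, binaries and domains.
edges = [
    ("dropper.exe", "payload.bin"),        # dropper wrote the binary
    ("helper.exe", "payload.bin"),         # helper loaded the binary
    ("helper.exe", "known-bad.example"),   # helper connected to the domain
    ("helper.exe", "cdn.example"),
]
taint = propagate_taint(build_graph(edges),
                        seeds=["dropper.exe", "known-bad.example"])

for entity, value in sorted(taint.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: {value:.3f}")
```

Here helper.exe is reached from both known bad entities, so it ends up with more taint than cdn.example, which is reached from only one.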

These approaches help us find patterns, improve the patterns we’ve found, and apply our patterns more broadly than the original pattern’s creator might have realised was possible. This is crucial if we are going to work together to share patterns: a description of an attack on one system must be intelligently generalised into knowledge that can be applied anywhere. These methods can help, and effective intelligence is the key.

Note that machine learning can provide a crucial advantage in the game of cat-and-mouse between attackers and defenders. Adversaries are good at knowing how their opponents think. Patterns that are obvious to us are likely to be obvious to them. But machine learning gives us an inside track, finding patterns that are not intuitive to humans. If we can keep attackers guessing about how we’re detecting them, we can make them waste resources trying to evade us, until attacking is no longer an attractive proposition.

The human connection

Algorithms alone won’t resolve security concerns. Rather, collaboration between human analysis and computational power is crucial. This takes several forms, including:

  • Extrapolation: patterns discovered by analysts represent valuable input for machine learning algorithms, which can serve to find higher-order patterns in data.
  • Visualisation: raw data and derived observations can be presented to the user for further review, leveraging humans’ powerful visual pattern-finding capabilities.
  • Validation: analysts can provide domain knowledge to validate machine-derived patterns. Conversely, machines can provide statistical support for human-derived intuition.

One crucial approach to this human-machine collaboration will be to move beyond the raw events in our data, and instead extrapolate higher-order behaviours. For example, suppose Process A creates a binary file. Later, that binary is executed and instantiates Process B. This represents an important relationship between Process A and Process B, one that might not be evident by glancing at the raw data stream.

Or maybe two processes communicate across a network. Or a binary could, via a sequence of intervening events, delete itself. These are interesting behaviours! By describing our event data in terms of these behaviours, we help to make the data more understandable and useful (for people and machines alike!). Our work aims to define these behaviours and express kill chains in these higher-order terms.
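The first behaviour described above, where one process writes a binary that later runs as another process, can be derived from raw events with a single pass over the stream. The sketch below is one possible way to do it. The event fields, the behaviour and function names, and the sample paths are hypothetical, and a real event stream would need more care (process identifiers, renames, time windows).

```python
from typing import NamedTuple


class RawEvent(NamedTuple):
    """Hypothetical raw event from an endpoint sensor."""
    time: int      # toy timestamp; larger means later
    kind: str      # "file_write" or "process_start"
    process: str   # the writing process, or the newly started process
    path: str      # the file written, or the binary being executed


def find_drop_and_execute(events):
    """Return (writer, path, started_process) for each written-then-run binary."""
    last_writer = {}  # path -> process that most recently wrote it
    found = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "file_write":
            last_writer[ev.path] = ev.process
        elif ev.kind == "process_start" and ev.path in last_writer:
            found.append((last_writer[ev.path], ev.path, ev.process))
    return found


# Toy event stream, deliberately out of order.
events = [
    RawEvent(3, "process_start", "proc_b", r"C:\Temp\b.exe"),
    RawEvent(1, "file_write", "proc_a", r"C:\Temp\b.exe"),
    RawEvent(2, "process_start", "proc_c", r"C:\Windows\notepad.exe"),
]

for writer, path, started in find_drop_and_execute(events):
    print(f"{writer} wrote {path}, which later ran as {started}")
```

The output links proc_a to proc_b even though no single raw event mentions both, which is the kind of relationship that is easy to miss when reading the stream directly.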

Understanding is crucial because the success of a pattern-based approach depends on human-to-human connection. It is not enough to build an algorithm that says: “Yes, it’s good” or “No, it’s bad.” We need knowledge that is understandable, usable and verifiable by other members of the community. By sharing this knowledge we can drastically increase the effort necessary to breach our collective defences.

Stories by Brett Williams