Accuracy

There can sometimes be a fine line between suspicion and guilt. Distinguishing malicious from “good” activity can be a challenging task in today’s cyber world, full of hidden and dark secrets. Only a system built on accuracy and a thorough analysis of all the evidence will lead to the true malicious actor.

Consider, for example, a popular way to illegally extract money from someone’s bank account. The malicious actor has created a malicious link that exploits a Cross-Site Request Forgery (CSRF) vulnerability on a banking site. The malicious actor makes sure that the victim clicks on the malicious link while logged into his online banking account on the vulnerable site. The victim thinks he is transferring $2,000 to pay the rent, but the malicious link changes the request so that $20,000 is transferred from his account to an anonymous bitcoin account. The money is laundered away before it can be traced.

This is where the accuracy of a Web Application Firewall (WAF) is critical. An accurate WAF blocks the forged request before it reaches the vulnerable bank site.

Let us assume the bank has a WAF to prevent such attacks, but the bank is reluctant to have strict WAF controls in place as they fear false positives. They have had users in the past complain that they were wrongly blocked from accessing the site when the WAF was previously tweaked. The answer for the bank is an accurate WAF that lets normal users through and blocks only malicious attacks.

So you ask yourself, how is WAF accuracy determined? After all, there are true positives, false positives, true negatives and false negatives (pause for a moment to think which is which!). If false positives are minimised (do not suspect the innocent guy), then we might not find all true positives (nailing the guilty one). If false negatives are minimised (never let the bad guy get away), we might as a side effect end up suspecting (and blocking!) many innocent people.
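To make the four outcomes concrete, here is a minimal sketch that tallies them from labelled WAF decisions. It assumes the usual convention that "positive" means "flagged as malicious and blocked"; the function name `tally_outcomes` and the sample data are hypothetical and only serve as illustration.

```python
from collections import Counter


def tally_outcomes(decisions):
    """Count TP, FP, TN and FN from (is_malicious, was_blocked) pairs."""
    counts = Counter()
    for is_malicious, was_blocked in decisions:
        if is_malicious and was_blocked:
            counts["TP"] += 1  # attack correctly blocked
        elif not is_malicious and was_blocked:
            counts["FP"] += 1  # legitimate user wrongly blocked
        elif not is_malicious and not was_blocked:
            counts["TN"] += 1  # legitimate user correctly let through
        else:
            counts["FN"] += 1  # attack wrongly let through
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}


# Hypothetical, hand-labelled sample: (is_malicious, was_blocked)
sample = [
    (True, True),
    (True, False),
    (False, False),
    (False, False),
    (False, True),
]
print(tally_outcomes(sample))  # {'TP': 1, 'FP': 1, 'TN': 2, 'FN': 1}
```

In practice the hard part is the labelling itself: somebody has to know which requests really were malicious.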

For such a system there is a single measure of correctness that takes all four outcomes into account. It is called the Matthews Correlation Coefficient (MCC) (https://en.wikipedia.org/wiki/Matthews_correlation_coefficient):

MCC = ((TP*TN)-(FP*FN))/SQRT((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))

What an equation, you may gasp! Well, I initially did. However, there is a way to make it easier to understand. First note that TP are true positives (attacks that are blocked), TN true negatives (legitimate requests that are let through), FP false positives (legitimate requests that are blocked) and FN false negatives (attacks that are let through). SQRT is the square root. Given that we are talking about a correlation coefficient, the size of the data set (i.e. lots of data with many false positives, negatives, etc.) is critical to its relevance as well. Note also that a correlation coefficient is a real number between -1 and 1. We are seeking a correlation coefficient that is as close to 1 as possible in order to have an accurate system.
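For readers who prefer code to formulas, the equation can be sketched as a small Python function. The function name `mcc` and the example counts are my own illustration, and returning 0 when the denominator is zero is a common convention rather than part of the formula itself.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from the four outcome counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: treat the undefined case as "no correlation".
        return 0.0
    return numerator / denominator


# Hypothetical counts: 90 attacks blocked, 900 good requests allowed,
# 10 good requests blocked, 10 attacks missed.
print(round(mcc(tp=90, tn=900, fp=10, fn=10), 3))  # 0.889
```

Note that the whole numerator, TP*TN minus FP*FN, is divided by the square root, not just the second product.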

Let us analyse three cases based on some simple assumptions in order to better understand what this equation is actually representing.

Assumptions:

| Case # | Mathematical assumptions | Layman translation | Corresponds to real-world scenario |
| --- | --- | --- | --- |
| 1 | FP, FN >> TP, TN | Number of false positives and false negatives is much, much larger than the number of true positives or true negatives | Worst situation, as we are letting bad traffic through and blocking good traffic |
| 2 | TP ≈ FN ≈ TN ≈ FP | True positives, false negatives, true negatives and false positives are all roughly the same value | Not a very accurate WAF, as it seems arbitrary and is not making enough good decisions |
| 3 | TP, TN >> FP, FN | True positives and true negatives are much larger than false positives and false negatives | Ideal accuracy, as bad traffic is being blocked and little if any good traffic is being blocked |

So now let us plug these assumptions into the equation (MCC = ((TP*TN)-(FP*FN))/SQRT((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))) and see what results.

Case 1: MCC ≈ -(FP*FN)/SQRT(FP*FN*FP*FN)

Note that in order to get to the above approximation we have only included FN and FP, since TP and TN are so much smaller that they can be neglected!

Next step: Case 1 results in MCC ≈ -(FP*FN)/SQRT(FP^2*FN^2) ≈ -1

So in the worst scenario, where FP and FN dominate the equation, we have a highly negative correlation coefficient, which makes sense.

Case 2: MCC ≈ ((TP*TN)-(FP*FN))/SQRT((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))

But if TP, TN, FP and FN are all about the same, then TP*TN is about the same as FP*FN and the numerator (the number on top) is approximately 0! Then we do not have to interpret the denominator and get 0 as the quotient.

So in the not ideal scenario, where all four outcomes are roughly equal, we have a correlation coefficient of 0, meaning the WAF in that case does no better than guessing at random and does not provide any real value.

Case 3: MCC ≈ (TP*TN)/SQRT(TP^2*TN^2)

Note that in order to get to this approximation we have only included TN and TP, since FP and FN are so much smaller that they can be neglected!

Next step: Case 3 results in MCC ≈ (TP*TN)/SQRT(TP^2*TN^2) ≈ 1

So in the ideal scenario, where true positives (identifying and blocking the bad traffic) and true negatives (letting the good traffic through) dominate the equation, we have a highly positive correlation coefficient, meaning a highly accurate WAF!
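The three approximations can be checked numerically. The sketch below plugs made-up counts into the MCC formula, one set per case; the numbers are purely illustrative and the helper `mcc` is my own naming.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from the four outcome counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: return 0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts chosen to match the three cases.
cases = {
    "Case 1 (FP, FN >> TP, TN)": dict(tp=1, tn=1, fp=1000, fn=1000),
    "Case 2 (all roughly equal)": dict(tp=500, tn=500, fp=500, fn=500),
    "Case 3 (TP, TN >> FP, FN)": dict(tp=1000, tn=1000, fp=1, fn=1),
}

results = {name: round(mcc(**counts), 3) for name, counts in cases.items()}
for name, value in results.items():
    print(f"{name}: MCC = {value}")
```

With these counts the three results come out at roughly -0.998, 0.0 and 0.998, in line with the approximations of -1, 0 and 1.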

So with these cases based on assumptions we see that the MCC would indeed work well to determine the accuracy of a Web Application Firewall. It is much more advantageous to have a mathematically appropriate method to calculate accuracy. A system that has been tweaked to have a maximum MCC value balances a low false positive rate AND a high true positive rate. This is much better than having the bank not use the WAF due to a high false positive rate, or having a low true positive rate that allows malicious users to get through.
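What "tweaking for a maximum MCC" could look like is sketched below. It assumes, hypothetically, a WAF that assigns each request an anomaly score and blocks anything at or above a threshold; the scores, labels and the function `best_threshold` are invented for illustration and do not describe any particular product.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from the four outcome counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: return 0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0


def best_threshold(scored_requests, thresholds):
    """Return the (threshold, MCC) pair with the highest MCC.

    A request is blocked when its score is >= the threshold.
    """
    best = None
    for t in thresholds:
        tp = sum(1 for score, bad in scored_requests if bad and score >= t)
        fp = sum(1 for score, bad in scored_requests if not bad and score >= t)
        fn = sum(1 for score, bad in scored_requests if bad and score < t)
        tn = sum(1 for score, bad in scored_requests if not bad and score < t)
        value = mcc(tp, tn, fp, fn)
        if best is None or value > best[1]:
            best = (t, value)
    return best


# Hypothetical labelled traffic: (anomaly_score, is_malicious)
requests = [
    (0.95, True), (0.80, True), (0.55, True), (0.30, True),
    (0.60, False), (0.40, False), (0.20, False),
    (0.15, False), (0.10, False), (0.05, False),
]
threshold, value = best_threshold(requests, [0.1, 0.3, 0.5, 0.7, 0.9])
print(threshold, round(value, 3))
```

On this toy data the sweep settles on a threshold of 0.3: stricter settings miss attacks, looser ones block too many legitimate users, and the MCC penalises both.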

So that is enough math for a single blog entry. My plea is that you ask your WAF vendor:

  • How do they calculate accuracy, and therefore tweak their WAF?
  • What are their rates for all four variables: true and false positives, true and false negatives?
  • How much data has the vendor used to determine the accuracy? (Remember, the larger the data set, the more reliable the correlation coefficient.)

This article is brought to you by Enex TestLab, content directors for CSO Australia.


Stories by Dr Claudia Johnson
