Today, CIOs are investing considerable resources in systems that provide them with better visibility into, and real-time reporting on, exactly what is happening on their IT systems.
Performance monitoring tools such as Compuware Application Performance Management, CA Performance Monitoring, Nimsoft and many others provide excellent real-time visibility and reporting, yet offer little insight into many security threats and events.
Log analysis and SIEM (Security Information and Event Management) technologies such as HP ArcSight, LogRhythm, Splunk and many others are becoming a more common means of detecting unusual and suspicious behaviour within a network, but fail to deliver an understanding of actual user activity.
Today, human behaviour is the key factor which many organisations are striving to manage, from common misconfiguration errors to trusted third-party vendors to rogue employees.
As such, CIOs are investing in high-availability systems and performance monitoring solutions, but are challenged to follow best practice procedures for addressing human errors made while working on their organisations' key data assets.
Oddly enough, the simple question, “Who did what, when, where and how on the server?” remains one of the toughest questions for CIOs to answer. This is despite the variety of system management tools in use today. It is simply not enough for administrators to monitor only servers and applications when the number one cause of server downtime is human error!
To achieve efficient operations and rapid remediation when problems arise, CIOs and administrators need access to a holistic view of the entire IT infrastructure, including granular monitoring of every human action on company servers.
The challenge of monitoring human activity
According to Ponemon Institute's 2013 Cost of Data Breach Study: Global Analysis, negligent employees and contractors are the root cause of 35 per cent of data breaches. Although “malicious attacks are more costly globally”, costing companies an average of US$157 per record breached (adding up to millions of dollars per data breach), human error doesn’t fall too far behind, costing an average of US$117 per record breached.
Human errors differ from system errors in an important way. For system errors, the tedious process of log analysis has been somewhat alleviated by the adoption of system monitoring platforms (e.g., SIEM) and software profiling utilities. But IT security administrators typically have no means of discovering the human errors which cause data breaches and system failures.
A further complication when human error causes a data breach or system failure is that smart users make the “smartest” mistakes. They know the nooks and crannies of arcane configuration files that might squeeze an extra 5 per cent of performance out of the system. These hidden corners are consequently the most difficult to identify as the cause when something goes wrong.
Another difference between human errors and system errors is repeatability. For system errors, finding the problem is equivalent to fixing the problem. If, for example, your troubleshooting process leads to the conclusion that a network interface card (NIC) is not working, then swapping in a new card closes the issue, and you can sleep well that night.
Human errors are not this direct. If you find a suspiciously modified configuration file and swap it with the correct configuration file, the problem may be solved temporarily. But when you go to bed that night, you’re probably still scratching your head, wondering who or what caused this error, and whether it will happen again tomorrow.
Server surveillance and people auditing
Ultimately, CIOs need to understand the importance of implementing a solution for human activity auditing. Such a solution must provide visibility into all user actions performed on every server, whether the server was accessed through the local console or through any method of remote access (Terminal Services, Citrix, VMware, Remote Desktop Connection, LogMeIn, GoToMyPC, PC Anywhere, etc.).
Ideally, a people auditing solution must provide three levels of activity data:
1. A video recording of all on-screen user actions
2. A summary journal of what each user did (allowing fast review even by non-experts)
3. Searchable text-based activity logs (including the names of windows opened, applications run, URLs viewed, mouse clicks made, text entered or edited and even unseen commands executed by scripts).
Since watching thousands of hours of recorded video is not practical, the summary journal and searchable log capabilities are critical. Furthermore, each journal entry and search result must link directly to the moment of the video where that action occurred so that administrators and trouble-shooters can actually see exactly what the user did at any point of interest.
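To make the idea concrete, the sketch below shows one way a searchable activity log could tie each entry back to a moment in the session recording. It is a minimal illustration in Python, not a description of any particular product: the ActivityEvent record, its field names, the search and journal helpers and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityEvent:
    """One logged user action (all field names are illustrative assumptions)."""
    user: str             # who
    server: str           # where
    timestamp: datetime   # when
    application: str      # how: the program used
    window_title: str     # how: the window in focus
    detail: str           # what: text entered, command run, URL viewed
    video_offset_s: int   # seconds into the session recording


def search(events, term):
    """Return events whose text fields contain term (case-insensitive)."""
    needle = term.lower()
    return [
        e for e in events
        if needle in e.application.lower()
        or needle in e.window_title.lower()
        or needle in e.detail.lower()
    ]


def journal(events):
    """Build a short summary line per event, ordered by time."""
    return [
        f"{e.timestamp:%Y-%m-%d %H:%M} {e.user}@{e.server}: "
        f"{e.application} - {e.detail} [video +{e.video_offset_s}s]"
        for e in sorted(events, key=lambda e: e.timestamp)
    ]


# Hypothetical sample data, for illustration only.
EVENTS = [
    ActivityEvent("alice", "web01", datetime(2024, 1, 5, 9, 30),
                  "notepad.exe", "httpd.conf - Notepad",
                  "edited MaxClients", 125),
    ActivityEvent("bob", "db02", datetime(2024, 1, 5, 9, 10),
                  "cmd.exe", "Command Prompt",
                  "net stop mssqlserver", 40),
]

# Find every action that touched a given configuration file, then print
# journal lines that point at the matching moment in the recording.
hits = search(EVENTS, "httpd.conf")
for line in journal(hits):
    print(line)
```

Because every record carries its own video offset, a reviewer can jump from a search hit or journal line straight to the relevant moment of the recording rather than scrubbing through hours of footage. A real deployment would also need tamper-resistant storage and indexing at scale, which this sketch leaves out.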
Watching a video showing exactly what actions a user performed removes all doubt about what might have caused a certain system configuration modification or other change. This provides fast and unambiguous troubleshooting and root cause analysis.
With human error being responsible for 56 per cent of server outages, it is vital that CIOs, IT security staff and administrators have a solution allowing them to quickly and accurately review exactly what users did on servers and system devices. With this knowledge, they can rapidly discover the error, repair the damage, confront the culprit and implement procedures to prevent similar occurrences in the future.
People auditing in a nutshell
Most IT organisations today utilise system monitoring platforms that are efficient for system error troubleshooting, but are ineffective when diagnosing human-generated errors. These human errors represent over half of all downtime and data loss and are best handled by focusing on the root cause: What was done on this critical system, when and by whom?
Answering this root-cause question will bring drastic improvements in troubleshooting effectiveness and will also enhance security and compliance robustness. But most importantly, it will provide CIOs with the understanding and visibility required to make effective decisions and achieve their desired strategic outcomes.