One of the great challenges of incident response is knowing what the best course of action will be before you find yourself reacting in anger to a live security incident. Who should you notify, and when? At what point should things be escalated to more senior levels of the business? Should you shut down compromised systems or delete virtual machines?
Those judgement calls are hard to make in the heat of battle, which is one reason red team/blue team exercises are such a valuable strategy: they build experience in reacting and making decisions under pressure.
Another way to improve your incident response strategy is to learn from the experience of others. Mike Wilkinson, a security consultant at Trustwave SpiderLabs, spoke about incident response at AusCERT 2014, drawing on more than 100 compromises to share the lessons learned from those breaches.
The first stage is to identify the scope of the breach. What systems, data and users are affected? Is the breach causing data loss or exfiltration? By determining the scope of the breach it’s then possible to work out the next step – communications and escalation.
Once a breach is detected, it’s important to know who you have to contact and when. Wilkinson suggested the list should include legal, public relations, law enforcement, customers, suppliers and service providers. It might not be necessary to tell everyone initially, but the business needs to set rules in advance about when each party is told.
Part of that contact plan needs to be a schedule of when affected parties will receive updates and advice about the compromise. This is an often overlooked element of the communications plan, but it matters: as well as keeping the business informed, it reduces the number of interruptions the incident response team has to field.
The detailed investigation of a breach requires looking at both volatile and non-volatile storage, which means resisting the urge to shut down breached systems immediately. If the attacker’s tools are present only in memory, powering the system off can destroy any chance of a detailed investigation. Similarly, deleting infected servers can wipe out forensic evidence.
Wilkinson said there are tools and scripts that can dump data from memory and scan infected systems, so that important data used in the forensic investigation can be retained before affected systems are taken offline or scrapped.
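To make that idea concrete, here is a minimal sketch of triage collection, not one of the tools Wilkinson referenced: it shells out to common Linux commands (`ps` and `ss`, both assumptions about the host) to snapshot process and network state to files before a machine is pulled offline. Real responders would use dedicated memory-imaging tools rather than this shell-out approach.

```python
import datetime
import pathlib
import subprocess

# Hypothetical triage commands; the names assume a Linux host.
VOLATILE_COMMANDS = {
    "processes": ["ps", "aux"],
    "connections": ["ss", "-tunap"],
}

def capture_volatile_state(outdir="triage"):
    """Run each triage command and save its output to a timestamped file."""
    out = pathlib.Path(outdir)
    out.mkdir(exist_ok=True)
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    written = []
    for name, cmd in VOLATILE_COMMANDS.items():
        path = out / f"{stamp}-{name}.txt"
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            path.write_text(result.stdout)
        except FileNotFoundError:
            # Tool not present on this platform; record the gap rather than fail.
            path.write_text(f"command not available: {' '.join(cmd)}\n")
        written.append(path)
    return written
```

The key design point is simply that the capture happens, and is written somewhere durable, before the shutdown decision is executed.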
System logs are also critical. Wilkinson suggested companies should keep logs for at least six months so that, after an incident is detected, they can look back to see when the problem really started. Given that data from some security researchers suggests many breaches remain undetected for an average of eight months, we’d suggest retaining logs for at least a year as a safer option.
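As a trivial illustration of that arithmetic, a retention check might compare the oldest retained log against a 12-month target (the one-year figure here is our suggestion above, not Wilkinson’s):

```python
import datetime

# Assumed retention target of one year, per the suggestion above.
RETENTION_TARGET = datetime.timedelta(days=365)

def retention_gap(oldest_log, now=None):
    """How far short of the retention target the current logs fall.

    Returns a zero timedelta when coverage already meets the target.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    covered = now - oldest_log
    return max(RETENTION_TARGET - covered, datetime.timedelta(0))
```

A check like this, run periodically, flags retention shortfalls before an incident makes them painful.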
When it comes to remediation, Wilkinson said it’s much easier to make those decisions before an incident actually occurs. For example, if a workstation is infected with malware, is the policy to wipe and rebuild it, clean it with anti-virus software, or repair it manually? Setting policies for key systems ahead of time takes the pressure off during the incident and reduces the risk of actions that make things worse. The policy also needs to include escalation points in case the first action fails.
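One way to capture such ahead-of-time policies, purely as an illustration (the asset classes and actions below are our own examples, not Wilkinson’s), is as an ordered list of actions per asset class, so the escalation point is simply the next step when an action fails:

```python
# Hypothetical pre-agreed remediation policies: each asset class maps to
# an ordered list of actions, tried in sequence, so escalation points are
# decided before the incident rather than under pressure.
REMEDIATION_POLICY = {
    "workstation": [
        "isolate from network",
        "wipe and rebuild from image",
    ],
    "file server": [
        "isolate from network",
        "restore from last clean backup",
        "escalate to incident manager",
    ],
}

def next_action(asset_class, failed_actions):
    """Return the next remediation step after the given attempts failed."""
    plan = REMEDIATION_POLICY.get(asset_class, ["escalate to incident manager"])
    for action in plan:
        if action not in failed_actions:
            return action
    return "escalate to incident manager"  # every planned step failed
```

Encoding the plan as data also makes it easy to review and test before a breach, in line with the point about validating plans below.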
Wilkinson also emphasised the importance of testing plans out and validating policies and documentation before an actual breach occurs.