Sometimes the best defense is deletion

Big Data is viewed as a very good thing by most enterprises. With the right analytics, it can generate meaning and business value. But as with many things, there can be too much of a good thing, say a number of Information Governance (IG) experts.

Their message is that enterprises need to do more than protect their data from theft or infection -- they need to get rid of some of it, for both economic and legal reasons.

Dumping data has gone by a variety of names so far, including defensible disposition, defensible deletion and active expiration. Barry Murphy, cofounder and principal analyst at eDJ Group, prefers defensible deletion (DD).

The label matters less than the practice, however, Murphy wrote in a post in eDiscovery Journal: "Companies can reduce costs and decrease risks by proactively getting rid of unnecessary information."

Murphy told CSO Online that it is true that the cost of storage, both on-premise and in the cloud, continues to decrease. "One could argue that the decreasing cost of storage combined with lower-cost information processing platforms like Hadoop makes keeping information in perpetuity economically viable," he said. "But the rate at which information grows is faster than the rate at which the cost of storage decreases. So much corporate information is either duplicate or unnecessary that the cost of retaining it is greater than that of getting rid of it."

Jim McGann, vice president of marketing for Index Engines, said in an interview with Government Technology last year that in the past five years he had seen organizations taking steps to "clean up the 'data lake' that has been generated."


The motivation is legal as well as economic, he said. Until about 15 years ago, organizations could save anything and easily hide the content that could become a liability, but he said that won't work these days. "Lawyers and judges are more tech savvy and they won't accept excuses about complexity and cost issues anymore," he said.

Barry Murphy agrees. "The cost and risk of eDiscovery can poke a giant hole in any economic assessment of information management costs," he said.

The rules governing electronic information are different from those for paper documents, since electronic information usually includes metadata, which can be important as evidence. In a copyright case, for example, the date and time a document was written can be valuable.

This doesn't mean a company can get rid of any electronic documents it fears might create a liability. But Murphy said the Federal Rules of Civil Procedure give companies a so-called "safe harbor" from liability for information deleted in accordance with standard operating procedures, "as long as a legal hold process is in place to stop deletion if information may be relevant to a litigation or regulatory matter."

Murphy said that in general, "any information assets that are duplicate or have no business value would fall into the pile of 'to be deleted.'" But he said too many organizations are not yet "mature" enough to put an accurate value on information. Instead, he said, they have "time-based retention policies."

"For example, many companies delete all email in an employee's inbox after 90 days. Any email the employee wants to keep longer needs to be dragged to a central archive folder where the employee can access it beyond the 90-day period."

It is better, and much more defensible, he said, to have "legal hold management," which would be enough to convince a court that relevant ESI (electronically stored information) has been preserved. "The standard is reasonable effort rather than perfection," he said.
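As a rough illustration of how a time-based retention rule and a legal hold might interact, the Python sketch below flags inbox messages older than the 90-day window for deletion unless their custodian is under hold or the message was moved to the archive. The `Message` record, the `legal_holds` set, the function name and the sample dates are all hypothetical, and are not drawn from any product or procedure Murphy described.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION_DAYS = 90  # the 90-day window from Murphy's example


@dataclass
class Message:
    msg_id: str
    custodian: str          # employee whose mailbox holds the message
    received: date
    archived: bool = False  # True if the employee moved it to the archive folder


def eligible_for_deletion(msg, today, legal_holds):
    """Return True only if the message is past retention and nothing blocks deletion."""
    if msg.custodian in legal_holds:
        return False  # assumption: a hold on the custodian suspends all deletion
    if msg.archived:
        return False  # archived mail is kept beyond the window
    return today - msg.received > timedelta(days=RETENTION_DAYS)


# Illustrative data only; names and dates are made up.
today = date(2015, 6, 1)
legal_holds = {"alice"}  # hypothetical custodian under a litigation hold
inbox = [
    Message("m1", "alice", date(2015, 1, 5)),
    Message("m2", "bob", date(2015, 1, 5)),
    Message("m3", "bob", date(2015, 5, 20)),
    Message("m4", "bob", date(2015, 1, 5), archived=True),
]
to_delete = [m.msg_id for m in inbox if eligible_for_deletion(m, today, legal_holds)]
print(to_delete)
```

The hold check comes first so that no retention rule can override it, which mirrors the condition Murphy attaches to the safe harbor.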

Jim McGann said he recommends that companies start small. "[It] could be with purging ex-employee data, or determining what data has not been accessed in five years and could be migrated to less expensive storage such as the cloud, or can eventually be purged," he said.

But he said it still takes setting priorities. "The highest risk data environments are typically email servers and legacy backup tapes," he told Government Technology. "Email is the most common source of evidence produced for litigation and regulatory requests. Legacy backup tapes are a snapshot of everything, including email and files."

So, he recommends creating a data map that includes things like the age of the data, last accessed or modified date, owner, location, email sender/receiver and even sensitive keywords. "A data map will deliver the knowledge required to make 'keep or delete' decisions for files and email. An actionable data map can then help you execute on these decisions and defensibly delete what is no longer required, and archive what must be kept," he said.
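A data map of the kind McGann describes could be approximated, for files on a single share, with a short Python sketch like the one below. It is a minimal illustration under stated assumptions: the field names, the filename-only keyword match and the `suggest_action` labels are hypothetical, the five-year threshold is borrowed from his "start small" example, and email sender/receiver data is left out entirely.

```python
import os
import time
from pathlib import Path

FIVE_YEARS = 5 * 365 * 24 * 3600  # seconds; assumed staleness threshold


def build_data_map(root, keywords=()):
    """Walk a directory tree and record basic metadata for each file."""
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            info = path.stat()
            rows.append({
                "location": str(path),
                "owner_uid": info.st_uid,
                "size_bytes": info.st_size,
                "modified": info.st_mtime,
                "accessed": info.st_atime,
                # Crude filename match only; file contents are not scanned.
                "keyword_hit": any(k.lower() in name.lower() for k in keywords),
            })
    return rows


def suggest_action(row, now=None):
    """Return a suggestion for human review, not a final keep-or-delete decision."""
    now = time.time() if now is None else now
    if row["keyword_hit"]:
        return "review"  # sensitive term: route to a person
    if now - max(row["modified"], row["accessed"]) > FIVE_YEARS:
        return "purge-candidate"
    return "keep"
```

Calling `build_data_map` on a share and passing each row to `suggest_action` yields a first-pass list for reviewers. Any actual deletion would still need to clear retention schedules and legal holds.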


Stories by Taylor Armerding
