Data lake security could use a life preserver

As big data initiatives gain steam at organizations, many companies are creating “data lakes” to provide a large number of users with access to the data they need. And as with almost every type of new IT initiative, this comes with a variety of security risks that enterprises must address.

Data lakes are storage repositories that hold huge volumes of raw data kept in its native format until it’s needed. They’re becoming more common as organizations gather enormous amounts of data from a variety of sources.

The growing business demand for analytics is helping to fuel the move to large repositories of data. And data lakes are likely to take on even more significance with the growth of the internet of things (IoT), in which companies will gather data from and about countless networked objects.

“Businesses and consumers are creating data like never before,” says Mohit Aron, founder and CEO of data storage company Cohesity. “In turn, the number of siloed data lakes has exploded, meaning that enterprises are faced with the challenge of protecting separate security perimeters around each data lake.”

For many companies, the promise of data science “is something that simply cannot be overlooked,” says Roger Hockenberry, CEO and founder of technology and management consulting firm Cognitio Group.

“For the executive, the idea of gaining competitive advantage, unique insight and anticipatory intelligence is compelling,” Hockenberry says. “However, in order to generate these outcomes the data scientist is advocating for a data lake. This lake is a combination of proprietary, open source and other datasets that can be analyzed in unique ways.”

It can also be a major target for cyber criminals. “Hacks into data lakes are a continual threat, one that is exacerbated by the large number of data lakes that enterprises have,” says Aron, who as a former Google engineer and lead architect of Google File System 2 has helped build and maintain some of the biggest data lakes in the world.

Considering the high business value of these information resources and the growing risks, security and IT executives need to make data lake security a high priority. To begin with, there needs to be an understanding at the highest levels of the organization of the need to protect data stores to the greatest extent possible.

Unfortunately, this doesn’t always happen.

“The appeal of increased agility, reduced costs and removal of silos cause many organizations to jump head first into the data lake and ignore basic information governance best practices at their own peril,” says Jonathan Steenland, principal at Zyston CISO Advisory Services, where he is responsible for co-leading CISO advisory and consulting.

“Since data lakes are such a data rich target, hackers will prioritize their efforts at exploiting these types of technologies and the users who connect to them,” says Steenland, who previously served as CISO at Fujitsu.

Data lakes should be managed as a highly valuable corporate asset, Hockenberry says. “In many cases, executives look at this as a ‘tech problem,’” he says. “However, a data lake should be seen as corporate IP [intellectual property] and if someone gains access to it, they could see strategic information that could affect shareholder value, compromise [research and development], and reveal plans and intentions that can create issues for a company.”

The best way to address these issues is to understand what data the enterprise is collecting and how it’s being analyzed, protected and disseminated, Hockenberry says. Business, IT and security executives need to build data-centric risk management strategies to ensure information is protected no matter where it resides, he says.

Hackers, cyber criminals and other bad actors are sure to go after large data stores if they think there is something to gain from these resources and if they sense they are not adequately protected.

“Because of the data they contain, they may be seen as a great target—someone could steal much of the most important and sensitive data that a company owns by stealing the contents of a data lake,” Hockenberry says. As such, one of the biggest risks companies need to be aware of is ransomware, which brings the possibility of costly denial-of-service attacks. “The denial of use of corporate data can be far more damaging than simply stealing it,” he says.

The most important security functions with regard to data lakes are authorization and access. Research firm Gartner has warned companies not to overlook the inherent weaknesses of lakes. Data can be placed into a data lake with no oversight of the contents, Gartner analyst Nick Heudecker noted at the firm’s Business Intelligence & Analytics Summit last year.

Many data lakes are being used by organizations for data whose privacy and regulatory requirements are likely to represent risk exposure, Heudecker said. The security capabilities of central data lake technologies are still emerging, and the issues of data protection will not be addressed if they’re left to non-IT personnel, he said.

Many of the current data lake technologies on the market “don’t have fine-grained security controls that allow for multi-faceted control at the object level,” Hockenberry says.
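
To make the idea of multi-faceted, object-level control more concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `DataObject` and `User` types, the `can_read` function, the sensitivity levels and the tags are assumptions, not the API of any data lake product. It shows a single read decision that weighs two facets of an object at once, its sensitivity level and its content tags.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ordering, lowest to highest.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class DataObject:
    path: str
    sensitivity: str                 # one of the keys in LEVELS
    tags: frozenset = frozenset()    # e.g. {"pii", "finance"}


@dataclass(frozen=True)
class User:
    name: str
    clearance: str                        # one of the keys in LEVELS
    allowed_tags: frozenset = frozenset()


def can_read(user: User, obj: DataObject) -> bool:
    """Allow a read only if both facets pass: the user's clearance is at
    least the object's sensitivity, and the user is approved for every
    tag carried by the object."""
    cleared = LEVELS[user.clearance] >= LEVELS[obj.sensitivity]
    tags_ok = obj.tags <= user.allowed_tags
    return cleared and tags_ok


# Illustrative users and objects (all names are made up).
analyst = User("analyst", "internal", frozenset({"finance"}))
ledger = DataObject("/lake/finance/ledger.parquet", "internal", frozenset({"finance"}))
payroll = DataObject("/lake/hr/payroll.parquet", "restricted", frozenset({"pii"}))

print(can_read(analyst, ledger))   # True: clearance and tags both pass
print(can_read(analyst, payroll))  # False: clearance too low, "pii" not approved
```

The point of the sketch is that the decision is made per object rather than per repository, which is the kind of granularity Hockenberry says many products lack.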

The promise of data science and the data lake can only be realized by the free flow and joining of very large data sets. “This freedom creates opportunity, but is also harder to manage from a security perspective,” Hockenberry says. “Executives should ask questions about access, encryption, and tracking of data throughout its lifecycle in the enterprise.”

Organizations need to ensure that they have appropriate access and authorization controls, strong identity management and audit processes in place. Most importantly, they need a robust and well-tested incident response plan “that can quickly determine what, how much, and to what extent data has been compromised in the enterprise, and how to quickly restore not only functionality but trust in data once an attack has been successfully executed,” Hockenberry says.
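
As a rough illustration of how audit records can help answer the "what" and "how much" questions during incident response, the following Python sketch summarizes what a suspect account touched in a given time window. The log format, the account names, the paths and the `scope_compromise` function are all hypothetical; a real deployment would draw on whatever audit trail its platform actually produces.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical audit records: (timestamp, account, action, object path, bytes).
AUDIT_LOG = [
    (datetime(2024, 1, 10, 9, 0), "analyst1", "read", "/lake/sales/q1.csv", 1000),
    (datetime(2024, 1, 10, 9, 5), "svc-etl", "read", "/lake/hr/payroll.csv", 5000),
    (datetime(2024, 1, 10, 9, 7), "svc-etl", "read", "/lake/hr/payroll.csv", 5000),
    (datetime(2024, 1, 10, 9, 30), "svc-etl", "write", "/lake/sales/q1.csv", 200),
    (datetime(2024, 1, 11, 8, 0), "svc-etl", "read", "/lake/rnd/plans.csv", 7000),
]


def scope_compromise(log, account, start, end):
    """Return {(path, action): total_bytes} for everything the given
    account did between start and end, inclusive."""
    touched = defaultdict(int)
    for ts, who, action, path, nbytes in log:
        if who == account and start <= ts <= end:
            touched[(path, action)] += nbytes
    return dict(touched)


# Which objects did the suspect service account touch on January 10?
window_start = datetime(2024, 1, 10, 0, 0)
window_end = datetime(2024, 1, 10, 23, 59)
summary = scope_compromise(AUDIT_LOG, "svc-etl", window_start, window_end)
for (path, action), total in sorted(summary.items()):
    print(f"{action:5} {path} {total} bytes")
```

A summary like this only works if logging was in place before the incident, which is one reason the experts quoted here pair audit processes with access controls.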

Deploying data encryption where it makes sense is another key step. “Each data lake becomes an endpoint with unique vulnerabilities,” Aron says. “Data at rest should always be encrypted, without exception. Self-encrypting drives make it easier to ensure data is secure from the get-go.”

The recent string of high-profile hacks is serving to remind organizations that security should remain a top concern in any data architecture, Aron says. “The world is producing exponentially more data, and inevitably enterprises are creating more and more data lakes to house these new streams of data,” he says. “These disparate data silos create a headache for the security community because there are inevitably more doors for hackers to try and penetrate.”

It’s safe to assume that threats against data lake technologies will increase significantly as they become more mainstream, Steenland says. “However, the biggest threat will likely be insider threats due to inadequate deployment and configuration of these technologies,” he says.

All the more reason for executives to add data lakes to their list of key resources to protect.

“Companies should take the same types of steps as they would securing any type of data to include giving consideration as to who needs access to the data and how it will be used, ensuring strong access controls exist and logging is in place,” Steenland says. “Some level of information governance is still required, especially if the data includes regulated data.”

Stories by Bob Violino