Data separation ensures privacy, security in eBay's petabyte-scale data warehouse

Director of the eBay analytics platform and delivery, Alex Liang.

In running one of the largest data warehouses in the world, online retailer eBay has faced some unique challenges in delivering big-data analytics capabilities. Not the least of these is ensuring that the data access of its more than 6,000 business users and analysts is tightly managed to prevent privacy and security compromises.

Data management and security are significant, ongoing issues for a global company whose data warehouse operates at a scale that would be inconceivable to most organisations. Every day, eBay serves 2 billion pages and processes more than 100 petabytes of raw data I/O, producing 50 to 100TB of new data and 1.5 trillion new records. Data, and the actionable information extracted from it, is unquestionably the company’s lifeblood.
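To put those daily figures in perspective, the short calculation below converts them into average per-second rates. It is a back-of-the-envelope sketch that assumes decimal units (1TB = 10^12 bytes, 1PB = 10^15 bytes) and an even spread across the day; real traffic peaks will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes quoted in the article; decimal byte units are an assumption.
daily = {
    "pages served": 2e9,
    "new records": 1.5e12,
    "raw I/O bytes": 100e15,
    "new data bytes (low)": 50e12,
    "new data bytes (high)": 100e12,
}


def per_second(per_day: float) -> float:
    """Convert a per-day volume into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


rates = {name: per_second(volume) for name, volume in daily.items()}
for name, rate in rates.items():
    print(f"{name:>22}: {rate:,.0f} per second")
```

Averaged over a day, that works out to roughly 17 million new records and 23,000 page views every second, with about 0.6 to 1.2GB of new data and around 1.2TB of raw I/O per second.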

This massive volume of data – the largest data warehouse table has 3.5 trillion rows – is managed through a complex infrastructure built around a 7.5PB Enterprise Data Warehouse for transactional data; a 40PB database called ‘Singularity’ that manages behavioural data about customers; and a 40PB Hadoop cluster that manages Web-crawled data, images and text.

Managing the data is hard enough, but eBay realised long ago that extracting business value from such massive volumes of data could only be accomplished by enlisting the help of business users and empowering them to manage their own data.

“eBay believes that everybody should be self-reliant instead of having the IT or data team to hold their hand every day,” director of the eBay analytics platform and delivery, Alex Liang, told the audience at this week’s Teradata Big Data Analytics Summit in Melbourne.

“Five years ago, people in IT were responsible for the dashboard definitions, and when people had any questions about them they would come to ask why this or that was there. Now, for any business portal there will be a business user to work with you, and that person should be the person to control the business metrics definitions.”

eBay has achieved this goal by spreading data-analysis capabilities, delivered using a variety of tools including Ab Initio, Informatica, Oracle GoldenGate, UC4, BES and MapReduce, across a three-tiered analytics ecosystem that targets specific business reporting at specific groups of business users.

The three tiers are DataHub (a collaborative analytics platform “to generalise the concept of analytics as a service”); QuickStrike (a dashboard-based framework to consolidate and manage thousands of different reporting views); and Metrics Explorer (which enables ad hoc analysis using business terms).
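Liang's point that business users, not IT, should control metric definitions pairs naturally with ad hoc analysis in business terms. The sketch below illustrates that idea only: it is not eBay's Metrics Explorer, and the `MetricRegistry` class, the metric name and the SQL expression are all hypothetical. Each business term maps to a definition, and only the business user who owns a definition may change it.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    term: str   # business-facing name of the metric
    sql: str    # hypothetical SQL expression behind the term
    owner: str  # business user who controls the definition


class MetricRegistry:
    """Hypothetical registry in which business users own their metric definitions."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def define(self, term: str, sql: str, owner: str) -> None:
        """Create or update a definition; only the existing owner may change it."""
        key = term.lower()
        existing = self._metrics.get(key)
        if existing is not None and existing.owner != owner:
            raise PermissionError(f"{term!r} is owned by {existing.owner}")
        self._metrics[key] = Metric(term, sql, owner)

    def resolve(self, term: str) -> str:
        """Translate a business term into the expression an ad hoc query would use."""
        return self._metrics[term.lower()].sql


registry = MetricRegistry()
registry.define("Gross Merchandise Volume", "SUM(sale_price * quantity)", owner="marketplace_finance")
print(registry.resolve("gross merchandise volume"))  # SUM(sale_price * quantity)
```

Keying definitions on the business term, rather than on a table or column name, is what lets analysts ask questions in their own vocabulary while the owning team remains the single source of truth for what each term means.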

With increased user power, however, comes an increased need for control: while delegating control over these platforms has improved access to eBay’s massive production databases, the company has also had to lock down access to different forms of data to ensure the newly empowered users aren’t straying into areas where they shouldn’t be.

This has led eBay to classify its data into four categories, each with progressively stricter access controls.

At the low end of the scale is the General Access classification, which is governed by eBay’s existing account-creation policies; access is granted within a day of a request being made.

The second layer, called Restricted Data Mart, controls access to data marts that have been created and are managed by business users. “They do not want other users to have access to their data marts or they won’t have the control,” Liang said.

For more sensitive information, eBay enforces a Personally Identifiable Information (PII) layer that requires individual authorisation for any access to data that contains specific information about customers. “If you really want access to that data you must have a very strong business reason,” Liang explained.

Finally, a fourth layer, called Restricted Information, covers credit-card, bidding and other highly sensitive data. Access to this layer is extremely difficult to obtain, tightly monitored and limited to “very few people” on a need-to-know basis.
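The four-tier scheme can be pictured as an access check whose rule tightens at each level. The sketch below is a hypothetical model, not eBay's implementation: the `Tier`, `User` and `Dataset` types, the grant sets and the `audit_log` are illustrative assumptions. It treats "tightly monitored" as logging every attempt on the top tier, and it does not model the approval workflow or the one-day turnaround for general access.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """The four classifications described in the article, ordered by sensitivity."""
    GENERAL_ACCESS = 1
    RESTRICTED_DATA_MART = 2
    PII = 3
    RESTRICTED_INFORMATION = 4


@dataclass
class Dataset:
    name: str
    tier: Tier
    owner: Optional[str] = None  # business user who manages a restricted data mart


@dataclass
class User:
    name: str
    has_account: bool = True                            # from account-creation policy
    mart_grants: set[str] = field(default_factory=set)  # granted by mart owners
    pii_grants: set[str] = field(default_factory=set)   # individual PII authorisations
    need_to_know: set[str] = field(default_factory=set) # top-tier approvals


# Hypothetical audit trail for the most sensitive tier: (user, dataset, allowed).
audit_log: list[tuple[str, str, bool]] = []


def can_access(user: User, ds: Dataset) -> bool:
    """Decide whether `user` may query `ds` under this illustrative tiered policy."""
    if ds.tier is Tier.GENERAL_ACCESS:
        # Any account holder may read general-access data.
        return user.has_account
    if ds.tier is Tier.RESTRICTED_DATA_MART:
        # The owning business user, or someone they have granted, may read the mart.
        return user.name == ds.owner or ds.name in user.mart_grants
    if ds.tier is Tier.PII:
        # Customer-level data needs an explicit, per-dataset authorisation.
        return ds.name in user.pii_grants
    # Restricted Information: need-to-know only, and every attempt is logged.
    allowed = ds.name in user.need_to_know
    audit_log.append((user.name, ds.name, allowed))
    return allowed


analyst = User("analyst", mart_grants={"seller_kpis"})
print(can_access(analyst, Dataset("site_traffic", Tier.GENERAL_ACCESS)))       # True
print(can_access(analyst, Dataset("seller_kpis", Tier.RESTRICTED_DATA_MART)))  # True
print(can_access(analyst, Dataset("buyer_contacts", Tier.PII)))                # False
print(can_access(analyst, Dataset("card_data", Tier.RESTRICTED_INFORMATION)))  # False
print(audit_log)  # [('analyst', 'card_data', False)]
```

In this sketch the tiers differ less in the mechanics of the check than in who can grant the permission: the account system, a data mart's business owner, an individual PII authorisation, or a need-to-know approval. In practice those grant sets would be fed by an approval process and directory service that the example leaves out.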

By tempering its commitment to improving data access with realistic and enforceable controls around that access, eBay has worked continually to prevent its massive data assets from becoming an unwieldy and potentially unmanageable security and privacy risk.

Maintaining this control, Liang said, will be essential as users are increasingly empowered to bypass conventional IT-department controls and work more directly with company data.

“The future will be self-service,” he said. “We must be sure that the people who are doing the innovation, and running the business, query the data directly instead of having people behind them to do that. That’s the only way these people can really do innovation: the more data you have, the more innovative you’ll be.”

Stories by David Braue
