Staying alive after migrating to the cloud

Test, prod, poke and break, but even that won't stop every outage.

Multi-tenant cloud providers may promise greater resilience, ‘five nines’ uptime and better security than some in-house infrastructure, but organisations would be wise not to assume the provider has covered every base.

US movie streaming service Netflix, which began migrating its data centre to Amazon’s EC2 cloud in 2009, has gone well beyond Amazon’s dashboard to better understand the risks it faces.

To find out how it would fare in various disasters, the company has created a dozen automation tools it calls Monkeys, which simulate chaos in the cloud and show how its interdependent systems would cope with “once in a blue moon” failures.

Latency Monkey, for example, simulates service degradation, Conformity Monkey finds and ousts sub-optimal instances, and Janitor Monkey hunts for wasted resources, while Security Monkey checks that SSL and DRM certificates are valid and looks for security violations or vulnerabilities.

The biggest ‘monkey’ is, of course, Chaos Gorilla, a scaled-up version of its predecessor, Chaos Monkey. As the name suggests, it simulates an outage of an entire Amazon availability zone to test whether Netflix can shift resources to another functioning zone without disrupting services.
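The question Chaos Gorilla asks can be illustrated with a toy capacity model. This is a hypothetical sketch, not Netflix's tool: the real drill takes down live infrastructure, whereas the code below only checks, zone by zone, whether the surviving zones could still carry the load. The zone names, capacities and demand figure are invented for illustration.

```python
# Toy model of an availability-zone failure drill, in the spirit of Chaos Gorilla.
# Zone names, capacities and the demand figure are hypothetical illustration values.
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    capacity: int  # requests per second the zone's instances can serve


def survives_zone_outage(zones: list[Zone], failed: str, demand: int) -> bool:
    """Return True if the zones left after `failed` goes dark can still meet `demand`."""
    remaining = sum(z.capacity for z in zones if z.name != failed)
    return remaining >= demand


def zone_drill(zones: list[Zone], demand: int) -> dict[str, bool]:
    """Simulate losing each zone in turn; map zone name -> whether service survives."""
    return {z.name: survives_zone_outage(zones, z.name, demand) for z in zones}


if __name__ == "__main__":
    zones = [Zone("zone-a", 400), Zone("zone-b", 400), Zone("zone-c", 300)]
    print(zone_drill(zones, demand=750))
```

With these made-up numbers, losing zone-a or zone-b leaves 700 requests per second against a demand of 750, so only the loss of zone-c is survivable. Total capacity (1,100) comfortably exceeds demand, which is exactly the kind of assumption a zone-level drill is meant to challenge.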

The company claims that the Monkeys gave it an “almost free” set of tools to automate resilience and security testing, but its efforts highlight some of the additional investment that moving infrastructure to the cloud could require.

Even so, those efforts could not prevent a two-hour disruption to its services this week. Between August 9 and 10, Netflix advised customers that its streaming service was experiencing problems, a day after an Amazon EC2 zone in North America suffered “connectivity issues”.

Carlo Minassian, chief executive officer of Australian network security specialist Earthwave, was impressed with Netflix’s automation tools because they allowed the company to take AWS cloud performance measurement into its own hands and challenge assumptions about cloud provider reliability.

“Most organisations will assume their cloud provider has security covered,” he said.

“After all, doesn’t ‘five nines’ mean close to no downtime at all? Doesn’t that mean next to no hardware problems and no security breaches? Does your cloud provider define how they measure uptime or availability?”

Although the two terms mean different things to the customer, vendors often “carelessly” use them interchangeably.

“Uptime is a measure of whether the service is actually running; availability is a measure of whether the service is running and accessible,” explained Minassian.
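The gap between those two definitions, and what ‘five nines’ actually allows, can be made concrete with some simple arithmetic. The sketch below is illustrative only: it assumes a hypothetical monitor that periodically records whether the service process is running and whether a client can actually reach it. How any given provider measures either figure is exactly the question Minassian says customers should ask.

```python
# Back-of-the-envelope availability arithmetic.
# The periodic check results used below are hypothetical illustration data.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Downtime budget implied by an availability percentage over a period."""
    return period_minutes * (1 - percent / 100)


def uptime_and_availability(samples: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Each sample is (running, reachable) from one check.

    Uptime counts checks where the service was running; availability counts
    only checks where it was running AND reachable by a client.
    """
    n = len(samples)
    uptime = sum(running for running, _ in samples) / n
    availability = sum(running and reachable for running, reachable in samples) / n
    return uptime, availability


if __name__ == "__main__":
    # 'Five nines' leaves roughly 5.26 minutes of downtime a year.
    print(round(allowed_downtime_minutes(99.999), 2))
    # Ten hypothetical checks: one where the service ran but could not be reached,
    # one where it was down entirely.
    checks = [(True, True)] * 8 + [(True, False), (False, False)]
    print(uptime_and_availability(checks))
```

In this made-up sample the service reports 90 per cent uptime but only 80 per cent availability: the check where the service was running but unreachable counts towards uptime, yet a customer would have experienced it as an outage.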

“There are a few among us who may have suffered an outage or two on the services offered by their cloud providers.”


Stories by Liam Tung
