Learning to avoid the next cloud outage

by Sohini Bagchi    Oct 11, 2013

Robert Healey

Public cloud is set to grow at a phenomenal rate. Despite this strong growth potential, large public cloud outages continue to remind businesses of the potential risks of cloud-based IT. In an exclusive interaction with CXOtoday, Robert Healey, Marketing Evangelist, APAC & Japan, Riverbed Technology, explains that the “design for failure” model in IT – where an organization learns from its previous IT systems failures and plans for potential failure scenarios when designing new IT infrastructure – can work wonders in a cloud environment, minimizing business risk and enabling continuous IT operations.

How does designing for failure help organizations deal with IT outages?

Planning ahead for IT systems failure allows you to learn from mistakes and build an infrastructure that can withstand a public or private cloud failure. By taking preemptive measures early and often, outages can be addressed head on with minimal disruption to business continuity or impact on the corporate bottom line. Some common mistakes occur well before an outage happens, when IT professionals fail to plan proactively and design for failure. If systems are designed to handle failure situations, however, these mistakes can be overcome easily.

What are some of the most common mistakes that IT professionals commit during cloud outages?

In my view, there are three common mistakes:

- Assuming data is backed up: Organizations assume that someone else is protecting their information and do not plan for backup or disaster recovery.

- Relying solely on one cloud provider: Many companies entrust their entire infrastructure to one provider’s cloud, so when that cloud goes down, so do they.

- Avoiding redundancy: Organizations do not keep duplicate copies of data, equipment and systems on multiple computers or units in the data center, in which case a single failure could be a major disaster.

How should one plan and design for failure?

Organizations could implement one or more of the following methods to design for failure:

Balance across Availability Zones: Large public cloud providers’ data centers are built across availability zones (AZs) and regions. The idea is that by having your application instances in separate AZs, if one zone goes down, users can be instantly redirected in real time to another one. If the secondary zone is far from the end user, performance will suffer, but your IT services will still work.
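The redirection logic described above can be sketched in a few lines. This is a minimal illustration, not any provider's actual failover mechanism: the zone names are hypothetical, and the health check is supplied by the caller (in practice a load balancer or DNS health probe plays this role).

```python
# Minimal availability-zone failover sketch. Zone names are
# illustrative; is_healthy stands in for a real health probe.

def pick_zone(zones, is_healthy):
    """Return the first healthy zone in preference order, or None."""
    for zone in zones:
        if is_healthy(zone):
            return zone
    return None

# Primary zone "ap-south-1a" is down, so traffic is redirected
# to the secondary zone "ap-south-1b".
health = {"ap-south-1a": False, "ap-south-1b": True}
print(pick_zone(["ap-south-1a", "ap-south-1b"], health.get))
```

The ordering of the list encodes the preference for the zone closest to the end user; only when that zone fails its health check does traffic fall through to the more distant one.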

Cloud balance: Similar to balancing across AZs, you can also balance across multiple cloud providers. In this service model, application traffic is routed to individual clouds based on a number of criteria, including the performance currently provided by each cloud, the value of the business transaction, the cost to execute a transaction in a particular cloud, and any relevant regulatory requirements. This model also enables you to develop an application that is battle-tested across multiple cloud platforms. Cloud balancing requires application delivery infrastructure in place that is hyper-portable across clouds, so that all functionality implemented in your Application Delivery Controller (ADC) is available in all locations.
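The criteria-based routing described above can be sketched as a simple scoring function. Everything here is an assumption for illustration: the cloud names, the latency and cost figures, and the weighting of cost against latency are all hypothetical, and a real ADC would use live measurements rather than static values.

```python
# Hypothetical cloud-balancing sketch: route a transaction to the
# eligible cloud with the best combined latency/cost score.
# Names, numbers, and the weighting factor are illustrative only.

def route(clouds, region_required=None):
    """Return the name of the best eligible cloud, or None."""
    # Regulatory requirements act as a hard filter first.
    eligible = [
        c for c in clouds
        if region_required is None or region_required in c["regions"]
    ]
    if not eligible:
        return None
    # Lower latency and lower per-transaction cost are both better;
    # the factor of 100 is an arbitrary illustrative weight.
    best = min(eligible, key=lambda c: c["latency_ms"] + 100 * c["cost_usd"])
    return best["name"]

clouds = [
    {"name": "cloud-a", "latency_ms": 40, "cost_usd": 0.012, "regions": {"eu", "us"}},
    {"name": "cloud-b", "latency_ms": 25, "cost_usd": 0.030, "regions": {"us"}},
]
print(route(clouds))                        # best combined score wins
print(route(clouds, region_required="eu"))  # compliance forces cloud-a
```

The point of the sketch is the two-stage decision: regulatory requirements eliminate clouds outright, while performance and cost are traded off among the clouds that remain.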

Add another cloud into the mix: For businesses that rely on public clouds, a private cloud can be a secret weapon in their armory of failure design and outage avoidance. Such businesses’ livelihoods often depend on the web and require high scalability and elasticity, making the public cloud a fairly easy choice and often causing them to bypass private clouds altogether. When designing for failure, however, adding some private cloud IT resources into the mix as a safety net in the event of a public cloud outage is a solid option.

When ‘designing for failure,’ is it better to start off with a public or private cloud?

It really depends on your business model and the industry you’re in. Organizations that have to adhere to strict compliance regulations or have a large number of mission critical applications that require enterprise-class infrastructure, such as financial or healthcare organizations, will probably opt for a private cloud. On the other hand, organizations that require high-availability, scalability and elasticity, such as an e-commerce or online gaming company, are more likely to start off in a public cloud.

What role does an Application Delivery Controller (ADC) play in building a globally resilient cloud?

To design for resilient IT services in failure-prone environments, the enterprise requires an application delivery solution that preserves the availability, performance and security of key business applications. The true value of an ADC today lies in the degree to which it can control how an application is delivered, across any environment – private, public or a hybrid of both – as you move to a truly distributed, resilient infrastructure. “Cloud-aware” ADCs have emerged as the right solution for virtualized and cloud-ready environments.

What benefits can organizations gain by exposing themselves to failure early on?

Exposing an organization’s IT to failure early and often, and quickly learning from these incidents, will help IT leaders build a robust and dynamic infrastructure that can withstand any cloud failure. By taking preemptive measures, outages can be addressed head on with minimal disruption to business continuity. Failing to understand failure, in the data center and in the cloud, will put critical apps and the business at risk. Your data center can experience outages just as much as the cloud. In the end, these outages will only hurt businesses that have not factored failure scenarios into how they think about the cloud.