On Wednesday, some users around the world faced technical issues with Facebook, WhatsApp and Instagram. The partial outage frustrated users of the world’s largest social network.
“The disruption appears to be related to an internal infrastructure or application issue,” according to some experts. The social media company said it was working to fix the problem and was still investigating the overall impact of the issue. But the damage had been done.
Unable to upload or send photos and videos, many took to Twitter to complain. The outage lasted about a full day and is regarded as Facebook’s longest downtime ever. More than 14,000 users reported issues with Instagram, while more than 7,500 and 1,600 users complained about Facebook and WhatsApp respectively, according to outage tracking website Downdetector.com.
This could be just one of those instances of social media outages, but as the popular saying goes, “Don’t ever let a good crisis go to waste.” Going by this adage, there are lessons CIOs and IT leaders can learn from the very public problems faced by these technology majors.
Here are some takeaways for technology leaders from the recent Facebook glitches and other recent outages.
Go for regular disaster checkup, planning
While system failures are common and understandable, as the head of technology, it is your responsibility to be proactive about your planning, checkup and evaluation. Much of this disaster planning depends on what type of service you provide. If you’re a CIO or CTO responsible for maintaining email service to 1,000 employees, your disaster plan will look different from that of a technical team that serves 500,000 external customers. Therefore, it is important to understand how outages will impact different areas of your business.
Knowing the mitigation costs, as well as the costs of backups and standby systems, makes sense for disaster planning. As a tech leader you should also mark “mock failures” on your calendar and tell everyone involved what their responsibilities are during the simulated outage. Use the opportunity to engage all stakeholders without the pressure of a real outage.
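To make the cost comparison concrete, here is a minimal Python sketch of how a team might weigh the yearly cost of standby systems against the expected cost of downtime. Every name and figure in it (the `OutageScenario` fields, the outage counts, the hourly costs) is a hypothetical placeholder, not data from the Facebook outage or from any company quoted here.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    """Illustrative inputs for one service; every figure is a placeholder."""
    name: str
    outages_per_year: float          # expected outages in a year
    hours_down_unprotected: float    # hours per outage with no standby system
    hours_down_with_standby: float   # hours per outage when a standby takes over
    cost_per_hour_down: float        # revenue or productivity lost per hour
    standby_cost_per_year: float     # yearly cost of backups and standby systems


def annual_cost(s: OutageScenario, with_standby: bool) -> float:
    """Expected yearly cost of outages, plus standby spend if it is used."""
    hours = s.hours_down_with_standby if with_standby else s.hours_down_unprotected
    downtime_loss = s.outages_per_year * hours * s.cost_per_hour_down
    return downtime_loss + (s.standby_cost_per_year if with_standby else 0.0)


def standby_pays_off(s: OutageScenario) -> bool:
    """True when paying for standby systems is cheaper than absorbing downtime."""
    return annual_cost(s, with_standby=True) < annual_cost(s, with_standby=False)


# Hypothetical services: a small internal one and a large customer-facing one.
internal_email = OutageScenario("internal email", 2, 8, 1, 500, 20_000)
customer_portal = OutageScenario("customer portal", 2, 8, 1, 25_000, 100_000)

for scenario in (internal_email, customer_portal):
    print(scenario.name, standby_pays_off(scenario))
```

With these made-up numbers, the standby system costs more than it saves for the internal service but pays for itself on the customer-facing one, which is the kind of difference between plans that the service type can drive.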
Paying attention to incident response planning
Any company can be compromised, even with a large security team working to protect it. In such a scenario, Partha Sengupta, Vice President-IT Shared Services at ITC, says that incident response planning will define a company’s survival after a breach and is therefore of prime importance.
“It is vital how fast an organization recovers from an attack,” he says, adding that the CIO (in some firms the CISO) is accountable for the response from a technology perspective. They should therefore be strong constituents and collaborative partners with others in the C-suite, both before a disaster strikes and when an incident occurs.
Communication is the key
“When in doubt, communicate it out” is the mantra for CIOs and CISOs during an outage. Rather than only fixing the issue, it is advisable to keep the other stakeholders informed. Who those stakeholders are depends on whether your outage is internal, external or both.
“If you run a service for customers, they deserve to know what’s going on and to receive an estimated time to service restoration,” says Anil Kuril, CISO at Union Bank of India.
In such cases, he believes that communication can’t be an afterthought. It must be a high priority, next only to resolving the outage.
Run your backups more frequently
While most businesses understand the importance of backing up their important documents and files, many don’t create a backup of their entire server, says Shyamol B Das, Chief Digital Officer (CDO) at BRAC Bank Limited, Bangladesh. “What they don’t realize is that having a backup of your vital data won’t help much if you need to rebuild your server from scratch.”
Without a complete image of your server, all of its settings can be lost in the event of a crash. In this situation, it could take more than a week to restore your server to working order, given the work of installing the operating system, applying patches and updates, recreating file permissions, and setting up the email server, to name a few tasks. In other words, it disrupts the regular workflow of the organization.
One way CIOs can prevent this is by regularly using their backup systems as production systems. They can schedule times to move regular load to the backup systems. Das says that while a system outage unfolding in front of your eyes can be the worst thing for you and your company, you can at least be assured that when outages strike, you’re prepared, confident and responsive enough to avoid making a bad situation worse.
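As a rough illustration of putting such switchovers on the calendar, the Python sketch below builds a simple drill schedule that rotates through a list of systems at a fixed interval. The function name `drill_schedule`, the system names, the start date and the 30-day interval are all assumptions made for the example; the article does not prescribe any particular cadence.

```python
from datetime import date, timedelta
from itertools import cycle


def drill_schedule(systems, first_drill, every_days, count):
    """Return (date, system) pairs, rotating through the systems in order."""
    if not systems:
        raise ValueError("at least one system is required")
    if every_days <= 0:
        raise ValueError("every_days must be positive")
    rotation = cycle(systems)
    return [
        (first_drill + timedelta(days=every_days * i), next(rotation))
        for i in range(count)
    ]


# Hypothetical systems and dates, purely for illustration.
plan = drill_schedule(
    ["email", "file server", "customer portal"],
    first_drill=date(2024, 1, 6),
    every_days=30,
    count=4,
)

for day, system in plan:
    print(day.isoformat(), "move regular load to the backup for:", system)
```

Each entry marks a day on which regular load for one system would be moved to its backup, so every backup is exercised under real conditions before it is needed.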
Tim Mackey, technology evangelist at Synopsys, believes that CIOs should look at the implications of the outages that have hit Facebook, Gmail and YouTube in recent weeks and apply the best practices of security and privacy in their organizations.
“When an outage occurs, like the recent Facebook outage, C-suite shouldn’t take for granted that the security of its information is protected and should take the opportunity to both reset our passwords used on social media platforms and to revoke and reauthorize our access tokens issued by those same platforms,” he advises, adding that doing both will minimize the chances of a malicious group benefiting from a service outage and gaining access to one’s personal data.