Public-Cloud Lessons Learned After Dropbox Outage
By Jennifer LeClaire / NewsFactor Network
Published: January 13, 2014

The sky didn't fall but the cloud was dark over the weekend as Dropbox faced service disruptions that angered many users. The company reported its online storage service went down on Friday evening during scheduled maintenance and was back up and running about three hours later, with core service fully restored by 4:40 p.m. PT on Sunday.

So what happened? And what can we learn from the outage? Akhil Gupta, head of infrastructure at Dropbox, offered his insights in a blog post Sunday.

Gupta said Dropbox relies on thousands of databases to run -- and each database has one master and two slave machines for redundancy. The company performs full and incremental data backups and stores them in a separate environment. The trouble came during an operating system upgrade to some of Dropbox's machines.

What Really Happened?

"During this process, the upgrade script checks to make sure there is no active data on the machine before installing the new OS," Gupta said. "A subtle bug in the script caused the command to reinstall a small number of active machines. Unfortunately, some master-slave pairs were impacted, which resulted in the site going down."

Gupta assured users that their files were never at risk during the outage. These databases do not contain file data, he said, but are used to provide some Dropbox features, like photo album sharing, camera uploads, and some API features.

To restore service as fast as possible, Dropbox performed the recovery from its backups. Gupta said the company was able to restore most functionality within three hours, but the large size of some of the Dropbox databases slowed recovery, and it took several more hours to complete the restoration.

What Dropbox Learned

In response to the incident, Dropbox has added a layer of checks that requires machines to verify their state locally before executing incoming commands. This, Gupta said, enables machines that self-identify as running critical processes to refuse potentially destructive operations.
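
As a rough illustration of that kind of safeguard, the Python sketch below shows a machine checking its own state before acting on a command. It is not Dropbox's code: the Machine class, the command names, and the notion of a "critical process" set are all hypothetical, chosen only to show the idea of refusing a destructive operation locally.

```python
from dataclasses import dataclass, field

# Hypothetical command names; the article does not list the real ones.
DESTRUCTIVE_COMMANDS = {"reinstall_os", "wipe_disk"}


@dataclass
class Machine:
    name: str
    # Hypothetical: names of critical processes this machine knows it is running.
    critical_processes: set = field(default_factory=set)

    def is_critical(self) -> bool:
        """Local self-check: is this machine running anything critical?"""
        return bool(self.critical_processes)

    def handle(self, command: str) -> str:
        """Refuse destructive commands when the local check says the machine is active."""
        if command in DESTRUCTIVE_COMMANDS and self.is_critical():
            running = ", ".join(sorted(self.critical_processes))
            return f"refused: {self.name} is running {running}"
        return f"executed: {command}"


active = Machine("db-17", {"mysql-master"})
idle = Machine("spare-03")
print(active.handle("reinstall_os"))
print(idle.handle("reinstall_os"))
```

The point of the design is that the decision is made on the target machine, so a bug in the sender's own checks does not by itself lead to a reinstall of an active database host.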

"When running infrastructure at large scale, the standard practice of running multiple slaves provides redundancy. However, should those slaves fail, the only option is to restore from backup. The standard tool used to recover MySQL data from backups is slow when dealing with large data sets," he said. "To speed up our recovery, we developed a tool that parallelizes the replay of binary logs. This enables much faster recovery from large MySQL backups. We plan to open-source this tool so others can benefit from what we've learned."
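
The article does not describe how Dropbox's tool works, so the Python sketch below is only one plausible way to parallelize log replay, under a stated assumption: events can be grouped by an independent partition key (for example, a database or table name), so that order only matters within a partition. The function replay_parallel and its arguments are hypothetical names for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def replay_parallel(events, apply, workers=4):
    """Replay (partition_key, payload) events, one worker task per partition.

    Assumes partitions are independent of each other; events within a
    partition are applied in their original log order.
    """
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[key].append(payload)

    def replay_one(item):
        key, payloads = item
        for payload in payloads:
            apply(key, payload)
        return key, len(payloads)

    # Each partition is replayed sequentially; different partitions run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(replay_one, partitions.items()))


log = [("users", 1), ("photos", 1), ("users", 2), ("photos", 2), ("users", 3)]
restored = {"users": [], "photos": []}
counts = replay_parallel(log, lambda key, payload: restored[key].append(payload))
print(counts, restored)
```

If transactions span partitions, this simple grouping would not be safe, and a real tool would need to track those dependencies.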

What It All Means

So what does all this mean for cloud-based service users? We asked Charles Weaver, CEO of the International Association of Cloud and Managed Service Providers, for his take on the deeper meaning. He told us the Dropbox outage draws attention to the inherent risks and issues with public cloud services.

"Not just regarding security and privacy, but also with respect to transparency. When private cloud providers have outages, their customers usually have a better sense of accountability about what their cloud provider is doing and who is managing their data. Not so with public cloud," Weaver said.

"The important thing for businesses to realize is that cloud computing can come in many different flavors. There are consumer-grade and business-grade cloud providers, and it is important for organizations to assess their needs prior to selecting a cloud platform. This includes both data privacy and security requirements, which impact the type of cloud provider you choose."

Tell Us What You Think
Brad T.:
Posted: 2014-02-13 @ 5:07pm PT
I really don't like using both...so cal who do you use??

Cal Towns:
Posted: 2014-02-05 @ 11:06am PT
While I completely agree that the owner of the data should encrypt and securely share it, why use separate services? I’ve been encrypting my own data for nearly five years now, and it’s actually offered by the cloud service that I’m with, so I don’t have to pick a cloud service and an encryption service. People need to start doing a little research about this stuff IMO. There’s a lot of really interesting technology available.

Tom Murphy:
Posted: 2014-01-15 @ 8:52am PT
At nCrypted Cloud (www.ncryptedcloud.com) we believe that encrypting and sharing securely are two actions that the owner of the data is responsible for and should do before allowing data to be stored in the Public Cloud

© Copyright 2014 NewsFactor Network, Inc. All rights reserved. Member of Accuserve Ad Network.