Friday, 03 March 2017 06:31

Human error caused Amazon Web Services outage Featured

By

A wrong command entered by a member of its technical staff was responsible for the outage experienced by Amazon Web Services simple storage service this week.

In a detailed explanation, the company said the S3 team was attempting to debug an issue that caused a slowdown in its billing system when, at 9.37am PST on Tuesday (4.30am Wednesday AEST), one of its technical staff ran a command that was intended to remove a few servers from one of the subsystems used by the S3 billing process.

The worker entered one wrong input for the command and ended up removing a much larger number of servers than intended, some of which supported two other S3 subsystems.

Among these were the index subsystem that manages the metadata and location information of all S3 objects in the Northern Virginia (US-EAST-1) Region and is needed to serve all GET, LIST, PUT, and DELETE requests.

The outage took down a large number of websites and services. AWS services something close to 1% of the top million websites.

The placement subsystem that allocated new storage was also disconnected; it depends on the index subsystem to be working for its own system to function.

"The placement subsystem is used during PUT requests to allocate storage for new objects. Removing a significant portion of the capacity caused each of these systems to require a full restart," the company said.

"While these subsystems were being restarted, S3 was unable to service requests. Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable."

The company also said it had been unable to update the AWS service health dashboard for two hours after the issue began, because the administration console has a dependency on Amazon S3.

"Instead, we used the AWS Twitter feed (@AWSCloud) and SHD banner text to communicate status until we were able to update the individual services’ status on the SHD. We understand that the SHD provides important visibility to our customers during operational events and we have changed the SHD administration console to run across multiple AWS regions."

The company outlined several changes it would be making as a result of the outage and apologised for the breakdown. "While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further," it said.


Subscribe to ITWIRE UPDATE Newsletter here

PROMOTE YOUR WEBINAR ON ITWIRE

It's all about Webinars.

Marketing budgets are now focused on Webinars combined with Lead Generation.

If you wish to promote a Webinar we recommend at least a 3 to 4 week campaign prior to your event.

The iTWire campaign will include extensive adverts on our News Site itwire.com and prominent Newsletter promotion https://itwire.com/itwire-update.html and Promotional News & Editorial. Plus a video interview of the key speaker on iTWire TV https://www.youtube.com/c/iTWireTV/videos which will be used in Promotional Posts on the iTWire Home Page.

Now we are coming out of Lockdown iTWire will be focussed to assisting with your webinatrs and campaigns and assassistance via part payments and extended terms, a Webinar Business Booster Pack and other supportive programs. We can also create your adverts and written content plus coordinate your video interview.

We look forward to discussing your campaign goals with you. Please click the button below.

MORE INFO HERE!

INTRODUCING ITWIRE TV

iTWire TV offers a unique value to the Tech Sector by providing a range of video interviews, news, views and reviews, and also provides the opportunity for vendors to promote your company and your marketing messages.

We work with you to develop the message and conduct the interview or product review in a safe and collaborative way. Unlike other Tech YouTube channels, we create a story around your message and post that on the homepage of ITWire, linking to your message.

In addition, your interview post message can be displayed in up to 7 different post displays on our the iTWire.com site to drive traffic and readers to your video content and downloads. This can be a significant Lead Generation opportunity for your business.

We also provide 3 videos in one recording/sitting if you require so that you have a series of videos to promote to your customers. Your sales team can add your emails to sales collateral and to the footer of their sales and marketing emails.

See the latest in Tech News, Views, Interviews, Reviews, Product Promos and Events. Plus funny videos from our readers and customers.

SEE WHAT'S ON ITWIRE TV NOW!

BACK TO HOME PAGE
Sam Varghese

Sam Varghese has been writing for iTWire since 2006, a year after the site came into existence. For nearly a decade thereafter, he wrote mostly about free and open source software, based on his own use of this genre of software. Since May 2016, he has been writing across many areas of technology. He has been a journalist for nearly 40 years in India (Indian Express and Deccan Herald), the UAE (Khaleej Times) and Australia (Daily Commercial News (now defunct) and The Age). His personal blog is titled Irregular Expression.

Share News tips for the iTWire Journalists? Your tip will be anonymous

WEBINARS ONLINE & ON-DEMAND

GUEST ARTICLES

VENDOR NEWS

Guest Opinion

Guest Reviews

Guest Research

Guest Research & Case Studies

Channel News

Comments