Saturday, 03 October 2020 08:03

Microsoft says software bug in deployment process caused recent outage

By Sam Varghese

Microsoft has blamed a latent code defect (in other words, a software bug) in the Azure Active Directory backend service Safe Deployment Process (SDP) system for the outage that its 365 services suffered between 21.25 UTC on 28 September (8.25am AEDT on 29 September) and 00.23 UTC on 29 September (11.23am AEDT on 29 September).

The company said this bug caused the defective software to deploy directly into its production environment, bypassing its normal validation process.

In a detailed post about the incident, Microsoft said the issue resulted in customers encountering errors while performing authentication operations for all Microsoft and third-party applications and services that depend on Azure Active Directory for authentication.

Apart from these applications, those that were using Azure B2C for authentication were also affected.

Apple suffered an outage the day after Microsoft did, but has not issued any explanation as to the cause.

"Users who were not already authenticated to cloud services using Azure AD were more likely to experience issues and may have seen multiple authentication request failures corresponding to the average availability numbers shown below," Microsoft said. "These have been aggregated across different customers and workloads.

"Europe: 81% success rate for the duration of the incident.

"Americas: 17% success rate for the duration of the incident, improving to 37% just before mitigation.

"Asia: 72% success rate in the first 120 minutes of the incident. As business-hours peak traffic started, availability dropped to 32% at its lowest.

"Australia: 37% success rate for the duration of the incident."

It said service was restored for most customers by 00.23 UTC on 29 September (11.23am AEDT on 29 September), adding that infrequent authentication request failures after that time could have affected customers until 02.25 UTC (1.25pm AEDT on 29 September).

Explaining the bug in detail, Microsoft said: "Azure AD was designed to be a geo-distributed service deployed in an active-active configuration with multiple partitions across multiple data centres around the world, built with isolation boundaries.

"Normally, changes initially target a validation ring that contains no customer data, followed by an inner ring that contains Microsoft-only users, and lastly our production environment. These changes are deployed in phases across five rings over several days.

"In this case, the SDP system failed to correctly target the validation test ring due to a latent defect that impacted the system’s ability to interpret deployment metadata. Consequently, all rings were targeted concurrently. The incorrect deployment caused service availability to degrade.

"Within minutes of impact, we took steps to revert the change using automated rollback systems which would normally have limited the duration and severity of impact. However, the latent defect in our SDP system had corrupted the deployment metadata, and we had to resort to manual rollback processes. This significantly extended the time to mitigate the issue."
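The ring-by-ring process Microsoft describes can be illustrated with a minimal Python sketch. This is not Microsoft's SDP code: the ring list (only the three stages named in the quote, not all five rings), the `target_ring` metadata key and the function names are all assumptions made for illustration. The point it shows is one way a rollout could fail closed, refusing to deploy anywhere when the metadata cannot be interpreted, rather than targeting every ring at once.

```python
# Hypothetical ring order, taken from the stages named in Microsoft's statement.
RINGS = ["validation", "inner", "production"]


class MetadataError(ValueError):
    """Raised when deployment metadata does not name a known ring."""


def parse_target_ring(metadata):
    """Return the ring named in the metadata, or raise if it is unreadable."""
    ring = metadata.get("target_ring")  # assumed key name, for illustration only
    if ring not in RINGS:
        # Fail closed: never fall back to "all rings" on bad metadata.
        raise MetadataError(f"unknown or missing target ring: {ring!r}")
    return ring


def staged_rollout(metadata, deploy, is_healthy):
    """Deploy one ring at a time, starting at the target ring.

    Stops at the first ring that fails its health check. Returns a tuple of
    (rings that passed, ring that failed or None).
    """
    start = parse_target_ring(metadata)
    completed = []
    for ring in RINGS[RINGS.index(start):]:
        deploy(ring)
        if not is_healthy(ring):
            return completed, ring
        completed.append(ring)
    return completed, None
```

Called with `{"target_ring": "validation"}` and a health check that fails on the inner ring, this sketch stops before production is touched. Called with metadata it cannot read, it raises `MetadataError` before any deployment happens.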

The company apologised for the incident. "We sincerely apologise for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future," it said.


Sam Varghese

Sam Varghese has been writing for iTWire since 2006, a year after the site came into existence. For nearly a decade thereafter, he wrote mostly about free and open source software, based on his own use of this genre of software. Since May 2016, he has been writing across many areas of technology. He has been a journalist for nearly 40 years in India (Indian Express and Deccan Herald), the UAE (Khaleej Times) and Australia (Daily Commercial News (now defunct) and The Age). His personal blog is titled Irregular Expression.
