Wednesday, 09 June 2021 11:57

The overnight internet outage - an update

By David Heath

Image by Jan Vašek from Pixabay

Internet CDN provider Fastly has admitted that last night's outage was due to a configuration error.

In a tweet issued around 9:30pm last night (AEST) Fastly said, "We identified a service configuration that triggered disruptions across our POPs globally and have disabled that configuration. Our global network is coming back online."

The Fastly status page showed this:

[Image: Fastly status page log]

For readers unfamiliar with a content delivery network (CDN, sometimes called a content distribution network), this is a service that takes copies of a website and distributes them to servers scattered around the world.

For instance, you may be reading iTWire.com in London or perhaps Anchorage in Alaska. If we wanted to improve the service you receive, we would engage a CDN to store our content and whenever you access iTWire.com, the content would be delivered to you from a local server.

This has two broad positives. Firstly, for us, we don't have to serve every request from our own computing resources; instead, we only respond to a small number of requests from the CDN servers. Secondly, you will get a much snappier response to your page requests - the CDN servers are very big and fast, and they're also closer to you.
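
As a rough illustration of those two positives, here is a minimal Python sketch of an edge server that keeps a local copy of a page and only contacts the origin on its first request. The `Origin` and `EdgeServer` classes and the `/news` page are hypothetical names invented for this example; a real CDN also deals with expiry, invalidation and much more.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Stands in for the publisher's own web server."""
    pages: dict[str, str]
    requests_served: int = 0

    def fetch(self, path: str) -> str:
        self.requests_served += 1
        return self.pages[path]


@dataclass
class EdgeServer:
    """A CDN server near the reader that keeps copies of origin content."""
    origin: Origin
    cache: dict[str, str] = field(default_factory=dict)

    def get(self, path: str) -> str:
        if path not in self.cache:          # first request: go to the origin
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]             # later requests: served locally


origin = Origin(pages={"/news": "<html>latest headlines</html>"})
london = EdgeServer(origin)
anchorage = EdgeServer(origin)

# 1,000 readers in each city request the same page.
for _ in range(1000):
    london.get("/news")
    anchorage.get("/news")

print(origin.requests_served)  # 2: one fetch per edge server, not 2,000
```

In this toy model the origin answers two requests while the edge servers absorb the other 1,998, which is the load relief described above.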

In order to gain some context within the local IT industry, we asked a number of vendors for their thoughts on the outage itself and also how organisations should protect their presence on the internet.

Lotem Finkelstein, Head of Threat Intelligence at Check Point, offered this: "While we don't yet know the reason for the widespread outage at cloud service company, Fastly, it's important to understand why the impact is so extensive. Fastly is a CDN - a content delivery network. CDNs generate replicas of original websites for the website owners to allow load balancing.

"When a CDN fails, it means that all the replicas are unavailable and no one is able to see the content from the original server. So it seems like Amazon, Reddit, Twitch and all these big sites have been attacked in unison, but they were not attacked. There is no outage for these companies. The only outage was at Fastly, the CDN that serves them."

Leo Lynch, Director, Asia Pacific, StorageCraft, an Arcserve Company, reminds us that, "If the last year has taught us anything, it's that we never know what's around the corner. The latest Fastly mass internet outage, which caused many Australians to see the 'HTTP Error 503' on Tuesday night when accessing their favourite websites, is only one of many severe disruptions that have plagued businesses in the past year.

"While Mercer reports that only about half of businesses have a business continuity plan, it's often this type of thorough, proactive planning that helps companies successfully tackle the biggest challenges that come along."

"According to what we've learned, today's outage across several news outlets was a result of a misconfiguration," said Andy Champagne, Akamai Technologies' SVP and chief technology officer of Akamai Labs. "This means that there could have been an error in a file or something as simple as a typo made by someone managing the system.

"It is also our understanding that people were getting [503] errors returned very quickly, which is an indicator of a service being unavailable, versus a cyberattack. In an attack it usually takes some time for the consumer to see an error.

"What people experienced today is just another reminder of how the internet is a lifeline for consumers and for businesses, and we have come to count on it being reliable and available to us when we need it."

Marcus Thompson, AM, PhD, retired Army officer and former Head of Information Warfare for the Australian Defence Force, wanted to localise the impact of this outage, noting that, "The Fastly outage demonstrates, yet again, the importance of digital sovereignty in Australia. This was a technical outage, rather than a cyber-attack, yet the effect on Australian businesses and people was the same. It calls into question our dependence on foreign service providers.

"We need to look closer to home for how we connect to the digital world around us. Australia has some of the greatest data and security skills in the world - the cost of not utilising that in terms of security and economic value is staggering.

"The Government's Security of Critical Infrastructure (SOCI) legislation couldn't be more timely and important to drive this - and bring our data - home."

In a similar vein, Adam Cassar, Co-founder of Peakhour.io, says, "A global issue shows some shared component failed, resulting in Fastly not being able to process requests, likely affecting their ability to connect to client origin servers. Fastly tweeted that it was a 'configuration issue' (as we noted above). After the issue was resolved, Fastly say clients may experience a 'lower CHR' [cache hit ratio].

"What does this imply? Varnish [the caching software Fastly is built on] achieves its performance through in-memory caching. A lower CHR would mean that Varnish was restarted, losing that cache. A configuration error means that a configuration change was enacted that resulted in a global outage. We can surmise from this that there are shared components in the Fastly caching network and that the configuration change was enacted globally without sufficient testing."

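To see why a restart would lower the CHR (read here as cache hit ratio, the share of requests answered from cache), the following Python sketch compares a populated cache with an empty one. The traffic pattern and function name are assumptions made up for illustration; this is a toy model of Mr Cassar's inference, not a description of Fastly's or Varnish's internals.

```python
def cache_hit_ratio(requests, cache):
    """Serve each request from `cache` when possible and return hits / total."""
    hits = 0
    for path in requests:
        if path in cache:
            hits += 1
        else:
            cache.add(path)   # a miss fetches from the origin and stores the copy
    return hits / len(requests)


# Hypothetical traffic: 5 popular pages requested in rotation, 100 requests.
traffic = [f"/page{i % 5}" for i in range(100)]

warm_cache = {f"/page{i}" for i in range(5)}   # long-running server, cache populated
cold_cache = set()                             # just restarted, in-memory cache empty

print(cache_hit_ratio(traffic, warm_cache))    # 1.0
print(cache_hit_ratio(traffic, cold_cache))    # 0.95
print(cache_hit_ratio(traffic[:10], set()))    # 0.5 in the first moments after a restart
```

The dip is sharpest immediately after the restart and fades as the cache refills, and every miss in that window is a request the customer's own servers have to answer.
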
Finally, according to Associate Professor Carsten Rudolph, Department of Software Systems & Cybersecurity, Faculty of Information Technology, Monash University, "During last night's outage, which impacted websites like The Age, Sydney Morning Herald, New York Times, Amazon and Gov.uk, Fastly claimed that the 'network has built-in redundancies and automatic failover routing to ensure optimal performance and uptime'. While automatic failover is not easy, if there is a major issue, the remaining nodes might receive a very high load and either become very slow or completely fail.

"Today we learnt that the outage was due to a misconfiguration of Fastly's 'points of presence' (POPs). These are servers distributed all over the world and once the issue was identified, it was relatively easy to fix.

"Moving from centralised solutions to distributed architectures that use a world-wide network of POPs can improve speed of delivery and potentially its reliability. However, the example of the Fastly outage shows that small errors can not only disrupt centralised services, but also these distributed solutions.

"These types of reliability issues can potentially result in financial losses and point to the need for a proper risk analysis. Businesses need to understand exactly what services and infrastructures they rely on. Even if these services promise high stability and redundancy, it is always possible that one or even several could fail and businesses need to plan for these outages and have contingency actions in place, if the risk becomes too high."

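Professor Rudolph's point about failover, that the surviving nodes inherit the traffic of the failed ones, comes down to simple arithmetic. The Python sketch below uses invented figures (ten nodes, 90,000 requests per second, 12,000 per node capacity) purely to show how a few failures can push the remaining nodes past their limit; none of these numbers come from Fastly.

```python
def load_per_node(total_requests_per_sec, healthy_nodes):
    """Evenly share traffic across whichever nodes are still healthy."""
    if healthy_nodes == 0:
        raise RuntimeError("no healthy nodes left to serve traffic")
    return total_requests_per_sec / healthy_nodes


# Hypothetical figures, chosen only for illustration.
TOTAL_TRAFFIC = 90_000      # requests per second across the network
NODE_CAPACITY = 12_000      # requests per second one node can handle
NODES = 10

for failed in range(0, 5):
    load = load_per_node(TOTAL_TRAFFIC, NODES - failed)
    status = "ok" if load <= NODE_CAPACITY else "OVERLOADED"
    print(f"{failed} failed -> {load:,.0f} req/s per node ({status})")
```

With these assumed numbers the network tolerates two failed nodes, and the third failure overloads every node that is left.
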
As usual, the messages are plain for all to see. For providers, make sure you test all changes before they're rolled out, and for customers, make sure you have multiple ways to deliver your internet content.

Simple, really!
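
For the provider half of that advice, one common safeguard is a staged rollout: apply a change to a handful of servers, check they are still healthy, and only then widen the release. The Python sketch below is one possible shape for that idea. The `apply_config` and `is_healthy` callables, the batch sizes and the POP names are all hypothetical, and nothing here reflects how Fastly actually deploys changes.

```python
def staged_rollout(pops, apply_config, is_healthy, batch_sizes=(1, 5, 25)):
    """Apply a change to growing batches of POPs, stopping at the first failure.

    `apply_config` and `is_healthy` are hypothetical callables supplied by the
    operator; they are assumptions made for this sketch.
    """
    updated = []
    remaining = list(pops)
    sizes = list(batch_sizes)
    while remaining:
        # Use the next planned batch size, then finish with whatever is left.
        size = sizes.pop(0) if sizes else len(remaining)
        batch, remaining = remaining[:size], remaining[size:]
        for pop in batch:
            apply_config(pop)
            updated.append(pop)
        if not all(is_healthy(pop) for pop in batch):
            return {"status": "halted", "updated": updated, "untouched": remaining}
    return {"status": "complete", "updated": updated, "untouched": []}


# Simulate a bad change: every POP that receives it starts failing.
pops = [f"pop-{n:02d}" for n in range(40)]
broken = set()
result = staged_rollout(pops, apply_config=broken.add, is_healthy=lambda p: p not in broken)
print(result["status"], len(result["updated"]), len(result["untouched"]))
# halted 1 39
```

In this simulation the faulty change reaches one server out of 40 before the rollout stops, rather than all of them at once.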


David Heath

David Heath has had a long and varied career in the IT industry having worked as a Pre-sales Network Engineer (remember Novell NetWare?), General Manager of IT&T for the TV Shopping Network, as a Technical manager in the Biometrics industry, and as a Technical Trainer and Instructional Designer in the industrial control sector. In all aspects, security has been a driving focus. Throughout his career, David has sought to inform and educate people and has done that through his writings and in more formal educational environments.
