Wednesday, 03 July 2019 05:08

Botched firewall update causes Cloudflare outage

A botched firewall ruleset deployment took down the content delivery network Cloudflare on Tuesday, with sites behind it returning 502 errors. The outage was caused by a huge spike in CPU utilisation across the company's network.

Cloudflare chief technology officer John Graham-Cumming said in a statement that the CPU spike was caused by an error in a software deployment, which was rolled back after the problem was noticed.

"Once rolled back the service returned to normal operation and all domains using Cloudflare returned to normal traffic levels," he said.

Graham-Cumming said the outage had been noticed at 1342 UTC (11.42pm AEST 2 July) across the entire network, with all sites that used Cloudflare as a proxy showing a 502 (bad gateway) error.

He said this had been caused by a single misconfigured rule in the company's Web Application Firewall during a routine deployment of new rules.

"The intent of these new rules was to improve the blocking of inline JavaScript that is used in attacks," he said.

"These rules were being deployed in a simulated mode where issues are identified and logged by the new rule, but no customer traffic is actually blocked so that we can measure false positive rates and ensure that the new rules do not cause problems when they are deployed into full production."
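The simulated mode described in the statement can be sketched roughly as follows. This is a minimal illustration only: the `Rule` class, the `evaluate` function, the rule names and the patterns are all hypothetical and are not taken from Cloudflare's WAF.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule shape, not Cloudflare's actual data model.
    name: str
    pattern: re.Pattern
    mode: str  # "simulate" only logs a match; "block" rejects the request


def evaluate(rules, body, log):
    """Return True if any rule in "block" mode matches the request body."""
    blocked = False
    for rule in rules:
        # The pattern is executed in both modes, so its CPU cost is
        # paid even when the rule is only being simulated.
        if rule.pattern.search(body):
            log.append((rule.name, rule.mode))
            if rule.mode == "block":
                blocked = True
    return blocked


rules = [
    Rule("inline-js-new", re.compile(r"<script\b", re.I), "simulate"),
    Rule("sql-comment", re.compile(r"--\s*$"), "block"),
]
log = []
blocked_sim = evaluate(rules, "<SCRIPT>alert(1)</script>", log)
```

In this sketch the simulated rule records a match without blocking the request. The pattern itself still runs against every request, so a simulated rule is only harmless if its pattern is cheap to evaluate.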

But one rule had a regular expression that caused CPU usage to spike to 100% on all Cloudflare machines, with traffic dropping by 82% at the peak of the problem.
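How a single regular expression can consume a whole CPU can be shown with a small timing sketch. The pattern below is a textbook example of nested quantifiers chosen for illustration; the statement quoted here does not give the actual rule, and the function name is an assumption.

```python
import re
import time

# Illustrative pattern only, not the rule from the outage.
# Nested quantifiers let a backtracking engine split a run of "a"s
# in exponentially many ways before it can conclude there is no match.
PATHOLOGICAL = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Time a search that must fail: n "a" characters followed by "!"."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = PATHOLOGICAL.search(text)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Each extra character roughly doubles the work on a backtracking engine.
for n in (12, 15, 18):
    result, elapsed = time_failed_match(n)
    print(f"n={n:2d} matched={result is not None} seconds={elapsed:.4f}")
```

The inputs are kept short so the script finishes quickly. On a backtracking engine such as Python's `re`, lengthening the input by a few dozen characters would be enough to keep a core busy for a very long time.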

Graham-Cumming said the techies had understood what was happening by 1402 UTC (12.02am AEST 3 July) and stopped all the rulesets seven minutes later.

"We then went on to review the offending pull request, roll back the specific rules, test the change to ensure that we were 100% certain that we had the correct fix, and re-enabled the WAF Managed Rulesets at 1452 UTC (12.52am AEST 3 July)," he said.

On 24 June, Cloudflare was affected by a large-scale Border Gateway Protocol (BGP) leak caused by US telco Verizon, with the routes to many big websites instead transiting through DQE Communications, a small company in Pennsylvania.

At the time, Cloudflare's Tom Strickx said in a blog post that the problem had been magnified by the involvement of a so-called BGP optimiser product from a company known as Noction. The problems began at about 1030 UTC (8.30pm AEST 24 June) and were sorted two hours later.

Verizon has not yet issued any explanation as to how it caused this issue.

Disclosure: iTWire uses Cloudflare's services.




Sam Varghese


Sam Varghese has been writing for iTWire since 2006, a year after the site came into existence. For nearly a decade thereafter, he wrote mostly about free and open source software, based on his own use of this genre of software. Since May 2016, he has been writing across many areas of technology. He has been a journalist for nearly 40 years in India (Indian Express and Deccan Herald), the UAE (Khaleej Times) and Australia (Daily Commercial News (now defunct) and The Age). His personal blog is titled Irregular Expression.


