Neither Facebook's blog nor its media statements site has anything to say about the outage. The behemoth's Twitter account has two tweets, one from about two hours ago and the other from six hours ago.
Essentially, nothing seems to have happened at the company's headquarters despite reports of a border gateway protocol snafu appearing on every site belonging to world+dog and an outage that lasted more than six hours.
Stuck in the past... the Facebook Twitter account at the time of writing.
But Cloudflare's Tom Strickx and Celso Martinho issued a blog post at 21.28 UTC (8.28am AEDT), explaining the ins and outs of how Facebook and its fellow travellers Instagram, WhatsApp and Oculus Web went into a black hole, and how they were slowly returning to the Web.
But they said they quickly realised that something much bigger was going on when "social media quickly burst into flames, reporting what our engineers rapidly confirmed too. Facebook and its affiliated services WhatsApp and Instagram were, in fact, all down.
Nothing to see here... the Facebook media statement page where the last post was on 9 September.
"Their DNS names stopped resolving, and their infrastructure IPs were unreachable. It was as if someone had 'pulled the cables' from their data centres all at once and disconnected them from the Internet."
Strickx and Martinho said that at 16.58 UTC they noticed that Facebook was no longer announcing the routes to its DNS prefixes. "That meant that, at least, Facebook’s DNS servers were unavailable. Because of this Cloudflare’s 1.1.1.1 DNS resolver could no longer respond to queries asking for the IP address of facebook.com or instagram.com," they wrote.
I can reach Facebook, WhatsApp, Instagram and Oculus Quest successfully in the UK now. It will take time across ISPs and countries, and I imagine some turbulence as devices reconnect etc.
— Kevin Beaumont (@GossiTheDog) October 4, 2021
On checking their database of BGP updates, the duo found a number of routing changes made by Facebook at about 15.40 UTC.
"Routes were withdrawn, Facebook’s DNS servers went offline, and one minute after the problem occurred, Cloudflare engineers were in a room wondering why 1.1.1.1 couldn’t resolve facebook.com and worrying that it was somehow a fault with our systems," they said.
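To make the effect of a withdrawn route concrete, here is a minimal Python sketch of a routing table. It is a toy model under stated assumptions, not how any real router or Facebook's network is implemented: the `RoutingTable` class, the documentation prefix 192.0.2.0/24 and the private-use origin label AS64500 are all hypothetical stand-ins.

```python
import ipaddress


class RoutingTable:
    """Toy model of the prefixes a router has learned via BGP (hypothetical)."""

    def __init__(self):
        self.routes = {}  # ip_network -> label of the network that announced it

    def announce(self, prefix, origin):
        """Record that `origin` is announcing a route to `prefix`."""
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        """Forget the route to `prefix`, as happens after a BGP withdrawal."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: packets to this address cannot be delivered
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


if __name__ == "__main__":
    table = RoutingTable()
    table.announce("192.0.2.0/24", "AS64500")  # documentation prefix, not a real one
    print(table.lookup("192.0.2.53"))          # a nameserver address inside the prefix
    table.withdraw("192.0.2.0/24")
    print(table.lookup("192.0.2.53"))          # no route is left after the withdrawal
```

Once the only prefix covering a nameserver's address is withdrawn, the lookup finds no route at all, which is the situation the Cloudflare engineers describe.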
The Facebook media blog... again no information about the outage.
As a result, resolvers across the globe stopped resolving Facebook's domain names, they said, providing a detailed explanation of how DNS works.
When Facebook stopped announcing its DNS prefix routes through BGP, Cloudflare's and everyone else's DNS resolvers had no way to connect to Facebook's nameservers.
"Consequently, 1.1.1.1, 8.8.8.8, and other major public DNS resolvers started issuing (and caching) SERVFAIL responses," Strickx and Martinho said.
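The behaviour described here, a resolver that cannot reach any authoritative nameserver answering SERVFAIL and remembering that failure for a while, can be sketched in Python as follows. This is a simplified illustration and not Cloudflare's or Google's implementation: the `ToyResolver` class, the injected `query_nameserver` callable and both TTL values are assumptions made for the example.

```python
SERVFAIL = "SERVFAIL"


class ToyResolver:
    """Toy recursive resolver that caches answers and failures (hypothetical)."""

    def __init__(self, delegations, answer_ttl=300, failure_ttl=30):
        self.delegations = delegations  # domain -> list of nameserver addresses
        self.answer_ttl = answer_ttl    # seconds to keep a good answer (arbitrary)
        self.failure_ttl = failure_ttl  # seconds to keep a SERVFAIL (arbitrary)
        self.cache = {}                 # domain -> (response, expiry time)
        self.upstream_queries = 0       # queries sent to authoritative servers

    def resolve(self, domain, query_nameserver, now):
        """Answer from cache if still fresh, otherwise try each nameserver in turn.

        `query_nameserver(ns_address, domain)` returns an address string, or
        None when the nameserver is unreachable or the query times out.
        """
        cached = self.cache.get(domain)
        if cached is not None and cached[1] > now:
            return cached[0]
        for ns_address in self.delegations.get(domain, []):
            self.upstream_queries += 1
            answer = query_nameserver(ns_address, domain)
            if answer is not None:
                self.cache[domain] = (answer, now + self.answer_ttl)
                return answer
        # Every nameserver was unreachable: report the failure and cache it.
        self.cache[domain] = (SERVFAIL, now + self.failure_ttl)
        return SERVFAIL


if __name__ == "__main__":
    resolver = ToyResolver({"example.com": ["192.0.2.53", "192.0.2.54"]})
    unreachable = lambda ns_address, domain: None  # simulates withdrawn routes
    print(resolver.resolve("example.com", unreachable, now=0))
    print(resolver.upstream_queries)
```

In this model the failure is cached only briefly, so the resolver keeps going back to the unreachable nameservers each time the cached SERVFAIL expires.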
"But that's not all. Now human behaviour and application logic kicks in and causes another exponential effect. A tsunami of additional DNS traffic follows."
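A rough way to see why retries multiply traffic is to count how many lookups a single client makes while an outage lasts. The Python sketch below does only that arithmetic. The function name, the one-second retry interval and the optional `backoff_factor` parameter are illustrative assumptions and do not come from the Cloudflare post.

```python
def attempts_in_window(window_s, first_delay_s, backoff_factor=1.0):
    """Count the lookups one client makes during an outage of `window_s` seconds.

    The client looks the name up at t=0 and, on every failure, waits and tries
    again. With backoff_factor=1.0 the wait stays constant; a larger factor
    makes each wait longer than the last.
    """
    attempts = 1           # the initial lookup at t=0
    t = 0.0
    delay = float(first_delay_s)
    while t + delay <= window_s:
        t += delay
        attempts += 1
        delay *= backoff_factor
    return attempts


if __name__ == "__main__":
    # Hypothetical client that retries every second during a one-minute outage.
    per_client = attempts_in_window(60, 1)
    print(per_client)
    # The same client doubling its wait after each failure.
    print(attempts_in_window(60, 1, backoff_factor=2.0))
    # Total lookups if a million such clients all retry every second.
    print(per_client * 1_000_000)
```

A client that would normally make one lookup and reuse the answer instead makes dozens in a minute, and that figure is multiplied by every app and browser doing the same thing.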
They said that at about 21.20 UTC, the DNS name facebook.com became available again on Cloudflare's 1.1.1.1 resolver.
"Undoubtedly Facebook, WhatsApp and Instagram services will take further time to come online but as of 21:28 UTC Facebook appears to be reconnected to the global Internet and DNS working again," the pair said.
"Today's events are a gentle reminder that the Internet is a very complex and interdependent system of millions of systems and protocols working together. That trust, standardisation, and co-operation between entities are at the centre of making it work for almost five billion active users worldwide," they concluded.