
Off of Facebook of all places, I hear this report that Optus had a major outage in Australia last night, which is believed to have been a BGP screw-up involving Akamai:

"Nope: it was Akamai increasing their advertised prefix count from their CDN nodes inside the Optus network, and Optus’s configured max-prefix settings on their core BGP sessions from their route reflectors weren’t quite high enough, and their whole network went down. Someone didn’t quite think through the value for internal max-prefix rules."

Anybody confirm, deny, got hit by it?

Cheers,
-- jra
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
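The mechanism described in the quoted report can be sketched roughly as follows. This is a hypothetical model, not any vendor's implementation: it assumes a session that counts the distinct prefixes accepted from a peer and drops to Idle, discarding everything learned over it, once the count exceeds the configured maximum. Real implementations offer variations such as warning-only thresholds and automatic restart timers.

```python
from dataclasses import dataclass, field


@dataclass
class MaxPrefixSession:
    """Hypothetical model of a BGP session with a max-prefix limit."""
    peer: str
    max_prefixes: int
    state: str = "Established"
    accepted: set[str] = field(default_factory=set)

    def receive(self, prefixes: list[str]) -> None:
        """Accept announced prefixes; tear the session down if the limit is exceeded."""
        if self.state != "Established":
            return  # a torn-down session accepts nothing until it is reset
        self.accepted.update(prefixes)
        if len(self.accepted) > self.max_prefixes:
            # Exceeding the limit drops the whole session, and with it every
            # route learned over it, not just the "extra" prefixes.
            self.state = "Idle"
            self.accepted.clear()


session = MaxPrefixSession(peer="cdn-node", max_prefixes=3)
session.receive(["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"])
print(session.state, len(session.accepted))  # Established 3
session.receive(["192.0.2.128/25"])
print(session.state, len(session.accepted))  # Idle 0
```

One extra prefix past the limit costs the receiver all routes from that session, which is why the chosen limit matters so much.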

Something definitely went very wrong. Optus was completely out for ~6h, with ongoing reports of some mobile cells still down, fixed line not working, etc. https://ioda.inetintel.cc.gatech.edu/asn/4804?from=1699351533&until=16994811...
________________________________
From: Outages <outages-bounces@outages.org> on behalf of Jay Ashworth via Outages <outages@outages.org>
Sent: 08 November 2023 15:20
To: outages@outages.org <outages@outages.org>
Subject: [outages] Optus/Akamai

Disclaimer: I am employed by Akamai.

I’m not aware of any changes that could have triggered this. If someone has information they can share, you can reach out to me directly or to our NOCC. I can be reached at jared@akamai.com if you have something private to share, and my mobile number is easily findable as well.

- Jared

There was definitely a major outage with Optus yesterday, not just internet services but mobile and landline as well. It was roughly a 10-hour outage for effectively their entire consumer network (weirdly, I have an enterprise service which was not affected by the outage). Definitely got hit by it: our entire corporate mobile fleet was out the whole day.

Nobody has given an RFO yet, but I have my doubts it was just BGP issues. That's speculation at this point.

D
-- veg·e·tar·i·an: Ancient tribal slang for the village idiot who can't hunt, fish or ride

That makes no sense. How would tripping the max prefix on a single peer cause a major outage?

On 11/8/2023 3:13 PM, DaZZa via Outages wrote:
There was definitely a major outage with Optus yesterday - not just internet services, but mobile and landline. roughly 10 hour outage for effectively their entire consumer network (weirdly, I have an enterprise service which was not effected by the outage). Definitely got hit by it - our entire corporate mobile fleet was out the whole day.
Nobody has given a RFO yet - but I have my doubts it was just BGP issues. That's speculation at this point.
D
-- ================================================================ Aaron Wendel Chief Technical Officer Wholesale Internet, Inc. (AS 32097) (816)550-9030 http://www.wholesaleinternet.com ================================================================

Hi,

On Wed, Nov 08, 2023 at 03:25:13PM -0600, Aaron Wendel via Outages wrote:
That makes no sense. How would tripping the max prefix on a single peer cause a major outage?

If you have

client --> border router -> route reflector -> all other BGP speakers

and the "RR -> BGP speakers" sessions get tripped due to "client sending in too many new routes", then your whole network will fall apart until you can shut down that initial BGP session (or re-provision the other sessions, which might not work due to "there is no connectivity to the management systems, because, BGP is down").

*Iff* this happens, and you do not have working OOB access, including the ability to make local config changes on the routers ("all configs are done by the automatization, no local access possible"), such a problem will be extremely messy to recover from. Especially figuring out *what* happened, if you have no visibility because the routers have lost the route to your syslog servers....

gert
-- 
"If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor." Robert A. Heinlein, The Moon is a Harsh Mistress
Gert Doering - Munich, Germany gert@greenie.muc.de
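The cascade described above can be sketched as a toy model. All of it is hypothetical: it assumes every route-reflector-to-client session carries the same max-prefix limit, that the management/syslog prefix is itself carried in iBGP, and it ignores IGP and connected routes entirely. What it illustrates is that one shared limit turns a single peer's growth into the loss of every reflected route on every speaker at once.

```python
def reflect(rr_routes: set[str], client_limits: dict[str, int]) -> dict[str, set[str]]:
    """Return the BGP routes each RR client holds after the RR reflects rr_routes.

    client_limits maps a (hypothetical) client name to the max-prefix limit on
    its session from the route reflector.
    """
    result: dict[str, set[str]] = {}
    for client, limit in client_limits.items():
        if len(rr_routes) > limit:
            # Limit exceeded: the session drops and the client loses every reflected route.
            result[client] = set()
        else:
            result[client] = set(rr_routes)
    return result


# Hypothetical internal prefixes (loopbacks plus a management/syslog subnet) carried in iBGP.
internal = {"10.0.0.1/32", "10.0.0.2/32", "10.255.0.0/24"}
cdn_before = {"192.0.2.0/24", "198.51.100.0/24"}
cdn_after = cdn_before | {"203.0.113.0/24", "192.0.2.0/25"}  # one peer announces a few more prefixes
client_limits = {"pe1": 6, "pe2": 6, "pe3": 6}  # the same limit copied onto every RR-client session

before = reflect(internal | cdn_before, client_limits)
after = reflect(internal | cdn_after, client_limits)
print(all("10.255.0.0/24" in routes for routes in before.values()))  # True
print(any(after.values()))  # False: every client lost everything, including the syslog route
```

Raising the limits only moves the threshold; the failure mode stays network-wide as long as every client session trips on the same count.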

Are people using max-prefix for iBGP sessions? That seems.....unwise.

-Steve

On Thu, Nov 9, 2023 at 1:24 AM Gert Doering via Outages <outages@outages.org> wrote:
If you have
client --> border router -> route reflector -> all other BGP speakers
and the "RR -> BGP speakers" sessions get tripped due to "client sending in too many new routes", then your whole network will fall apart until you can shutdown that initial BGP session (or re-provision the other sessions, which might not work due to "there is no connectivity to the management systems, because, BGP is down").
*Iff* this happens, and you do not have working OOB access including being able to do local config changes on the routers ("all configs are done by the automatization, no local access possible"), such a problem will be extremely messy to recover. Especially figuring out *what* happened, if you have no visibility because the routers have lost the route to your syslog servers....
-- -Steve

On Thu, 9 Nov 2023 at 23:18, Steve Meuse via Outages <outages@outages.org> wrote:
Are people using max-prefix for iBGP sessions?
That seems.....unwise.
Yes, I find it hard to imagine what risks would be mitigated by applying max-prefix limits to IBGP sessions.

Kind regards,
Job

On Thu, Nov 9, 2023 at 7:22 PM Job Snijders via Outages <outages@outages.org> wrote:
Yes, I find it hard to imagine what risks would be mitigated by applying max-prefix limits to IBGP sessions.
TCAM limits, perhaps?

Rubens

Surely it's better to drop some routes than to drop the whole session.

On Thu, Nov 9, 2023, 5:26 PM Rubens Kuhl via Outages <outages@outages.org> wrote:
TCAM limits, perhaps ?
Rubens

Hi,

----- On Nov 9, 2023, at 2:31 PM, Ross Tajvar via Outages outages@outages.org wrote:
Surely it's better to drop some routes than to drop the whole session.
Not necessarily. If part of your IBGP feed includes 10/8 (common in datacenters), you may end up in a situation where it's better to disable a leaking peer.

Imagine a Trident 3 box in a spine layer, with 32 northbound peers and 32 southbound peers. Northbound, you'll receive a few hundred routes, including 10/8. Southbound, you'll receive a few hundred host subnets and perhaps some anycast /32s. You'll aggregate the host subnets northbound. If one of those northbound peers leaks a full table, you run the risk of losing some of your host subnets, or perhaps important anycast routes.

I would prefer to lose the offending peer over risking host routes, primarily because shutting down the peer is deterministic, while you have no control over which routes are lost.

Thanks,
Sabri
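Sabri's determinism argument can be illustrated with a small simulation. It assumes a naive hardware table that installs routes first come, first served and silently refuses entries once full; real platforms handle overflow in different ways. The table size, prefix counts and `leak-N` route names are all hypothetical. With a max-prefix limit on the leaking session every host subnet is installed; without one, which host subnets survive depends on an update order the operator does not control.

```python
import random
from typing import Optional

FIB_CAPACITY = 1_000  # hypothetical hardware route-table size

# Hypothetical routes: southbound host subnets plus a full table leaked by one northbound peer.
host_subnets = [f"10.{i // 256}.{i % 256}.0/24" for i in range(300)]
leaked_table = [f"leak-{i}" for i in range(5_000)]


def fib_after_leak(max_prefix: Optional[int], seed: int) -> set[str]:
    """Return the FIB contents after the leak for one arbitrary update ordering."""
    if max_prefix is not None and len(leaked_table) > max_prefix:
        # The leaking session is torn down, so none of its routes arrive.
        updates = list(host_subnets)
    else:
        updates = host_subnets + leaked_table
        # Arrival order is effectively outside the operator's control.
        random.Random(seed).shuffle(updates)
    # Naive FIB: first come, first served; everything past capacity is dropped.
    return set(updates[:FIB_CAPACITY])


def host_subnets_missing(fib: set[str]) -> int:
    """Count host subnets that did not make it into the FIB."""
    return sum(1 for prefix in host_subnets if prefix not in fib)


for seed in range(3):
    with_limit = host_subnets_missing(fib_after_leak(1_000, seed))
    without_limit = host_subnets_missing(fib_after_leak(None, seed))
    print(f"seed {seed}: missing with max-prefix={with_limit}, without limit={without_limit}")
```

The "with max-prefix" count is 0 for every seed, while the count without a limit depends on the shuffle. The price of that determinism is that tearing the session down also discards the offending peer's legitimate routes.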

In my experience with such things, it's not a matter of "some routes will be dropped", but "traffic to certain destinations will blackhole unless there's a covering route in FIB."

On Thu, 9 Nov 2023, Ross Tajvar via Outages wrote:
Surely it's better to drop some routes than to drop the whole session.
---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Blue Stream Fiber, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
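Jon's distinction between "dropped routes" and "blackholed destinations" comes down to longest-prefix match. The sketch below uses Python's standard `ipaddress` module with a linear scan rather than a real FIB structure; the prefixes and next-hop names are hypothetical. A destination whose more-specific route was never installed falls back to a covering route if one exists (possibly pointing the wrong way), and is otherwise blackholed.

```python
import ipaddress
from typing import Optional


def lookup(fib: dict[str, str], destination: str) -> Optional[str]:
    """Longest-prefix match over a FIB of prefix -> next hop; None means no route (blackhole)."""
    address = ipaddress.ip_address(destination)
    best_len, best_hop = -1, None
    for prefix, next_hop in fib.items():
        network = ipaddress.ip_network(prefix)
        if address in network and network.prefixlen > best_len:
            best_len, best_hop = network.prefixlen, next_hop
    return best_hop


# Hypothetical FIB with every route installed.
full_fib = {"10.0.0.0/8": "spine-north", "10.1.2.0/24": "leaf-7", "192.0.2.0/24": "leaf-9"}
# The same FIB after a table overflow in which both more-specifics failed to install.
overflowed = {"10.0.0.0/8": "spine-north"}

print(lookup(full_fib, "10.1.2.5"))      # leaf-7
print(lookup(overflowed, "10.1.2.5"))    # spine-north: caught by the covering route
print(lookup(overflowed, "192.0.2.10"))  # None: no covering route, traffic is blackholed
```

Whether the covering route still delivers the traffic depends on the topology: in a spine like the one Sabri describes, a host subnet falling back to a northbound 10/8 would send traffic away from where the host actually lives.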

Those are actually one and the same if you think about it, but yes, you are correct… To make matters worse, it’s not particularly deterministic which routes are dropped.

However, I’ll make the argument that losing some random assortment of destinations is almost always going to be better than losing your entire ability to control your peering routers. YMMV.

Owen
On Nov 14, 2023, at 11:37, Jon Lewis via Outages <outages@outages.org> wrote:
In my experience with such things, it's not a matter of "some routes will be dropped", but "traffic to certain destinations will blackhole unless there's a covering route in FIB."
participants (13)
- Aaron Wendel
- DaZZa
- Gert Doering
- Jared Mauch
- Jay Ashworth
- Job Snijders
- Jon Lewis
- Kris Saw
- Owen DeLong
- Ross Tajvar
- Rubens Kuhl
- Sabri Berisha
- Steve Meuse