I think we're both saying the same thing in different ways. Bad design is bad design is bad design. I think the metric is an acceptable level of failure. Remember a few years ago when some random data center in San Antonio (or somewhere like that) caused a multi-day outage for Microsoft because some core service lived in, or routed through, that one facility? The only guy we should be laughing at is the one who thinks he can design away all of these issues, right up until next year when his datacenter gets struck by lightning.

On Mon, Oct 20, 2025 at 5:10 PM Aaron C. de Bruyn <aaron@heyaaron.com> wrote:
On Mon, Oct 20, 2025 at 1:39 PM Shaun Potts via Outages-discussion <outages-discussion@outages.org> wrote:
I always enjoy the armchair "haha that's why you don't use <x>" engineers.
I always enjoy it when the next generation of engineers, full of fresh and exciting new ideas, is forced to re-learn what "single point of failure" means. This is usually followed a few years later by the realization that SPOF includes entire companies (like AWS today), the various definitions of layer 8 on the OSI stack, and that one time I fired up 'cssh' against the wrong target and happily restarted a service for all customers instead of a much smaller subset.
-A