HE FMT2 down, various network foo emanating

Hurricane Electric has a datacenter outage at FMT2, cause as yet unspecified, but there are SF Bay Area and possibly wider network issues emanating from it at low to moderate intensity. Out around 12:15am PST Sat morning.

--
-george william herbert
george.herbert@gmail.com

Outage started at 11:37pm Pacific, give or take a minute. I spoke to them at about 11:43pm and they were aware there was an issue, but didn't know the cause. Since then it's been impossible to get through to anything but voicemail :(

Scott

On Sat, Sep 26, 2009 at 1:09 AM, George Herbert <george.herbert@gmail.com> wrote:
Hurricane Electric has a datacenter outage at FMT2, cause as yet unspecified, but there are SF Bay Area and possibly wider network issues emanating from it at low to moderate intensity.
Out around 12:15am PST Sat morning.

On Sat, Sep 26, 2009 at 1:13 AM, Scott Howard <scott@doc.net.au> wrote:
Outage started at 11:37pm Pacific, give or take a minute.
I spoke to them at about 11:43pm and they were aware there was an issue, but didn't know the cause. Since then it's been impossible to get through to anything but voicemail :(

One of my coworkers got through on the phone to their NOC after several tries, about 20 minutes ago now I think, but they didn't give more specifics as to the cause. I think they're just very busy, not that their phones went out too.

--
-george william herbert
george.herbert@gmail.com

It's back up as of a few seconds ago.

Scott.

This should probably go to outages-discuss, but I have to say something.

These kinds of outages were regular occurrences at HE Fremont (FMT2 had not been built yet, but I doubt it matters). Here are a few choice examples -- and these are just a few -- of recurring things which never got addressed.

I will never forget how packets with a Comcast source, HE Fremont destination, went through an AT&T "device" somewhere in the mix which would regularly burp/flake out and HE couldn't do anything about it. No alternate routes were propagated, so literally Comcast->HE was hard down. Single point of failure.

I will never forget how packets with an SBC DSL (now AT&T) source, HE Fremont destination, went through Telia (a Swedish ISP with no North American NOC) for a single hop, and that Telia would drop their BGP session or sever their link for whatever reason. Like the above, no alternate routes were propagated, so literally SBC->HE was hard down. Likewise, individuals in Sweden who used Telia also saw the same thing going from Telia Sweden->HE Fremont. Single point of failure.

I will never forget the Cisco router which would reboot every 4-5 months for no reason... a problem which went on for literally years, and all I was ever told was "well it's back up now" and "we have a case open with Cisco". Single point of failure.

I will never forget how HE refused to use VLANs to segregate customers on a layer 2 level, instead preferring some strange layer 3 implementation. When we witnessed an unexpected massive (7-8 Mbit/sec) increase in inbound traffic, only to find that the destination IPs of these packets were for another customer in a completely different netblock/area of the Fremont facility, we were told by support "that's impossible". Full tcpdump captures were given, and we were told "this makes no sense, this can't happen". 4-5 hours later, we were told the root cause was "a customer who had misconfigured their load balancer". Right.

I will never forget the two separate times there were full-scale power outages both caused by "UPS maintenance". Gas generators? They have them, but when I asked why they didn't kick in, I was told "we don't know". When I asked if there would be a follow-up investigation as to why that didn't happen, so the issue wouldn't recur, I was told "probably".

I will never forget the "maintenances" that we were never told of, because scheduled maintenances are not announced to customers. I was left with the impression they're done on a whim vs. scheduled.

Safe to say when our contract ended, we left.

I will not recommend Hurricane Electric to anyone who wants co-location with reliable, redundant connectivity. I *will* recommend them for cheap "I just need a 1U box stuck somewhere and don't really care about quality" co-location, although compared to some of their competitors, they're actually more expensive.

Also, if HE or HE customers read this and want to flame/argue/do burn-outs in a Pacer over this -- don't bother. I won't be responding to any mails. Why? Because all outages/incidents witnessed, including the above, were sent to our account rep when we were asked "why aren't you renewing?" I received no response past that point.

I'll leave you with this: if FMT2 had redundancy, then how were things hard down for over an hour? Think about it.

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

On Sat, Sep 26, 2009 at 01:17:22AM -0700, Scott Howard wrote:
It's back up as of a few seconds ago.
Scott.
_______________________________________________
outages mailing list
outages@outages.org
https://puck.nether.net/mailman/listinfo/outages

On Sat, 26 Sep 2009, Jeremy Chadwick wrote:
I will never forget how HE refused to use VLANs to segregate customers on a layer 2 level, instead preferring some strange layer 3 implementation. When we witnessed an unexpected massive (7-8 Mbit/sec) increase in inbound traffic, only to find that the destination IPs of these packets were for another customer in a completely different netblock/area of the Fremont facility, we were told by support "that's impossible". Full tcpdump captures were given, and we were told "this makes no sense, this can't happen". 4-5 hours later, we were told the root cause was "a customer who had misconfigured their load balancer".

This will happen if using Cisco's private VLANs (PVLANs) in an isolated or community mode on the switch. Because of the way most load balancers work in the layer 2 environment, the traffic ends up being broadcast to all switch ports in the group.

My guess is they had customers bridged over to the next-hop using PVLANs, so that customers were theoretically isolated at a layer 2 level. Of course, PVLANs don't provide any true isolation.

This is just speculation about their setup based on what you describe, but I've seen this in practice with PVLANs, and it seems plausible.

--
William R. Lorenz
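
For readers who want to see why one customer's inbound traffic could show up on another customer's port, here is a rough toy model in Python. It is only a sketch under stated assumptions, not a description of HE's actual setup: it assumes the mechanism is ordinary unknown-unicast flooding, where a switch that has never learned the load balancer's MAC address sends frames for it out of every port in the group. The class name, port names, and MAC address are all hypothetical, and the model ignores PVLAN forwarding rules other than letting frames from the uplink reach every customer port.

```python
class LearningSwitch:
    """Toy layer 2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame and return the list of ports it is sent out of."""
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown unicast: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [out_port]


# Hypothetical port names and MAC addresses, for illustration only.
PORTS = ["uplink", "cust_a", "cust_b", "cust_c"]
ROUTER_MAC = "02:00:00:00:00:01"
LB_MAC = "02:00:00:00:00:aa"  # load balancer owned by the customer on cust_a

# Case 1: the load balancer has sent a frame from its MAC, so the switch
# has learned it and delivers inbound traffic only to cust_a.
sw = LearningSwitch(PORTS)
sw.receive("cust_a", LB_MAC, ROUTER_MAC)
learned = sw.receive("uplink", ROUTER_MAC, LB_MAC)

# Case 2 (assumed misconfiguration): the load balancer never sources frames
# from that MAC, so the switch never learns it and floods every inbound frame.
sw = LearningSwitch(PORTS)
flooded = sw.receive("uplink", ROUTER_MAC, LB_MAC)

print("learned MAC :", learned)
print("unknown MAC :", flooded)
```

In the second case the customers on cust_b and cust_c receive traffic addressed to the customer on cust_a, which matches the symptom described in the quoted message: inbound packets whose destination IPs belong to someone else.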
participants (4)
- George Herbert
- Jeremy Chadwick
- Scott Howard
- William R. Lorenz