IPv6 tunnels in FRA1 on HE.net down?

Constantine A. Murenin

23 Apr 2013

6:28 p.m.

As of a couple of minutes ago, my IPv6 tunnel seems to have no connectivity. Weirdly, I can still access ordns.he.net (2001:470:20::2) just fine, but not ns{2,3,4,5}.he.net, or any other IPv6 host.


Constantine A. Murenin

23 Apr

7:28 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

On 23 April 2013 18:28, Constantine A. Murenin <mureninc@gmail.com> wrote:

...


Update: reported to he.net support@ at 18:34, including a follow-up phone call; everything's back online, as of at least 18:51 PT. Pretty fast resolution, for a free service. :-)

According to he.net, tserv wasn't responding on its IPv6 address, and has since been rebooted.

Which adds up as per my mtr from a Linode:

# mtr --report{,-wide,-cycles=60} --order "SRL BGAWV" -6 XXXXXXXXX ; date
 2. 10gigabitethernet2-3.core1.fmt1.he.net   60  60   0.0%    0.4    2.5    8.5   85.5  15.2
 3. 10gigabitethernet1-2.core1.sjc2.he.net   60  60   0.0%    0.8    2.5    5.3   51.6   8.5
 4. 10gigabitethernet3-3.core1.den1.he.net   60  60   0.0%   27.8   31.7   32.3   70.0   7.3
 5. 10gigabitethernet5-5.core1.mci3.he.net   60  60   0.0%   39.7   43.6   44.3  114.3  10.1
 6. 10gigabitethernet5-2.core1.chi1.he.net   60  60   0.0%   52.0   56.2   57.4  177.2  17.0
 7. 100gigabitethernet7-2.core1.nyc4.he.net  60  59   1.7%   69.1   72.2   72.5  119.8   7.1
 8. 10gigabitethernet1-2.core1.lon1.he.net   60  60   0.0%  137.8  141.8  142.2  229.2  12.0
 9. 10gigabitethernet4-2.core1.fra1.he.net   60  60   0.0%  149.4  154.0  154.4  242.8  12.7
10. ???                                      60   0  100.0    0.0    0.0    0.0    0.0   0.0
Tue Apr 23 18:14:29 PDT 2013

...

 2. 10gigabitethernet2-3.core1.fmt1.he.net   60  60   0.0%    0.4    2.8    9.3   78.3  16.8
 3. 10gigabitethernet1-2.core1.sjc2.he.net   60  60   0.0%    0.8    2.3    3.9   13.9   3.8
 4. 10gigabitethernet3-3.core1.den1.he.net   60  60   0.0%   27.8   30.3   30.5   39.2   3.5
 5. 10gigabitethernet5-5.core1.mci3.he.net   60  60   0.0%   39.7   43.4   43.6   52.8   4.4
 6. 10gigabitethernet5-2.core1.chi1.he.net   60  60   0.0%   52.0   54.1   54.2   62.2   3.2
 7. 100gigabitethernet7-2.core1.nyc4.he.net  60  60   0.0%   69.1   71.5   71.6   82.1   3.5
 8. 10gigabitethernet1-2.core1.lon1.he.net   60  60   0.0%  137.8  140.3  140.4  149.1   3.5
 9. 10gigabitethernet4-2.core1.fra1.he.net   60  60   0.0%  149.4  152.3  152.3  162.5   3.7
10. tserv1.fra1.he.net                       60  60   0.0%  154.1  155.5  155.5  163.4   2.2
11. IPv6.XXXXXX                              60  59   1.7%  155.2  156.0  156.1  162.9   1.2
Tue Apr 23 18:55:34 PDT 2013

And I guess ordns.he.net (2001:470:20::2) really runs on tserv (and hence wasn't affected during the outage).

Cns# echo {ordns,ns{2,3,4,5}}.he.net | xargs -n1 traceroute6 -l; date
traceroute6 to ordns.he.net (2001:470:20::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  ordns.he.net (2001:470:20::2)  6.39 ms  6.495 ms  6.213 ms
traceroute6 to ns2.he.net (2001:470:200::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  9.353 ms  8.783 ms  9.328 ms
 2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  13.12 ms  15.134 ms  6.252 ms
 3  10gigabitethernet5-3.core1.lon1.he.net (2001:470:0:1d2::1)  18.113 ms  23.554 ms  20.342 ms
 4  ns2.he.net (2001:470:200::2)  20.449 ms  22.873 ms  20.517 ms
traceroute6 to ns3.he.net (2001:470:300::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.743 ms  9.2 ms
 2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  5.934 ms  10.749 ms  6.197 ms
 3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  18.567 ms  17.728 ms  13.282 ms
 4  ns3.he.net (2001:470:300::2)  13.462 ms  13.412 ms  13.525 ms
traceroute6 to ns4.he.net (2001:470:400::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.838 ms  8.684 ms  8.682 ms
 2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  6.233 ms  13.208 ms  5.97 ms
 3  ns4.he.net (2001:470:400::2)  6.145 ms  6.38 ms  6.384 ms
traceroute6 to ns5.he.net (2001:470:500::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.655 ms  8.848 ms
 2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  12.607 ms  6.527 ms  11.929 ms
 3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  13.289 ms  13.968 ms  16.754 ms
 4  ns5.he.net (2001:470:500::2)  14.111 ms  13.575 ms  13.385 ms
Tue Apr 23 18:59:48 PDT 2013

However, it's unclear why IPv6 connectivity of tserv1.fra1.he.net doesn't seem to be monitored otherwise. :/

Cheers, Constantine.

Constantine A. Murenin

14 May

9:42 a.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

This just happened again: all IPv6 tunnels on tserv1.fra1.he.net. were inaccessible; from within a tunnel, I could not access any IPv6 resource other than ordns.he.net, which runs on the tunnel server itself.

Why does FRA1 lose IPv6 connectivity so often?

Update: Seems like it has been resolved as I've been writing this email, but this seems to happen a few times too many.

C.

On 23 April 2013 19:28, Constantine A. Murenin <mureninc@gmail.com> wrote:

...


Constantine A. Murenin

3:50 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

For what it is worth, further details about the issue have surfaced. I found a friend who also has a tunnel on tserv1.fra1.he.net., and he has been running smokeping to various IPv4 and IPv6 resources for quite a while.

According to several of his smokeping reports, it can be concluded that this very outage occurred between 14T18:00/05 and 14T18:45/50; but we've also noticed that there was another, 6-hour (yes, 6-hour) outage a day earlier, ~13T12 to ~13T18 (which corresponds to Monday early to late morning Pacific Time).

I've contacted he.net again this time around, and they said that they're trying to hunt some obscure kernel bug that is causing these issues.

The tunnelbroker.net is a free service, but to have a 6-hour outage, spanning a quarter of a whole day, is absolutely ridiculous. I'm stunned that IPv6 connectivity of tserv1.fra1.he.net. is, apparently, still not monitored, even though it's known to be having these issues. ???

Alternatively, it is, of course, possible that some engineer has been troubleshooting the root cause of this issue for those whole 5 or 6 hours on Sunday/Monday night; but I find that somewhat hard to believe; more likely it got busted, and no one responsible knew about it being busted for most of the time that it was.

Even more troubling is that they don't even publish any reports about these extended outages.

For tserv1.fra1.he.net. end users: if you can `ping6 ordns.he.net` (it runs on tserv itself, try $(host `dig +short -6 @ordns.he.net whoami.akamai.net`)), but cannot `ping6 ns4.he.net`, then it most likely means that tserv1.fra1.he.net IPv6 connectivity is down again, and you should open a ticket with HE.net ASAP. Perhaps someone should set up a smokeping with automatic emails to support@he.net?

C.

On 14 May 2013 09:42, Constantine A. Murenin <mureninc@gmail.com> wrote:

...
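For readers who want to script the ordns-versus-ns4 check described in the message above, here is a minimal sketch in POSIX shell. The function names `probe` and `classify_tunnel` and the three result strings are hypothetical, chosen only for illustration; the logic is just the comparison from the message, not anything HE.net provides, and it assumes a `ping6` that accepts `-c`.

```shell
#!/bin/sh
# Hypothetical helpers; names and result strings are illustrative only.

# probe HOST: send 3 ICMPv6 echo requests and return ping6's exit status
# (0 means at least one reply came back).
probe() {
    ping6 -c 3 "$1" >/dev/null 2>&1
}

# classify_tunnel ORDNS_STATUS NS4_STATUS
# Each argument is the exit status of a probe (0 = reachable).
classify_tunnel() {
    if [ "$1" -eq 0 ] && [ "$2" -ne 0 ]; then
        # tserv answers, nothing behind it does: the pattern from the thread
        echo "tserv-uplink-down"
    elif [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo "ok"
    else
        # cannot even reach the resolver on tserv: check the tunnel
        # endpoint and the IPv4 path before blaming tserv's IPv6 uplink
        echo "ordns-unreachable"
    fi
}

# Usage from inside a tunnel (needs live connectivity):
#   probe ordns.he.net; a=$?
#   probe ns4.he.net;   b=$?
#   classify_tunnel "$a" "$b"
```

Run from cron, the printed result could decide whether to send a notification; the full-outage case reported later in the thread would show up as `ordns-unreachable` rather than `tserv-uplink-down`.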


Constantine A. Murenin

12 Jun

3:11 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

he.net fra1 tserv1 has been down since about 10 minutes ago (~14:57 PT). This time it seems like the whole thing is down: I cannot even ping ordns.he.net over IPv6, and the connection of my friend who's running a smokeping is also down, i.e. this is definitely widespread.

Not sure what exactly is down, but it seems to be bgp-related, perhaps:

 1  2600:3c01::8678:acff:fe0d:79c1 (2600:3c01::8678:acff:fe0d:79c1)  0.699 ms  0.828 ms  0.970 ms
 2  10gigabitethernet7-6.core3.fmt2.he.net (2001:470:1:3b8::1)  5.936 ms  5.877 ms  5.859 ms
 3  10gigabitethernet5-4.core1.pao1.he.net (2001:470:0:263::2)  6.743 ms  6.877 ms  6.978 ms
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *

C.

On 14 May 2013 15:50, Constantine A. Murenin <mureninc@gmail.com> wrote:

...


Constantine A. Murenin

3:19 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

I smell a big outage:

% date; traceroute tserv1.fra1.he.net
Wed Jun 12 15:17:04 PDT 2013
traceroute to tserv1.fra1.he.net (216.66.80.30), 30 hops max, 60 byte packets
 1  192.168.105.3 (192.168.105.3)  0.673 ms  0.773 ms  0.923 ms
 2  10gigabitethernet7-6.core3.fmt2.he.net (65.49.10.217)  1.795 ms  1.811 ms  1.795 ms
 3  10gigabitethernet12-1.core1.lax1.he.net (184.105.213.26)  19.773 ms 10gigabitethernet10-1.core1.sjc2.he.net (184.105.222.14)  0.786 ms  0.756 ms
 4  10gigabitethernet10-8.core1.nyc4.he.net (72.52.92.225)  71.714 ms  76.569 ms 10gigabitethernet14-2.core1.nyc4.he.net (184.105.213.198)  71.690 ms
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * *^C

However, a "reverse" traceroute works fine:

% date; traceroute ns1.he.net
Wed Jun 12 15:18:47 PDT 2013
traceroute to ns1.he.net (216.218.130.2), 64 hops max, 40 byte packets
 1  static.33.203.4.46.clients.your-server.de (46.4.203.33)  0.682 ms  0.531 ms  0.481 ms
 2  hos-tr4.juniper2.rz13.hetzner.de (213.239.224.97)  0.245 ms hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65)  0.242 ms hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33)  0.241 ms
 3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150)  5.951 ms hos-bb1.juniper1.ffm.hetzner.de (213.239.240.224)  4.803 ms  4.786 ms
 4  30gigabitethernet4-3.core1.fra1.he.net (80.81.192.172)  6.354 ms  6.81 ms  5.451 ms
 5  10gigabitethernet10-2.core1.par2.he.net (72.52.92.26)  22.803 ms  24.531 ms  24.787 ms
 6  10gigabitethernet15-1.core1.ash1.he.net (184.105.213.93)  101.504 ms  99.563 ms  99.958 ms
 7  10gigabitethernet11-1.core1.pao1.he.net (184.105.213.177)  163.687 ms  171.711 ms  175.34 ms
 8  10gigabitethernet1-2.core1.fmt1.he.net (184.105.213.65)  163.940 ms  171.362 ms  167.67 ms
 9  ns1.he.net (216.218.130.2)  163.52 ms  165.265 ms  164.143 ms

C.

On 12 June 2013 15:11, Constantine A. Murenin <mureninc@gmail.com> wrote:

...


-- V. V. Putin on perfection, 24 December 2000: If a person is satisfied with everything, he is a complete idiot. A healthy person of sound memory cannot be satisfied with everything all the time.

Tony McCrory

3:30 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

Indeed, traceroute to my fra1 tunnel is definitely getting mislaid within he.net.

Time to ask for a refund? Oh wait... ;)

Tracing route to tmcc.me [2001:470:1f0a:3ee::2] over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  6.9.1.2.2.d.e.f.f.f.f.f.6.7.a.0.3.f.7.9.d.8.4.6.0.b.8.0.1.0.0.2.ip6.arpa [2001:8b0:648d:97f3:a76:ffff:fed2:2196]
  2    65 ms    65 ms    66 ms  c.gormless.thn.aa.net.uk [2001:8b0:0:53::53]
  3    56 ms    58 ms    61 ms  2001:7f8:4::50e8:1
  4    70 ms    74 ms    74 ms  40gigabitethernet1-1.core1.lon1.he.net [2001:7f8:4::1b1b:1]
  5   145 ms   141 ms   132 ms  10gigabitethernet10-4.core1.nyc4.he.net [2001:470:0:128::1]
  6   146 ms   150 ms   154 ms  100gigabitethernet7-2.core1.chi1.he.net [2001:470:0:298::1]
  7   211 ms   216 ms   212 ms  10gigabitethernet11-4.core1.pao1.he.net [2001:470:0:283::1]
  8     *        *        *     Request timed out.
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.

On 12 June 2013 23:19, Constantine A. Murenin <mureninc@gmail.com> wrote:

...

I smell a big outage:

% date; traceroute tserv1.fra1.he.net
Wed Jun 12 15:17:04 PDT 2013
traceroute to tserv1.fra1.he.net (216.66.80.30), 30 hops max, 60 byte packets
 1  192.168.105.3 (192.168.105.3)  0.673 ms  0.773 ms  0.923 ms
 2  10gigabitethernet7-6.core3.fmt2.he.net (65.49.10.217)  1.795 ms  1.811 ms  1.795 ms
 3  10gigabitethernet12-1.core1.lax1.he.net (184.105.213.26)  19.773 ms
    10gigabitethernet10-1.core1.sjc2.he.net (184.105.222.14)  0.786 ms  0.756 ms
 4  10gigabitethernet10-8.core1.nyc4.he.net (72.52.92.225)  71.714 ms  76.569 ms
    10gigabitethernet14-2.core1.nyc4.he.net (184.105.213.198)  71.690 ms
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * *^C

However, a "reverse" traceroute works fine:

% date; traceroute ns1.he.net
Wed Jun 12 15:18:47 PDT 2013
traceroute to ns1.he.net (216.218.130.2), 64 hops max, 40 byte packets
 1  static.33.203.4.46.clients.your-server.de (46.4.203.33)  0.682 ms  0.531 ms  0.481 ms
 2  hos-tr4.juniper2.rz13.hetzner.de (213.239.224.97)  0.245 ms
    hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65)  0.242 ms
    hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33)  0.241 ms
 3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150)  5.951 ms
    hos-bb1.juniper1.ffm.hetzner.de (213.239.240.224)  4.803 ms  4.786 ms
 4  30gigabitethernet4-3.core1.fra1.he.net (80.81.192.172)  6.354 ms  6.81 ms  5.451 ms
 5  10gigabitethernet10-2.core1.par2.he.net (72.52.92.26)  22.803 ms  24.531 ms  24.787 ms
 6  10gigabitethernet15-1.core1.ash1.he.net (184.105.213.93)  101.504 ms  99.563 ms  99.958 ms
 7  10gigabitethernet11-1.core1.pao1.he.net (184.105.213.177)  163.687 ms  171.711 ms  175.34 ms
 8  10gigabitethernet1-2.core1.fmt1.he.net (184.105.213.65)  163.940 ms  171.362 ms  167.67 ms
 9  ns1.he.net (216.218.130.2)  163.52 ms  165.265 ms  164.143 ms

C.

On 12 June 2013 15:11, Constantine A. Murenin <mureninc@gmail.com> wrote:

...
he.net fra1 tserv1 has been down since about 10 minutes ago (~14:57 PT). This time it seems like the whole thing is down: I cannot even ping ordns.he.net over IPv6. The connection of a friend of mine who runs a smokeping is also down, i.e. this is definitely widespread.

Not sure what exactly is down, but it seems to be BGP-related, perhaps:

 1  2600:3c01::8678:acff:fe0d:79c1 (2600:3c01::8678:acff:fe0d:79c1)  0.699 ms  0.828 ms  0.970 ms
 2  10gigabitethernet7-6.core3.fmt2.he.net (2001:470:1:3b8::1)  5.936 ms  5.877 ms  5.859 ms
 3  10gigabitethernet5-4.core1.pao1.he.net (2001:470:0:263::2)  6.743 ms  6.877 ms  6.978 ms
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *

C.

On 14 May 2013 15:50, Constantine A. Murenin <mureninc@gmail.com> wrote:

...
For what it is worth, further details about the issue have surfaced. I found a friend who also has a tunnel on tserv1.fra1.he.net, and he has been running smokeping to various IPv4 and IPv6 resources for quite a while.

According to several of his smokeping reports, this very outage lasted from 14T18:00/05 until 14T18:45/50; but we've also noticed that there was another, 6-hour (yes, 6-hour) outage a day earlier, ~13T12 to ~13T18 (which corresponds to Monday early to late morning Pacific Time).

I've contacted he.net again this time around, and they said that they're trying to hunt down some obscure kernel bug that is causing these issues.

The tunnelbroker.net is a free service, but a 6-hour outage, spanning a quarter of a whole day, is absolutely ridiculous. I'm stunned that IPv6 connectivity of tserv1.fra1.he.net is, apparently, still not monitored, even though it's known to be having these issues. ???

Alternatively, it is, of course, possible that some engineer was troubleshooting the root cause of this issue for those whole 5 or 6 hours on Sunday/Monday night; but I find that somewhat hard to believe. More likely it got busted, and no one responsible knew about it being busted for most of the time that it was.

Even more troubling is that they don't even publish any reports about these extended outages.

For tserv1.fra1.he.net end users: if you can `ping6 ordns.he.net` (it runs on tserv itself, try $(host `dig +short -6 @ordns.he.net whoami.akamai.net`)), but cannot `ping6 ns4.he.net`, then it most likely means that tserv1.fra1.he.net IPv6 connectivity is down again, and you must open a ticket with HE.net ASAP. Perhaps someone should set up a smokeping with automatic emails to support@he.net?
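
The check described above can be sketched as a small shell function. This is only an illustration, not anything HE.net provides: the function name `classify_tunnel` and the status strings it prints are made up here, and it takes the two probe results as exit codes so that the decision logic can be tried without a live tunnel.

```shell
#!/bin/sh
# classify_tunnel ORDNS_STATUS NS4_STATUS
# Both arguments are exit codes (0 = the host answered) of two probes run
# from inside the tunnel: one to ordns.he.net, one to ns4.he.net.
# The function name and the printed strings are hypothetical.
classify_tunnel() {
    ordns_status=$1
    ns4_status=$2
    if [ "$ordns_status" -eq 0 ] && [ "$ns4_status" -eq 0 ]; then
        # Both the tunnel server and a host beyond it answer.
        echo "ok"
    elif [ "$ordns_status" -eq 0 ]; then
        # The tunnel server answers, but nothing beyond it does.
        echo "tserv-ipv6-down"
    elif [ "$ns4_status" -eq 0 ]; then
        # Hosts beyond the tunnel server answer, only ordns does not.
        echo "ordns-unreachable"
    else
        # Neither probe got an answer over IPv6.
        echo "tunnel-down"
    fi
}
```

From inside a tunnel it could be driven along the lines of `ping6 -c 3 ordns.he.net >/dev/null 2>&1; a=$?; ping6 -c 3 ns4.he.net >/dev/null 2>&1; b=$?; classify_tunnel "$a" "$b"`; the probe count of 3 is an arbitrary choice.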

C.

On 14 May 2013 09:42, Constantine A. Murenin <mureninc@gmail.com> wrote:

...
This just happened again: all IPv6 tunnels on tserv1.fra1.he.net. were inaccessible; from within a tunnel, cannot access any IPv6 resource, other than ordns.he.net, which runs on the tunnel server itself.

Why does FRA1 lose IPv6 connectivity so often?

Update: Seems like it has been resolved as I've been writing this email, but this would seem to happen a few times too many.

C.


_______________________________________________ Outages mailing list Outages@outages.org https://puck.nether.net/mailman/listinfo/outages

Constantine A. Murenin

3:39 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

I'm not a network engineer, but it sounds like all specific routes for tserv1.fra1.he.net are expired / not announced, and the default route is taken instead.

40+ minutes so far...

C.

On 12 June 2013 15:30, Tony McCrory <tony.mccrory@gmail.com> wrote:

...


Constantine A. Murenin

4:06 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

This also affects some dns.he.net services; for example, from where you normally were getting a FRA1 ns resolver, there's now no BGP announcement, it would seem:

Cns# echo ns{2,3,4,5}.he.net | xargs -n1 traceroute -w2 -m8 -A
traceroute to ns2.he.net (216.218.131.2), 8 hops max, 40 byte packets
 1  static.33.203.4.46.clients.your-server.de (46.4.203.33) [AS24940]  0.682 ms  0.810 ms  0.492 ms
 2  hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65) [AS24940]  0.247 ms
    hos-tr4.juniper2.rz13.hetzner.de (213.239.224.97) [AS24940]  0.230 ms
    hos-tr1.juniper1.rz13.hetzner.de (213.239.224.1) [AS24940]  0.498 ms
 3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150) [AS24940]  5.933 ms  5.976 ms  5.960 ms
 4  noris-gw.hetzner.de (213.239.242.250) [AS24940]  6.182 ms  6.449 ms  6.243 ms
 5  xe0-0-1-823-rt1-ldn1.core.noris.net (213.95.0.110) [AS12337]  19.832 ms  19.833 ms  19.825 ms
 6  * * *
 7  ns2.he.net (216.218.131.2) [AS6939]  19.501 ms  19.486 ms  19.506 ms
traceroute to ns3.he.net (216.218.132.2), 8 hops max, 40 byte packets
 1  static.33.203.4.46.clients.your-server.de (46.4.203.33) [AS24940]  0.871 ms  0.531 ms  0.451 ms
 2  hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33) [AS24940]  0.242 ms
    hos-tr1.juniper1.rz13.hetzner.de (213.239.224.1) [AS24940]  0.235 ms
    hos-tr4.juniper2.rz13.hetzner.de (213.239.224.97) [AS24940]  0.238 ms
 3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150) [AS24940]  5.967 ms  5.928 ms  5.919 ms
 4  noris-gw.hetzner.de (213.239.242.250) [AS24940]  6.270 ms  6.123 ms  6.106 ms
 5  xe0-0-1-823-rt1-ldn1.core.noris.net (213.95.0.110) [AS12337]  45.225 ms  19.854 ms  19.845 ms
 6  * * *
 7  10gigabitethernet1-1.core1.ams1.he.net (72.52.92.82) [AS6939, AS6939]  27.214 ms  41.162 ms  27.146 ms
 8  ns3.he.net (216.218.132.2) [AS6939]  27.228 ms  27.230 ms  27.89 ms
traceroute to ns4.he.net (216.66.1.2), 8 hops max, 40 byte packets
 1  static.33.203.4.46.clients.your-server.de (46.4.203.33) [AS24940]  3.600 ms  0.516 ms  0.452 ms
 2  hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65) [AS24940]  0.245 ms
    hos-tr4.juniper2.rz13.hetzner.de (213.239.224.97) [AS24940]  0.240 ms
    hos-tr1.juniper1.rz13.hetzner.de (213.239.224.1) [AS24940]  0.240 ms
 3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150) [AS24940]  5.926 ms  5.953 ms  5.944 ms
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
traceroute to ns5.he.net (216.66.80.18), 8 hops max, 40 byte packets
 1  static.33.203.4.46.clients.your-server.de (46.4.203.33) [AS24940]  2.824 ms  0.510 ms  0.429 ms
 2  hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33) [AS24940]  0.246 ms
    hos-tr1.juniper1.rz13.hetzner.de (213.239.224.1) [AS24940]  0.253 ms
    hos-tr4.juniper2.rz13.hetzner.de (213.239.224.97) [AS24940]  0.236 ms
 3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150) [AS24940]  5.936 ms  5.950 ms  6.677 ms
 4  noris-gw.hetzner.de (213.239.242.250) [AS24940]  6.193 ms  6.125 ms  6.98 ms
 5  xe0-0-0-835-rt1-ldn1.core.noris.net (213.95.14.1) [AS12337]  19.799 ms  19.831 ms  19.843 ms
 6  * * *
 7  10gigabitethernet1-1.core1.ams1.he.net (72.52.92.82) [AS6939, AS6939]  24.567 ms  24.569 ms  25.432 ms
 8  ns5.he.net (216.66.80.18) [AS6939, AS6939]  25.545 ms  25.545 ms  25.402 ms
Cns# date
Wed Jun 12 15:58:20 PDT 2013

The outage started as early as 14:52 PT (not 14:57); still down as of 16:04 PT.

C.

Constantine A. Murenin

4:55 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

The /tserv1.fra1.he.net/ seems to be back up, since ~16:41 PT, for a total downtime of almost 2 hours (1:49), since ~14:52 PT.

However, the routing is still a bit weird right now, going through London for a 20ms ping (the usual is about 6ms).

# traceroute tserv1.fra1.he.net ; date
traceroute to tserv1.fra1.he.net (216.66.80.30), 64 hops max, 40 byte packets
 1  static.33.203.4.46.clients.your-server.de (46.4.203.33)  3.234 ms  0.503 ms  0.446 ms
 2  hos-tr1.juniper1.rz13.hetzner.de (213.239.224.1)  30.126 ms
    hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33)  1.906 ms
    hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65)  0.246 ms
 3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150)  5.978 ms  5.931 ms  5.923 ms
 4  noris-gw.hetzner.de (213.239.242.250)  6.405 ms  6.143 ms  6.100 ms
 5  xe0-0-1-823-rt1-ldn1.core.noris.net (213.95.0.110)  19.823 ms  19.827 ms  19.844 ms
 6  * * *
 7  10gigabitethernet4-2.core1.fra1.he.net (184.105.213.146)  22.266 ms  24.706 ms  24.957 ms
 8  tserv1.fra1.he.net (216.66.80.30)  19.668 ms  22.168 ms  19.577 ms
Wed Jun 12 16:45:04 PDT 2013

C.
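
The downtime figure quoted in this message (from ~14:52 PT to ~16:41 PT) can be double-checked with a few lines of shell arithmetic. This is a minimal sketch: the helper names `to_minutes` and `downtime` are made up for illustration, and it assumes both timestamps are HH:MM values on the same day.

```shell
#!/bin/sh
# to_minutes HH:MM -> minutes since midnight (hypothetical helper)
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    # Strip one leading zero so that "08" or "09" is not read as octal.
    h=${h#0}
    m=${m#0}
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# downtime START END -> elapsed time as H:MM, same-day timestamps only
downtime() {
    start=$(to_minutes "$1")
    end=$(to_minutes "$2")
    total=$(( end - start ))
    printf '%d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

downtime 14:52 16:41   # prints 1:49
```

An outage that crosses midnight would need the dates as well, which this sketch deliberately leaves out.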

Joe Holden

13 Jun 13 Jun

5:16 a.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

Constantine A. Murenin wrote:

...

Is there really a need for so much noise about a service that isn't critical to anyone anywhere?

John Starta

8:09 a.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

On Jun 13, 2013, at 5:16, Joe Holden <lists@rewt.org.uk> wrote:

...

Is there really a need for so much noise about a service that isn't critical to anyone anywhere?

An outage is an outage. If you consider this particular outage noise then I suggest you learn how to use the filter functionality of your mail client.

Grant Ridder

8:46 a.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

One man's garbage is another man's treasure

Sent from my iPhone

On Jun 13, 2013, at 8:09 AM, John Starta <john@starta.org> wrote:

...


Raman Sud

8:55 a.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

Alright guys, let's take it easy on this before this forum becomes a battleground.

----- Original Message -----
From: "Grant Ridder" <shortdudey123@gmail.com>
To: "John Starta" <john@starta.org>
Cc: outages@outages.org, "Constantine A. Murenin" <mureninc@gmail.com>
Sent: Thursday, June 13, 2013 8:46:34 AM
Subject: Re: [outages] IPv6 tunnels in FRA1 on HE.net down?

One man's garbage is another man's treasure

Sent from my iPhone

On Jun 13, 2013, at 8:09 AM, John Starta <john@starta.org> wrote:

...


Constantine A. Murenin

12 Jun 12 Jun

3:33 p.m.

New subject: [outages] IPv6 tunnels in FRA1 on HE.net down?

I'm being told by the NOC that someone is on the ground in FRA1, replacing a switch; that it's a "scheduled maintenance" with no ETA; and that there was no announcement, since someone has to take a flight to Europe and it's unpredictable. :-)

35 minutes of downtime, and counting...

C.

On 12 June 2013 15:19, Constantine A. Murenin <mureninc@gmail.com> wrote:

...


14 comments

6 participants

Constantine A. Murenin
Grant Ridder
Joe Holden
John Starta
Raman Sud
Tony McCrory