
Indeed, traceroute to my fra1 tunnel is definitely getting mislaid within he.net. Time to ask for a refund? oh wait... ;)

Tracing route to tmcc.me [2001:470:1f0a:3ee::2]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  6.9.1.2.2.d.e.f.f.f.f.f.6.7.a.0.3.f.7.9.d.8.4.6.0.b.8.0.1.0.0.2.ip6.arpa [2001:8b0:648d:97f3:a76:ffff:fed2:2196]
  2    65 ms    65 ms    66 ms  c.gormless.thn.aa.net.uk [2001:8b0:0:53::53]
  3    56 ms    58 ms    61 ms  2001:7f8:4::50e8:1
  4    70 ms    74 ms    74 ms  40gigabitethernet1-1.core1.lon1.he.net [2001:7f8:4::1b1b:1]
  5   145 ms   141 ms   132 ms  10gigabitethernet10-4.core1.nyc4.he.net [2001:470:0:128::1]
  6   146 ms   150 ms   154 ms  100gigabitethernet7-2.core1.chi1.he.net [2001:470:0:298::1]
  7   211 ms   216 ms   212 ms  10gigabitethernet11-4.core1.pao1.he.net [2001:470:0:283::1]
  8     *        *        *     Request timed out.
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.

On 12 June 2013 23:19, Constantine A. Murenin <mureninc@gmail.com> wrote:
I smell a big outage:
% date; traceroute tserv1.fra1.he.net
Wed Jun 12 15:17:04 PDT 2013
traceroute to tserv1.fra1.he.net (216.66.80.30), 30 hops max, 60 byte packets
 1  192.168.105.3 (192.168.105.3)  0.673 ms  0.773 ms  0.923 ms
 2  10gigabitethernet7-6.core3.fmt2.he.net (65.49.10.217)  1.795 ms  1.811 ms  1.795 ms
 3  10gigabitethernet12-1.core1.lax1.he.net (184.105.213.26)  19.773 ms
    10gigabitethernet10-1.core1.sjc2.he.net (184.105.222.14)  0.786 ms  0.756 ms
 4  10gigabitethernet10-8.core1.nyc4.he.net (72.52.92.225)  71.714 ms  76.569 ms
    10gigabitethernet14-2.core1.nyc4.he.net (184.105.213.198)  71.690 ms
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * *^C
However, a "reverse" traceroute works fine:
% date; traceroute ns1.he.net
Wed Jun 12 15:18:47 PDT 2013
traceroute to ns1.he.net (216.218.130.2), 64 hops max, 40 byte packets
 1  static.33.203.4.46.clients.your-server.de (46.4.203.33)  0.682 ms  0.531 ms  0.481 ms
 2  hos-tr4.juniper2.rz13.hetzner.de (213.239.224.97)  0.245 ms
    hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65)  0.242 ms
    hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33)  0.241 ms
 3  hos-bb2.juniper4.ffm.hetzner.de (213.239.240.150)  5.951 ms
    hos-bb1.juniper1.ffm.hetzner.de (213.239.240.224)  4.803 ms  4.786 ms
 4  30gigabitethernet4-3.core1.fra1.he.net (80.81.192.172)  6.354 ms  6.81 ms  5.451 ms
 5  10gigabitethernet10-2.core1.par2.he.net (72.52.92.26)  22.803 ms  24.531 ms  24.787 ms
 6  10gigabitethernet15-1.core1.ash1.he.net (184.105.213.93)  101.504 ms  99.563 ms  99.958 ms
 7  10gigabitethernet11-1.core1.pao1.he.net (184.105.213.177)  163.687 ms  171.711 ms  175.34 ms
 8  10gigabitethernet1-2.core1.fmt1.he.net (184.105.213.65)  163.940 ms  171.362 ms  167.67 ms
 9  ns1.he.net (216.218.130.2)  163.52 ms  165.265 ms  164.143 ms
C.
On 12 June 2013 15:11, Constantine A. Murenin <mureninc@gmail.com> wrote:
he.net fra1 tserv1 has been down since about 10 minutes ago (~14:57 PT). This time it seems like the whole thing is down: I cannot even ping ordns.he.net over IPv6. A friend of mine who's running a smokeping has lost his connection as well, i.e. this is definitely widespread.
Not sure what exactly is down, but it seems to be BGP-related, perhaps:
 1  2600:3c01::8678:acff:fe0d:79c1 (2600:3c01::8678:acff:fe0d:79c1)  0.699 ms  0.828 ms  0.970 ms
 2  10gigabitethernet7-6.core3.fmt2.he.net (2001:470:1:3b8::1)  5.936 ms  5.877 ms  5.859 ms
 3  10gigabitethernet5-4.core1.pao1.he.net (2001:470:0:263::2)  6.743 ms  6.877 ms  6.978 ms
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
C.
On 14 May 2013 15:50, Constantine A. Murenin <mureninc@gmail.com> wrote:
For what it is worth, further details about the issue have surfaced. I found a friend who also has a tunnel on tserv1.fra1.he.net., and he has been running smokeping to various IPv4 and IPv6 resources for quite a while.
According to several of his smokeping reports, it can be concluded that this very outage occurred between 14T18:00/05 and 14T18:45/50; but we've also noticed that there was another, 6-hour (yes, 6-hour) outage a day earlier, ~13T12 to ~13T18 (which corresponds to Monday early to late morning Pacific Time).
I've contacted he.net again this time around, and they said that they're trying to hunt some obscure kernel bug that is causing these issues.
tunnelbroker.net is a free service, but a 6-hour outage, clearly spanning a quarter of a whole day, is absolutely ridiculous. I'm stunned that the IPv6 connectivity of tserv1.fra1.he.net. is, apparently, still not monitored, even though it's known to be having these issues. ???
Alternatively, it is, of course, possible that some engineer had been troubleshooting the root cause of this issue for those whole 5 or 6 hours on Sunday/Monday night; but I find that somewhat hard to believe. More likely, it got busted, and no one responsible knew about it being busted for most of the time that it was.
Even more troubling is that they don't even publish any reports about these extended outages.
For tserv1.fra1.he.net. end users: if you can `ping6 ordns.he.net` (it runs on tserv itself, try $(host `dig +short -6 @ordns.he.net whoami.akamai.net`)), but cannot `ping6 ns4.he.net`, then it most likely means that tserv1.fra1.he.net IPv6 connectivity is down again, and you must open a ticket with HE.net ASAP. Perhaps someone should set up a smokeping with automatic emails to support@he.net?
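The check above can be sketched as a small shell helper. This is only a rough sketch under stated assumptions: the function names (`classify_tserv`, `probe`, `check_tserv`), the `PING6` override, and the three output labels are made up for illustration, and the "tunnel-down" case (ordns.he.net itself unreachable) is my own guess at a sensible reading, not something confirmed above.

```shell
#!/bin/sh
# Hypothetical helper: classify the tunnel state from two probe results.
#   $1 = exit status of pinging ordns.he.net (runs on the tunnel server itself)
#   $2 = exit status of pinging ns4.he.net   (a host beyond the tunnel server)
classify_tserv() {
    if [ "$1" -eq 0 ] && [ "$2" -ne 0 ]; then
        # tserv answers, but nothing behind it does: the symptom described above
        echo "tserv-upstream-down"
    elif [ "$1" -ne 0 ]; then
        # Assumption: if even ordns.he.net is unreachable, the tunnel itself is down
        echo "tunnel-down"
    else
        echo "ok"
    fi
}

# Send three echo requests and discard the output; only the exit status matters.
# PING6 is an assumed override for systems without a ping6 binary (e.g. PING6="ping -6").
probe() {
    ${PING6:-ping6} -c 3 "$1" >/dev/null 2>&1
}

# Run both probes from inside the tunnel and print the verdict.
check_tserv() {
    probe ordns.he.net; near=$?
    probe ns4.he.net;   far=$?
    classify_tserv "$near" "$far"
}
```

Calling `check_tserv` from a host behind the tunnel prints one of the three labels; `tserv-upstream-down` is the case where opening a ticket makes sense.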
C.
On 14 May 2013 09:42, Constantine A. Murenin <mureninc@gmail.com> wrote:
This just happened again: all IPv6 tunnels on tserv1.fra1.he.net. were inaccessible; from within a tunnel, cannot access any IPv6 resource, other than ordns.he.net, which runs on the tunnel server itself.
Why does FRA1 lose IPv6 connectivity so often?
Update: it seems to have been resolved while I was writing this email, but this seems to happen a few times too many.
C.
On 23 April 2013 19:28, Constantine A. Murenin <mureninc@gmail.com> wrote:
On 23 April 2013 18:28, Constantine A. Murenin <mureninc@gmail.com> wrote:
As of a couple of minutes ago, my IPv6 tunnel seems to have no connectivity; weirdly, I can still access ordns.he.net (2001:470:20::2) just fine, but not ns{2,3,4,5}.he.net, or any other IPv6 host.
Update: reported to he.net support@ at 18:34, including a follow-up phone call; everything's back online, as of at least 18:51 PT. Pretty fast resolution, for a free service. :-)
According to he.net, tserv wasn't responding on its IPv6 address, and has since been rebooted.
Which adds up as per my mtr from a Linode:
# mtr --report{,-wide,-cycles=60} --order "SRL BGAWV" -6 XXXXXXXXX ; date
  2. 10gigabitethernet2-3.core1.fmt1.he.net    60  60   0.0%    0.4    2.5    8.5   85.5  15.2
  3. 10gigabitethernet1-2.core1.sjc2.he.net    60  60   0.0%    0.8    2.5    5.3   51.6   8.5
  4. 10gigabitethernet3-3.core1.den1.he.net    60  60   0.0%   27.8   31.7   32.3   70.0   7.3
  5. 10gigabitethernet5-5.core1.mci3.he.net    60  60   0.0%   39.7   43.6   44.3  114.3  10.1
  6. 10gigabitethernet5-2.core1.chi1.he.net    60  60   0.0%   52.0   56.2   57.4  177.2  17.0
  7. 100gigabitethernet7-2.core1.nyc4.he.net   60  59   1.7%   69.1   72.2   72.5  119.8   7.1
  8. 10gigabitethernet1-2.core1.lon1.he.net    60  60   0.0%  137.8  141.8  142.2  229.2  12.0
  9. 10gigabitethernet4-2.core1.fra1.he.net    60  60   0.0%  149.4  154.0  154.4  242.8  12.7
 10. ???                                       60   0 100.0     0.0    0.0    0.0    0.0   0.0
Tue Apr 23 18:14:29 PDT 2013
...
  2. 10gigabitethernet2-3.core1.fmt1.he.net    60  60   0.0%    0.4    2.8    9.3   78.3  16.8
  3. 10gigabitethernet1-2.core1.sjc2.he.net    60  60   0.0%    0.8    2.3    3.9   13.9   3.8
  4. 10gigabitethernet3-3.core1.den1.he.net    60  60   0.0%   27.8   30.3   30.5   39.2   3.5
  5. 10gigabitethernet5-5.core1.mci3.he.net    60  60   0.0%   39.7   43.4   43.6   52.8   4.4
  6. 10gigabitethernet5-2.core1.chi1.he.net    60  60   0.0%   52.0   54.1   54.2   62.2   3.2
  7. 100gigabitethernet7-2.core1.nyc4.he.net   60  60   0.0%   69.1   71.5   71.6   82.1   3.5
  8. 10gigabitethernet1-2.core1.lon1.he.net    60  60   0.0%  137.8  140.3  140.4  149.1   3.5
  9. 10gigabitethernet4-2.core1.fra1.he.net    60  60   0.0%  149.4  152.3  152.3  162.5   3.7
 10. tserv1.fra1.he.net                        60  60   0.0%  154.1  155.5  155.5  163.4   2.2
 11. IPv6.XXXXXX                               60  59   1.7%  155.2  156.0  156.1  162.9   1.2
Tue Apr 23 18:55:34 PDT 2013
And I guess ordns.he.net (2001:470:20::2) really runs on tserv (and hence wasn't affected during the outage).
Cns# echo {ordns,ns{2,3,4,5}}.he.net | xargs -n1 traceroute6 -l; date
traceroute6 to ordns.he.net (2001:470:20::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  ordns.he.net (2001:470:20::2)  6.39 ms  6.495 ms  6.213 ms
traceroute6 to ns2.he.net (2001:470:200::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  9.353 ms  8.783 ms  9.328 ms
 2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  13.12 ms  15.134 ms  6.252 ms
 3  10gigabitethernet5-3.core1.lon1.he.net (2001:470:0:1d2::1)  18.113 ms  23.554 ms  20.342 ms
 4  ns2.he.net (2001:470:200::2)  20.449 ms  22.873 ms  20.517 ms
traceroute6 to ns3.he.net (2001:470:300::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.743 ms  9.2 ms
 2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  5.934 ms  10.749 ms  6.197 ms
 3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  18.567 ms  17.728 ms  13.282 ms
 4  ns3.he.net (2001:470:300::2)  13.462 ms  13.412 ms  13.525 ms
traceroute6 to ns4.he.net (2001:470:400::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.838 ms  8.684 ms  8.682 ms
 2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  6.233 ms  13.208 ms  5.97 ms
 3  ns4.he.net (2001:470:400::2)  6.145 ms  6.38 ms  6.384 ms
traceroute6 to ns5.he.net (2001:470:500::2) from 2001:470:XXXX:YYYY::2, 64 hops max, 12 byte packets
 1  * XXXXXX.tunnel.tserv6.fra1.ipv6.he.net (2001:470:XXXX:YYYY::1)  8.655 ms  8.848 ms
 2  gige-g2-20.core1.fra1.he.net (2001:470:0:69::1)  12.607 ms  6.527 ms  11.929 ms
 3  10gigabitethernet5-3.core1.ams1.he.net (2001:470:0:225::1)  13.289 ms  13.968 ms  16.754 ms
 4  ns5.he.net (2001:470:500::2)  14.111 ms  13.575 ms  13.385 ms
Tue Apr 23 18:59:48 PDT 2013
However, it's unclear why IPv6 connectivity of tserv1.fra1.he.net doesn't seem to be monitored otherwise. :/
Cheers, Constantine.
--
V. V. Putin on perfection, 24 December 2000: If a person is satisfied with everything, then he is a complete idiot. A healthy person of sound memory cannot be satisfied with everything all the time.
_______________________________________________
Outages mailing list
Outages@outages.org
https://puck.nether.net/mailman/listinfo/outages