
A follow-up to this:

I received *many* off-list mails about this, but nothing official. There was lots of (justified) speculation about which peering points and/or providers are involved, some of which I responded to, but nothing conclusive.

Without going into details (respecting folks' privacy), I can say that the NTT/Verio<->Tata peering point has been ruled out as the cause.

I should also note that the ingress path (Comcast-->ARP) has changed due to what appear to be some peering adjustments made by Comcast last night (specifically May 27th at 23:22 PDT or thereabouts). This will be obvious below in a pair of live mtrs I ran manually, with DNS resolution enabled (someone off-list asked for this).

My gut feeling right now is that the issue is at the Comcast<->Tata peering point (hops #5 and #6 in the 2nd mtr), or that a Comcast router at One Wilshire is overwhelmed (hop #6 in the 2nd mtr).

The reason I want to rule out ingress (Comcast->ARP) is that the path today looks different than it did yesterday (Comcast<->GTT/nLayer is no longer involved) yet the issue remains, so something on the return path looks more likely.

Anyway, here are the mtrs, taken about 10 minutes ago:

Source IP: 67.180.84.87 (Comcast; Mountain View, CA)
Dest IP: 206.125.172.42 (ARP Networks; Sylmar, CA)

    Host                                         Loss%  Snt  Rcv   Last    Avg   Best   Wrst
 1. gw.home.lan                                   0.0%  128  128    0.2    0.2    0.2    0.5
 2. c-67-180-84-1.hsd1.ca.comcast.net             0.0%  128  128   25.6   25.9   10.5   66.4
 3. te-0-0-0-12-ur05.santaclara.ca.sfba.comcast   0.0%  128  128   10.6   11.4    8.8   15.6
 4. te-1-1-0-13-ar01.sfsutro.ca.sfba.comcast.ne   0.0%  128  128   11.3   14.8   10.2   19.7
 5. he-3-9-0-0-cr01.sanjose.ca.ibone.comcast.ne   0.0%  128  128   20.3   20.7   12.4   40.6
 6. be-13-pe02.11greatoaks.ca.ibone.comcast.net   0.0%  127  127   20.7   18.6   14.5   68.8
 7. 173.167.59.82                                 0.0%  127  127   19.2   23.8   14.3  194.4
 8. 63-218-212-14.static.pccwglobal.net           0.0%  127  127   24.8   27.1   23.1   78.8
 9. cxa.r6.lax2.trit.net                          0.0%  127  127   24.4   25.8   22.6   31.3
10. arpnetworks-lax2-gw.cust.trit.net             1.6%  127  125   40.9   55.1   37.2  210.4
11. omake.koitsu.org                             10.2%  127  114   41.8   39.3   35.6   52.5

Source IP: 206.125.172.42 (ARP Networks; Sylmar, CA)
Dest IP: 67.180.84.87 (Comcast; Mountain View, CA)

    Host                                         Loss%  Snt  Rcv   Last    Avg   Best   Wrst
 1. 206.125.172.41                                0.0%  116  116    7.4   10.0    1.0  111.1
 2. s7.lax.arpnetworks.com                        0.0%  116  116    0.7    8.0    0.4  168.3
 3. ge-0-7-0-24.r04.lsanca03.us.bb.gin.ntt.net    0.0%  116  116    0.7    0.8    0.6    1.5
 4. ae-3.r05.lsanca03.us.bb.gin.ntt.net           0.0%  116  116    1.1    1.0    0.8    2.3
 5. ix-9-1-2-0.tcore2.LVW-LosAngeles.as6453.net   0.0%  116  116    0.5    2.1    0.4   52.0
    ix-11-2-1-0.tcore2.LVW-LosAngeles.as6453.net
 6. xe-1-2-0-0-pe01.onewilshire.ca.ibone.comcas   4.3%  116  111   15.7   16.4   15.5   42.9
 7. te-2-0-0-7-cr01.losangeles.ca.ibone.comcast   6.0%  116  109   17.8   17.6   15.7   19.9
 8. pos-2-9-0-0-cr01.sanjose.ca.ibone.comcast.n   5.2%  116  110   28.8   27.9   25.8   29.9
 9. he-0-4-0-0-ar01.sfsutro.ca.sfba.comcast.net   3.4%  116  112   28.0   28.2   25.9   30.0
10. te-0-6-0-0-ur05.santaclara.ca.sfba.comcast.   5.2%  116  110   29.9   29.7   29.1   37.0
11. te-18-10-cdn31.santaclara.ca.sfba.comcast.n   3.4%  116  112   41.7   47.9   31.2   54.3
12. c-67-180-84-87.hsd1.ca.comcast.net            3.4%  116  112   38.1   37.4   35.1   72.2

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Mon, May 27, 2013 at 07:53:35PM -0700, Jeremy Chadwick wrote:
Hi folks,
For the past 3-4 weeks (likely longer), there has been a peering point in Los Angeles that has been repeatedly overtaxed every night for roughly 5-6 hours, resulting in high latency and packet loss.
It tends to start at roughly the same time every day (1800 PDT), and tends to dissipate around the same time as well (2300 PDT or so) -- i.e. prime-time. This is not ICMP prioritisation -- this is real packet loss (I can tell via SSH, fetching mail (IMAP), etc.). However, in recent days (past week or so) the issue has persisted until as late as 0100 PDT, and today (for example) it has been present since as early as 1500 PDT. So the situation is getting worse.
Because of asymmetrical routing, it's impossible for me to tell where peering-wise the actual issue is:
a) between Comcast and GTT/nLayer
b) between Comcast and Tata/AS6453
c) between NTT/Verio and Tata/AS6453
Of those 4 providers I've only a relationship with one; me getting all four to talk simultaneously, while the issue is happening, is virtually impossible. However it happens every night like clockwork, for a long period of time, so resolution shouldn't be all that difficult.
I do periodic mtrs (think traceroute + ping combined) between my home connection and my VPS. Probe durations are 40 seconds, at intervals of 60 seconds. I store roughly 2-3 months of data, and I can make all this data available.
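For anyone wanting to work with the stored reports, here is a minimal sketch (Python, and not the tooling actually used for the collection) of parsing one hop row in the report layout shown later in this message. The names `Hop`, `HOP_ROW` and `parse_hop` are made up for illustration; the sample rows are copied from the 19:36 PDT report.

```python
import re
from typing import NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    host: str
    loss_pct: float
    sent: int
    received: int
    last_ms: float
    avg_ms: float
    best_ms: float
    worst_ms: float


# Matches one hop row such as:
#   "8.|-- 173.167.57.138 10.0% 40 36 68.5 79.4 54.1 92.3"
HOP_ROW = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%\s+(\d+)\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_hop(line: str) -> Optional[Hop]:
    """Parse a single hop row; return None for headers, '=== END', etc."""
    m = HOP_ROW.match(line)
    if m is None:
        return None
    n, host, loss, snt, rcv, last, avg, best, wrst = m.groups()
    return Hop(int(n), host, float(loss), int(snt), int(rcv),
               float(last), float(avg), float(best), float(wrst))


# Sample rows copied from the 19:36 PDT report (Comcast -> ARP).
rows = [
    " 7.|-- 68.86.88.190 0.0% 40 40 21.5 21.9 19.6 38.2",
    " 8.|-- 173.167.57.138 10.0% 40 36 68.5 79.4 54.1 92.3",
    "10.|-- 69.31.127.130 35.0% 40 26 69.5 69.6 43.0 96.1",
]
hops = [parse_hop(r) for r in rows]
for hop in hops:
    lost = hop.sent - hop.received
    print(f"hop {hop.number}: {hop.host} lost {lost}/{hop.sent} ({hop.loss_pct}%)")
```

With 40 probes per report, each lost probe accounts for 2.5% loss, so the 35.0% at hop 10 corresponds to 14 of 40 probes going unanswered.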
Here are examples taken from tonight, in both directions. Source and destination IPs are provided.

Source IP: 67.180.84.87 (Comcast; Mountain View, CA)
Dest IP: 206.125.172.42 (ARP Networks; Sylmar, CA)

=== Mon May 27 19:36:00 PDT 2013 (1369708560)
HOST: icarus.home.lan    Loss%   Snt   Rcv   Last    Avg   Best   Wrst
  1.|-- 192.168.1.1       0.0%    40    40    0.3    0.3    0.2    0.4
  2.|-- 67.180.84.1       0.0%    40    40   11.4   24.9   10.1   61.7
  3.|-- 68.85.191.249     0.0%    40    40   11.3   11.7    8.9   27.7
  4.|-- 69.139.199.106    0.0%    40    40   14.6   13.3   10.4   16.1
  5.|-- 68.86.91.45       0.0%    40    40   15.9   19.6   13.1   34.3
  6.|-- 68.86.88.58       0.0%    40    40   25.0   22.8   20.0   27.2
  7.|-- 68.86.88.190      0.0%    40    40   21.5   21.9   19.6   38.2
  8.|-- 173.167.57.138   10.0%    40    36   68.5   79.4   54.1   92.3
  9.|-- 69.31.127.141     0.0%    40    40   81.4   95.1   64.5  128.8
 10.|-- 69.31.127.130    35.0%    40    26   69.5   69.6   43.0   96.1
 11.|-- 69.174.121.73     0.0%    40    40   67.0   72.3   39.6   90.4
 12.|-- 67.199.135.102   10.0%    40    36   39.1   49.2   36.7  249.8
 13.|-- 206.125.172.42    5.0%    40    38   37.1   36.0   34.2   49.9
=== END
DNS resolution for relevant hops:
$ host 68.86.88.190
190.88.86.68.in-addr.arpa domain name pointer pos-0-4-0-0-pe01.600wseventh.ca.ibone.comcast.net.

$ host 173.167.57.138
Host 138.57.167.173.in-addr.arpa. not found: 3(NXDOMAIN)

$ host 69.31.127.141
141.127.31.69.in-addr.arpa domain name pointer ae0-110g.cr1.lax1.us.nlayer.net.
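As an aside on the `host` output above: the name being queried is just the IPv4 octets reversed under `in-addr.arpa`. A small sketch using Python's standard `ipaddress` module (the helper name `ptr_name` is made up for illustration) builds that name without touching the network:

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the reverse-DNS (PTR) query name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


# The three hops looked up above.
for ip in ("68.86.88.190", "173.167.57.138", "69.31.127.141"):
    print(ip, "->", ptr_name(ip))
```

Resolving the name for real (for example via `socket.gethostbyaddr`) needs a working resolver, and fails for addresses such as 173.167.57.138 that have no PTR record.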
Source IP: 206.125.172.42 (ARP Networks; Sylmar, CA)
Dest IP: 67.180.84.87 (Comcast; Mountain View, CA)

=== Mon May 27 19:37:00 PDT 2013 (1369708620)
HOST: omake.koitsu.org   Loss%   Snt   Rcv   Last    Avg   Best   Wrst
  1.|-- 206.125.172.41    0.0%    40    40    1.5   12.8    0.9  181.8
  2.|-- 208.79.88.135     0.0%    40    40    0.6    1.0    0.5   11.6
  3.|-- 129.250.198.185   0.0%    40    40    0.8    0.9    0.7    1.4
  4.|-- 129.250.2.221     0.0%    40    40    1.0    1.0    0.9    1.5
  5.|-- 64.86.252.65     32.5%    40    27    0.5    0.8    0.5    8.6
    |  `|-- 216.6.84.65
  6.|-- 173.167.59.185   55.0%    40    18   19.6   19.3   15.7   75.1
  7.|-- 68.86.82.61      40.0%    40    24   17.9   17.5   15.9   19.5
  8.|-- 68.86.88.57       5.0%    40    38   25.2   25.7   24.1   27.5
  9.|-- 68.86.90.94       7.5%    40    37   25.8   26.6   24.7   28.7
 10.|-- 69.139.198.81     5.0%    40    38   26.6   26.6   26.3   27.0
 11.|-- 68.85.191.254     7.5%    40    37   47.9   46.0   30.5   50.5
 12.|-- 67.180.84.87      7.5%    40    37   36.0   36.1   34.0   50.3
=== END
DNS resolution for relevant hops (hop #5 indicates round-robin'ing between two different interfaces presumably on the same Tata/AS6453 router):
$ host 129.250.2.221
221.2.250.129.in-addr.arpa domain name pointer ae-3.r05.lsanca03.us.bb.gin.ntt.net.

$ host 64.86.252.65
65.252.86.64.in-addr.arpa domain name pointer ix-11-2-1-0.tcore2.LVW-LosAngeles.as6453.net.

$ host 216.6.84.65
65.84.6.216.in-addr.arpa domain name pointer ix-9-1-2-0.tcore2.LVW-LosAngeles.as6453.net.

$ host 173.167.59.185
185.59.167.173.in-addr.arpa domain name pointer xe-1-2-0-0-pe01.onewilshire.ca.ibone.comcast.net.
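A common rule of thumb for reading a report like the 19:37 PDT one above: loss that shows up at a single hop and vanishes at the next is usually a router deprioritising its own ICMP replies, whereas loss that carries through to the final hop is more likely real. The sketch below (Python; the function name and the "every later hop also shows loss" rule are a simplification chosen for illustration, not an established algorithm) finds the first hop from which loss never clears again.

```python
from typing import List, Optional, Tuple

# (hop number, loss %) copied from the 19:37 PDT report (ARP -> Comcast).
REVERSE_PATH: List[Tuple[int, float]] = [
    (1, 0.0), (2, 0.0), (3, 0.0), (4, 0.0),
    (5, 32.5), (6, 55.0), (7, 40.0), (8, 5.0),
    (9, 7.5), (10, 5.0), (11, 7.5), (12, 7.5),
]


def first_persistent_loss_hop(hops: List[Tuple[int, float]]) -> Optional[int]:
    """Return the earliest hop from which every later hop also shows loss.

    Returns None if the final hop is clean (no loss persists end to end).
    """
    candidate = None
    for number, loss in hops:
        if loss > 0.0:
            if candidate is None:
                candidate = number
        else:
            # A clean hop further along means earlier loss did not persist.
            candidate = None
    return candidate


print(first_persistent_loss_hop(REVERSE_PATH))
```

For the report above this yields hop 5, the Tata/AS6453 interface. Keep in mind that the loss mtr reports for a hop covers both the probe's outbound path and the reply's return path, so with asymmetric routing this narrows down where loss first becomes visible, not which direction or provider is responsible.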
If relevant providers are on this list and can respond, that would be awesome. If they can't respond but can instead start the ball rolling on rectifying this, I think many other 'netizens and I would really appreciate it.
-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
_______________________________________________
Outages mailing list
Outages@outages.org
https://puck.nether.net/mailman/listinfo/outages