comcast/sprint oddities

I have seen this for the past few nights in a row; however, based on a problem we have been having with a customer VPN, I'm thinking it's been going on longer. A section of the tracert is below. Hop 13 has had really high latency for me during the issues. When it's acting up, latency between home and work is 160-200 ms. At 6 am this morning it was 60 ms, and it stayed consistent all day until 5:30 pm today, according to SmokePing.

  5    19 ms    20 ms    20 ms  te-0-10-0-13-ar02.aurora.co.denver.comcast.net [68.86.103.82]
  6    23 ms    16 ms    16 ms  he-3-9-0-0-cr01.denver.co.ibone.comcast.net [68.86.92.21]
  7    53 ms    62 ms    31 ms  68.86.89.190
  8    34 ms    30 ms    29 ms  pos-0-0-0-0-pe01.1950stemmons.tx.ibone.comcast.net [68.86.86.90]
  9    35 ms    31 ms    29 ms  sl-st31-dal-.sprintlink.net [144.232.25.33]
 10    31 ms    31 ms    32 ms  144.232.11.207
 11    86 ms    86 ms    30 ms  144.232.1.162
 12    42 ms    45 ms    40 ms  sl-crs2-kc-0-5-5-0.sprintlink.net [144.232.24.7]
 13   550 ms   110 ms   186 ms  144.232.1.101
 14   193 ms   300 ms   119 ms  sl-crs2-che-0-0-2-0.sprintlink.net [144.232.18.120]
 15   114 ms   102 ms   104 ms  sl-gw16-che-15-0-0.sprintlink.net [144.232.6.50]

Is anyone seeing something similar, or is there anyone insightful from either organization who might be able to help out?

Thanks,
Blake
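Reading a trace like the one above by eye works, but once traces are being collected repeatedly it helps to reduce each hop to one number. A minimal Python sketch, assuming the Windows `tracert` line layout shown above (hop number, three probes written as `N ms`, `<1 ms` or `*`, then the target); the function names are illustrative and not part of any tool.

```python
import re
from statistics import median

# Assumed layout: one Windows tracert hop per line, e.g.
#  13   550 ms   110 ms   186 ms  144.232.1.101
# Each probe is "N ms", "<1 ms" or "*" (no reply).
PROBE = r"(\*|<?\d+ ms)"
HOP_RE = re.compile(rf"^\s*(\d+)\s+{PROBE}\s+{PROBE}\s+{PROBE}\s+(.+?)\s*$")


def parse_probe(text):
    """Return a probe's RTT in ms, or None for a lost probe ('*')."""
    if text == "*":
        return None
    return int(text.lstrip("<").split()[0])  # "<1 ms" is counted as 1 ms


def parse_tracert(text):
    """Return {hop_number: (list_of_rtts, target)} for every hop line found."""
    hops = {}
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            rtts = [parse_probe(m.group(i)) for i in (2, 3, 4)]
            hops[int(m.group(1))] = (rtts, m.group(5))
    return hops


def hop_medians(hops):
    """Median RTT per hop, skipping lost probes and hops with no replies at all."""
    result = {}
    for hop, (rtts, _target) in hops.items():
        answered = [r for r in rtts if r is not None]
        if answered:
            result[hop] = median(answered)
    return result


if __name__ == "__main__":
    sample = """\
 12    42 ms    45 ms    40 ms  sl-crs2-kc-0-5-5-0.sprintlink.net [144.232.24.7]
 13   550 ms   110 ms   186 ms  144.232.1.101
"""
    print(hop_medians(parse_tracert(sample)))  # {12: 42, 13: 186}
```

The median is used rather than the mean so that a single 550 ms probe does not dominate the hop's summary.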

Blake,

1. You need to provide traceroutes from both endpoints, given the very likely possibility of asymmetric routing.
2. You need to provide both source and destination IP addresses.
3. Please make sure your destinations are not routers; they need to be actual hosts.

If network or host ACLs/firewalls inhibit the ability to reach the destinations, that can make things a bit more difficult, given the possibility of ICMP prioritisation. I say this with acknowledgement of your statement that the same hop tends to show 60 ms during normal hours but increases during evenings.

Necessary details/specifics are covered here:
http://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf

Thanks.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Wed, Mar 13, 2013 at 02:50:47AM +0000, Blake Pfankuch - Mailing List wrote:
_______________________________________________ Outages mailing list Outages@outages.org https://puck.nether.net/mailman/listinfo/outages
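A common rule of thumb when reading traceroutes (the kind of thing covered by traceroute troubleshooting guides such as the NANOG presentation linked above) is that extra latency which shows up at one hop but does not carry on to the hops after it usually reflects that router being slow to generate ICMP replies, not delay on the forwarding path. A minimal sketch of that check, assuming per-hop median RTTs have already been computed; the function name and the 20 ms default are illustrative choices, not standard values.

```python
def latency_persists(medians, hop, min_increase_ms=20.0):
    """Does a latency jump seen at `hop` carry on to every later hop?

    medians: {hop_number: median_rtt_ms} for one traceroute.
    Returns None when there are no later hops to compare against.
    """
    earlier = [h for h in medians if h < hop]
    later = [h for h in medians if h > hop]
    if not earlier:
        raise ValueError("need at least one hop before the suspect hop")
    if not later:
        return None  # suspect hop is the last one shown: cannot tell
    baseline = medians[max(earlier)]  # closest earlier hop
    if medians[hop] - baseline < min_increase_ms:
        return False  # no meaningful jump at this hop to begin with
    # Real forwarding-path delay should still be visible further along the path.
    return all(medians[h] - baseline >= min_increase_ms for h in later)
```

Even when an increase does persist, a trace from one end cannot say whether the delay sits on the forward or the return path, which is why traces from both endpoints are needed.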

Sorry about that. The reverse tracert is below. I would rather not list source and destination addresses on a public list. These are directly connected hosts on each side, not routers. There is definitely no ICMP prioritization on the Sprint-peered side; however, I am at the mercy of Comcast Business on the other side...

  4    16 ms     7 ms     5 ms  sl-gw16-che-1-2-0.sprintlink.net [160.81.219.133]
  5     5 ms     6 ms     6 ms  sl-crs2-che-0-4-2-3.sprintlink.net [144.232.6.51]
  6    16 ms    15 ms    16 ms  sl-crs2-oma-0-2-2-0.sprintlink.net [144.232.18.121]
  7    26 ms    25 ms    25 ms  144.232.1.74
  8    26 ms    25 ms    25 ms  144.232.1.104
  9    26 ms    26 ms    26 ms  144.232.6.102
 10    32 ms    35 ms    35 ms  be-12-cr01.350ecermak.il.ibone.comcast.net [68.86.84.189]
 11    49 ms    51 ms    51 ms  he-1-12-0-0-cr01.denver.co.ibone.comcast.net [68.86.85.250]
 12    58 ms    59 ms    59 ms  he-0-9-0-0-ar02.aurora.co.denver.comcast.net [68.86.90.150]

This trace has the first 3 and last 4 hops removed; the previous trace has the first 4 and last 2 hops removed. It definitely looks like something funky, and I was hoping someone might be able to give me a little insight on where to go from here.

-----Original Message-----
From: Jeremy Chadwick [mailto:jdc@koitsu.org]
Sent: Tuesday, March 12, 2013 9:02 PM
To: Blake Pfankuch - Mailing List
Cc: outages@outages.org
Subject: Re: [outages] comcast/sprint oddities
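With traces from both directions posted, one crude way to compare them is how much each hop's three probes disagree with each other. A sketch, assuming the probe times have been typed or parsed into a dict with lost probes removed; `spiky_hops` is an illustrative name and the 100 ms threshold is an arbitrary choice, not a standard.

```python
def spiky_hops(probes, max_spread_ms=100):
    """Return hops whose probe RTTs vary by more than max_spread_ms (max - min).

    probes: {hop_number: [rtt_ms, ...]} with lost probes already removed.
    The 100 ms default is an arbitrary illustration, not a standard threshold.
    """
    return sorted(
        hop for hop, rtts in probes.items()
        if rtts and max(rtts) - min(rtts) > max_spread_ms
    )
```

Three probes per hop is a very small sample, so a result like this is a hint about where to look, not proof of where the problem is.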

Was the below Sprint->Comcast traceroute taken while the issue was (or still is) actively happening? You should probably do multiple traceroutes from both endpoints, or use better tools like mtr/WinMTR, which can do this repeatedly.

What you've shown below indicates that you're only seeing an increase in latency on the ingress side of the Sprint connection (i.e. Comcast->Sprint), starting at one specific router (hop #13) on the Sprint side.

If this were an actual service-impacting issue, you would see an increase in latency in the egress (Sprint->Comcast) traceroute as well, given that traceroute attempts to solicit an ICMP time-exceeded message from each device along the path; think round-trip time, and how the response packets "should" take the same path that your Comcast->Sprint traceroute would.

Phrased differently: if the issue is actively happening, you should see "spiky" latency in both the Comcast->Sprint and Sprint->Comcast traceroutes, just at different hops/network points. What you've shown below refutes that, which to me indicates ICMP prioritisation being used on Sprint routers.

If you really wanted to bring this up with Sprint, you would need to have a relationship with them, and you would *need* to provide source and destination IPs.

I would recommend setting up scripts/tools that run mtr from a cronjob and save the output to a file, and doing this on hosts at both ends of the network (the Comcast end and the Sprint end).

On Wed, Mar 13, 2013 at 03:18:58AM +0000, Blake Pfankuch - Mailing List wrote:
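The cron-driven mtr logging suggested above could look something like this minimal Python wrapper, assuming mtr is installed, on PATH, and permitted to run for the invoking user. The script name, log-file naming and crontab entry are hypothetical, and 192.0.2.10 is a documentation address standing in for the real far-end host. Passing each flag as its own argument keeps `-c` unambiguously paired with its count.

```python
import datetime
import subprocess
import sys
from pathlib import Path


def mtr_command(host, cycles=10, numeric=True):
    """Build an mtr report command: -r report mode, -w wide hostnames,
    -c number of cycles, -n skip DNS (numeric addresses only)."""
    cmd = ["mtr", "-r", "-w", "-c", str(cycles)]
    if numeric:
        cmd.append("-n")
    cmd.append(host)
    return cmd


def log_mtr_report(host, log_dir=".", cycles=10):
    """Run one mtr report and append it, with a UTC timestamp, to LOG_DIR/mtr-HOST.log."""
    result = subprocess.run(mtr_command(host, cycles), capture_output=True, text=True)
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    log_file = Path(log_dir) / f"mtr-{host}.log"
    with log_file.open("a") as fh:
        fh.write(f"=== {stamp} {host} exit={result.returncode} ===\n")
        fh.write(result.stdout + result.stderr)
    return log_file


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. from a crontab entry (hypothetical script path and target address):
    # */5 * * * * python3 /usr/local/bin/mtr_log.py 192.0.2.10
    # Note: under cron, "." is usually the user's home directory.
    for target in sys.argv[1:]:
        log_mtr_report(target)
```

Running the same script on a host at each end, pointed at the other end, gives the two-directional history Jeremy asks for.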

Yes, the issue is actively occurring right now. It's not consistent; I experience a smooth connection with blips of lag. Where I am seeing this most frequently is traffic across multiple IPsec VPNs. We do have a peering agreement with Sprint. That's why I emailed out, as normally I would expect to see latency in both directions. I can most definitely provide them with source and destination IPs; I was mostly just hoping someone else might have seen something similar.

I do have SmokePing running from Comcast to Sprint at the moment, and it does show plenty of jitter and latency increase; however, I do not have SmokePing set up on the other side yet.

Thanks for the insight. I will get in touch with our Sprint contacts to see what they think.

Thanks,
Blake

-----Original Message-----
From: Jeremy Chadwick [mailto:jdc@koitsu.org]
Sent: Tuesday, March 12, 2013 9:34 PM
To: Blake Pfankuch - Mailing List
Cc: outages@outages.org
Subject: Re: [outages] comcast/sprint oddities
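The "smooth connection with blips of lag" described above can be put into numbers from raw RTT samples (for example, a series of ping results). A sketch with two illustrative measures: the mean absolute change between consecutive samples, and the indexes of samples well above the median. Both measures and the 2x factor are assumptions chosen for illustration; SmokePing itself presents its data differently (a median line plus the spread of samples).

```python
from statistics import fmean, median


def mean_consecutive_jitter(rtts):
    """Average absolute change between consecutive RTT samples, in ms.

    One simple jitter measure (an unsmoothed cousin of the RFC 3550
    interarrival jitter estimate), not SmokePing's own statistic."""
    if len(rtts) < 2:
        raise ValueError("need at least two samples")
    return fmean([abs(b - a) for a, b in zip(rtts, rtts[1:])])


def blip_indices(rtts, factor=2.0):
    """Indexes of samples above `factor` times the median RTT (the "blips of lag")."""
    threshold = factor * median(rtts)
    return [i for i, rtt in enumerate(rtts) if rtt > threshold]
```

Collecting the same samples from both ends would show whether the blips line up in time, which fits the both-directions approach discussed in the thread.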

----- Original Message -----
From: "Jeremy Chadwick" <jdc@koitsu.org>
I would recommend setting up scripts/tools that run mtr from a cronjob and save the output to a file, and do this on hosts at both ends of the network (the Comcast end, and the Sprint end).
And the recommended approach is:

# mtr -rwc 10 $HOST
# mtr -rwnc 10 $HOST

(Alas, mtr won't show both the name and the IP, even in report modes; I will eventually get sick enough of that to patch it. Yes, local misconfigs that cause bad DNS resolution need to be visible in such report output.)

Note that mtr needs to be both setUID root *and* executable by Other in order to work for others than root, assuming you need that. Neither is default.

Cheers,
-- jra
-- 
Jay R. Ashworth                  Baylink                       jra@baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
St Petersburg FL USA               #natog                      +1 727 647 1274
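Saved reports still have to be read back. This sketch pulls the numeric columns out of `mtr -r` output, assuming the usual default report layout (`N.|-- host  Loss%  Snt  Last  Avg  Best  Wrst  StDev`); that layout is an assumption about the mtr version in use, and a custom `--order` changes the columns. The field names are illustrative.

```python
import re

# Assumed layout of an mtr report line with the default columns
# (Loss%, Snt, Last, Avg, Best, Wrst, StDev); a different --order changes it.
REPORT_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)
FIELDS = ("loss_pct", "sent", "last", "avg", "best", "worst", "stdev")


def parse_mtr_report(text):
    """Return one dict per hop line of `mtr -r` output; other lines are ignored."""
    hops = []
    for line in text.splitlines():
        m = REPORT_RE.match(line)
        if not m:
            continue  # header ("HOST: ...") or unrecognised line
        values = [float(v) for v in m.groups()[2:]]
        hop = dict(zip(FIELDS, values))
        hop["sent"] = int(hop["sent"])
        hop.update(hop=int(m.group(1)), host=m.group(2))
        hops.append(hop)
    return hops
```

Since a report shows either names or addresses, running both the named and the `-n` variant and matching rows by hop number is one way to get both.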

On 13/03/2013, Jay Ashworth <jra@baylink.com> wrote:
Note that mtr needs to be both setUID root *and* executable by Other in order to work for others than root, assuming you need that. Neither is default.
You must have a weird OS; on mine, it's the default:

-r-sr-xr-x  1 root  bin  63016 Jul 31  2012 /usr/local/sbin/mtr

Also, are there any patches to make `--order "SRL BGAWV"` the default when in `--report` mode? Otherwise, the output is pretty much unreadable.

C.
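The listing above, `-r-sr-xr-x`, is octal mode 4555: setuid, plus read and execute for owner, group and other. A small sketch using Python's standard `stat` module to decode those bits from a file's mode; `check_mtr` and its default path (taken from the listing) are illustrative.

```python
import os
import stat


def mtr_mode_bits(mode):
    """Decode the bits discussed in this thread from an st_mode value."""
    return {
        "setuid": bool(mode & stat.S_ISUID),
        "group_exec": bool(mode & stat.S_IXGRP),
        "other_exec": bool(mode & stat.S_IXOTH),
    }


def check_mtr(path="/usr/local/sbin/mtr"):
    """Stat an installed mtr (default path is just the one from the listing) and decode it."""
    st = os.stat(path)
    bits = mtr_mode_bits(st.st_mode)
    bits["owned_by_root"] = st.st_uid == 0
    return bits
```

Setuid only grants root's privileges if the file is also owned by root, hence the extra ownership field.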

On Wed, Mar 13, 2013 at 8:54 AM, Jay Ashworth <jra@baylink.com> wrote:
Note that mtr needs to be both setUID root *and* executable by Other in order to work for others than root, assuming you need that. Neither is default.
Or, even better, just use setcap to give it permission to open raw sockets without having full root access:

setcap 'cap_net_raw=+ep' /usr/bin/mtr

That said, I've never seen an OS that included mtr that didn't have it setuid by default.

Scott
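Whether setuid root or a file capability is used, the underlying requirement is permission to open raw sockets. As a quick sketch using only the Python standard library, this tests whether the current process can open a raw ICMP socket; it does not inspect mtr itself, since a capability granted with setcap on the mtr binary only applies when mtr runs.

```python
import socket


def can_open_raw_icmp_socket():
    """Return True if *this* process may open a raw IPv4 ICMP socket.

    Without root or CAP_NET_RAW this normally fails with PermissionError;
    any other OSError (e.g. a restricted sandbox) is also treated as "no"."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP):
            return True
    except OSError:
        return False
```

Run as an ordinary user it would typically return False; run as root it would typically return True.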

----- Original Message -----
From: "Scott Howard" <scott@doc.net.au>
On Wed, Mar 13, 2013 at 8:54 AM, Jay Ashworth <jra@baylink.com> wrote:
Note that mtr needs to be both setUID root *and* executable by Other in order to work for others than root, assuming you need that. Neither is default.
Or even better, just use setcap to give it permission to open RAW sockets without having full root access.
setcap 'cap_net_raw=+ep' /usr/bin/mtr
Fair point.
That said, I've never seen an OS that included mtr that didn't have it setuid by default.
I installed it on OpenSUSE 12.1 on my laptop from... mtr-0.81-3.1.2.i586.rpm, which came from the default repo-oss, and on reinstalling it to check, I see that it's root:dialout 4750. So yes, it has SUID, just not the other permissions. That's likely not an accident, but on all the machines I manage, I'm generally the only real user.

Cheers,
-- jra
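With mode 4750 and owner root:dialout, root can run it, members of `dialout` can run it (and it then runs with root's effective UID because of the setuid bit), and everyone else is refused. The classic permission-class logic behind that can be sketched as a pure function; it ignores ACLs, capabilities and mount options, and for a real file `os.access(path, os.X_OK)` asks the kernel directly. The group ID 20 in the usage note is hypothetical.

```python
import stat


def can_execute(mode, file_uid, file_gid, uid, gids):
    """Model the classic UNIX execute check for a regular file.

    gids should include the user's primary group. Root may execute if any
    execute bit is set; otherwise exactly one class applies: owner, else
    group, else other (the first matching class decides, even if it denies).
    """
    any_x = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if uid == 0:
        return bool(mode & any_x)
    if uid == file_uid:
        return bool(mode & stat.S_IXUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)
```

For example, `can_execute(0o4750, 0, 20, 1000, [100, 20])` models a user in the hypothetical group 20 and is allowed, while the same call with `[100]` is refused.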
participants (5)
- Blake Pfankuch - Mailing List
- Constantine A. Murenin
- Jay Ashworth
- Jeremy Chadwick
- Scott Howard