
Jeremy,

For the record, we at Internap do take connectivity issues seriously. I'd suggest having your provider reach out to our NOC, so that we may investigate comprehensively. You (and others on this list) are, of course, welcome to mail me privately in addition. (FWIW: a quick look into 76.96.0.0/11 in Dallas shows we've not been routing to it over any of Comcast's congested paths.)

Regards,
-a

On Sat, Mar 8, 2014 at 10:52 PM, Jeremy Chadwick <jdc@koitsu.org> wrote:
1. Thanks -- the problem is that, in my experience, Company X will blame Company Y for the device even though the device is owned/maintained by Company X, and this nonsense goes on for about a week before someone finally owns up to something (by which time the problem is usually gone). It's a depressing and sad modus operandi; sometimes I think it's done intentionally as a stall tactic.
2. No, because I don't think it's necessary -- when I can clearly "feel" the slowdown via SSH (which is TCP-based), the issue isn't related to ICMP prio. Plus, showing a network provider hping results doesn't necessarily convince them of anything if they're unfamiliar with the tool; that's been my experience anyway. It'd be akin to giving them packet captures and a 10-page write-up showing how a TCP packet with PSH+ACK and seq no 123456789 wasn't seen by the remote end until 2-3 retries. (For anyone curious what a TCP-based probe would even look like, there's a rough sketch after the list below.)
3. No, I haven't, because the process would be significantly more convoluted than that. This is what would have to happen, starting with the forward path:

- I'd have to open a ticket with Comcast through the standard 800-COMCAST means, i.e. complaining to someone in the Philippines about packet loss (read: 99% likelihood of someone screwing this up)
- Comcast would have to hand it off to the Comcast NOC
- The Comcast NOC would have to care enough to open a ticket with AT&T
For the reverse path:
- I'd have to open a ticket with my VPS provider, RootBSD
- RootBSD would have to open a ticket with InterNAP (assuming they have a relationship with them directly; it may be more convoluted -- for example, they may have to open a ticket with their co-lo provider, who then opens a ticket with InterNAP)
- InterNAP would have to care enough to open a ticket with Qwest/CenturyLink
- Qwest/CenturyLink would have to care enough to open a ticket with Comcast
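(Re: #2 above -- for anyone who does want a TCP-based probe without reaching for hping, here's a rough sketch using plain Python sockets: it times repeated TCP connects and counts failures as loss. The port, probe count, timeout, and interval are illustrative placeholders, not values used anywhere in this thread.)

#!/usr/bin/env python3
# Rough TCP-based loss/latency probe: times repeated TCP connects to a
# host:port instead of relying on ICMP, which routers may deprioritize.
# Port, count, timeout, and interval are placeholders for illustration.
import socket
import time

HOST, PORT = "204.109.61.174", 22     # e.g. the SSH port on the far end
COUNT, TIMEOUT, INTERVAL = 30, 2.0, 1.0

ok, rtts = 0, []
for _ in range(COUNT):
    start = time.monotonic()
    try:
        with socket.create_connection((HOST, PORT), timeout=TIMEOUT):
            rtts.append((time.monotonic() - start) * 1000.0)
            ok += 1
    except OSError:
        pass                          # timeout/refusal counted as loss
    time.sleep(INTERVAL)

loss = 100.0 * (COUNT - ok) / COUNT
print(f"{COUNT} probes, {loss:.1f}% failed")
if rtts:
    print(f"rtt ms: min {min(rtts):.1f}  avg {sum(rtts)/len(rtts):.1f}  max {max(rtts):.1f}")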
Historically I've mailed things of this nature to outages@outages.org because there are lurkers on the list who quietly go behind the scenes and start trying to fix/rectify things. Other times it's purely about bringing to light something that's happening on the Internet in hopes that one or more of the involved peers are, in a roundabout way, publicly shamed for not having better monitoring.
P.S. -- Issue is still ongoing and appears worse than before (at least now there aren't sporadic times of 0% loss at intermediary hops).
--
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
On Sat, Mar 08, 2014 at 06:27:15PM -0800, Michael Smith wrote:
A couple of things:

- Hop 8's IP belongs to AT&T, so it's likely an interface on an AT&T router, since you're headed towards it in your traceroute (next hop).
- Have you tried something like hping, which will allow you to use TCP for your test?
- Have you contacted InterNAP and asked them to open a ticket with AT&T using the data you have?
Mike
On Mar 8, 2014, at 4:53 PM, Jeremy Chadwick <jdc@koitsu.org> wrote:
Since roughly Friday, I've been seeing what appears to be packet loss somewhere within the Comcast/AT&T network mesh. Source and destination IPs are provided below, along with some mtrs from src->dst and dst->src. I have periodic mtrs (both directions) going all the way back to 03/04, and I can make all of those logs available if asked.
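(For context, the periodic mtrs are nothing exotic -- conceptually something like the loop below, which appends a timestamped mtr report to a log every few minutes. The destination, cycle count, interval, and log path shown are placeholders, not the exact setup used here.)

#!/usr/bin/env python3
# Sketch of periodic mtr logging: run an mtr report on a fixed schedule
# and append it, timestamped, to a log file. Values are placeholders.
import subprocess
import time

DEST = "204.109.61.174"
LOGFILE = "mtr-forward.log"
CYCLES = 30            # matches the Snt column in the reports below
SLEEP = 300            # seconds between reports

while True:
    stamp = time.strftime("=== %a %b %d %H:%M:%S %Z %Y")
    report = subprocess.run(
        ["mtr", "--report", "--report-cycles", str(CYCLES), DEST],
        capture_output=True, text=True,
    )
    with open(LOGFILE, "a") as fh:
        fh.write(f"{stamp}\n{report.stdout}=== END\n")
    time.sleep(SLEEP)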
The issue started suddenly on 03/07 @ 21:33 PST -- not a "gradual" increase -- and lasted until an undetermined time (very hard to tell from mtrs), but I'd estimate ~02:00 PST on 03/08 (today).

The issue then appeared to start back up again around 07:00 PST, though it's hard to give an exact time (it seems to have been a gradual increase, thus hard to pinpoint). It's been ongoing since.
The loss varies from 3% to 20%, but you can definitely "feel" it across an SSH session, so it's not ICMP prio.
I will make myself clear: it's very hard to "show" someone the way this problem manifests itself, because the packet loss will vary all over the place between different hops. It *definitely* starts at a particular point and "trickles down", but because the loss percentage is small, there are times when a hop will suddenly show 0%. TL;DR -- you'd really have to see a longer log (say, an hour's worth) to be able to say "ah yes, this really is a problem" and not blow it off as ICMP prio.
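(To make that judgment call easier, a small summarizer along these lines -- written for this post, not something used to generate the data below -- can average per-hop loss across all the mtr reports in a log file, so sustained loss stands out from one-off ICMP deprioritization.)

#!/usr/bin/env python3
# Sketch: average per-hop loss across many mtr --report runs stored in
# one log file (filename given on the command line; lines assumed to
# look like " 8.|-- 192.205.37.1  70.0%  30  9  ...").
import re
import sys
from collections import defaultdict

HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+).*?\s(\d+\.\d)%")

loss = defaultdict(list)    # (hop number, host) -> list of loss% values
with open(sys.argv[1]) as fh:
    for line in fh:
        m = HOP_RE.search(line)
        if m:
            hop, host, pct = int(m.group(1)), m.group(2), float(m.group(3))
            loss[(hop, host)].append(pct)

for (hop, host), samples in sorted(loss.items()):
    avg = sum(samples) / len(samples)
    print(f"{hop:3d}  {host:55s}  avg loss {avg:5.1f}%  ({len(samples)} reports)")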
And as usual, there's one of those "mystery routers" (hop #8 in the first example) that peering providers looooooove to use as a scapegoat when it comes to shifting blame, e.g. provider A says "that's a device owned by provider B", provider B says "that device is provider A's responsibility", and neither side does anything about the issue. However, I should note that the "mystery router" usually shows some degree of loss even when this issue isn't occurring (likely ICMP prio on the device), which makes it even more difficult to determine where the issue begins.
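(One low-effort data point on the "mystery router": a whois lookup on the hop's address at least shows who registered the block, which narrows down whose NOC to start with. The sketch below assumes a whois(1) client is installed and is illustrative only.)

#!/usr/bin/env python3
# Sketch: query whois for the "mystery" hop address and print only the
# fields that usually name the registrant. Assumes whois(1) is installed.
import subprocess

ip = "192.205.37.1"    # hop #8 in the first mtr report below
out = subprocess.run(["whois", ip], capture_output=True, text=True).stdout
for line in out.splitlines():
    if line.split(":")[0].strip() in ("OrgName", "NetName", "NetRange", "CIDR"):
        print(line.strip())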
src IP: 76.102.14.35 (Comcast; Mountain View, CA)
dst IP: 204.109.61.174 (RootBSD; Dallas, TX)
=== Sat Mar 8 16:22:00 PST 2014 (1394324520)
Start: Sat Mar 8 16:22:00 2014
HOST: icarus.home.lan  Loss%  Snt  Rcv  Last  Avg  Best  Wrst
  1.|-- gw.home.lan (192.168.1.1)  0.0%  30  30  0.4  0.3  0.2  0.4
  2.|-- 76.102.12.1  0.0%  30  30  8.0  8.8  8.0  12.2
  3.|-- te-0-2-0-5-ur06.santaclara.ca.sfba.comcast.net (68.86.249.253)  0.0%  30  30  8.2  9.0  8.2  16.5
  4.|-- te-1-1-0-1-ar01.oakland.ca.sfba.comcast.net (69.139.198.94)  0.0%  30  30  11.9  12.2  10.1  15.0
  5.|-- be-90-ar01.sfsutro.ca.sfba.comcast.net (68.85.155.14)  0.0%  30  30  12.0  12.5  10.1  15.1
  6.|-- he-3-8-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.94.85)  0.0%  30  30  13.0  14.0  11.8  18.0
  7.|-- pos-0-3-0-0-pe01.11greatoaks.ca.ibone.comcast.net (68.86.87.18)  0.0%  30  30  15.6  17.2  15.3  19.8
  8.|-- 192.205.37.1  70.0%  30  9  54.8  67.7  53.6  102.2
  9.|-- cr2.sffca.ip.att.net (12.122.86.202)  13.3%  30  26  65.2  63.5  61.0  65.7
 10.|-- cr2.la2ca.ip.att.net (12.122.31.133)  6.7%  30  28  63.3  63.5  60.9  75.2
 11.|-- cr2.dlstx.ip.att.net (12.122.28.177)  3.3%  30  29  65.2  63.7  61.1  69.7
 12.|-- ggr6.dlstx.ip.att.net (12.122.138.113)  6.7%  30  28  60.3  64.5  59.9  153.6
 13.|-- 12.90.228.14  6.7%  30  28  60.3  60.7  60.2  62.5
 14.|-- border1.pc1-bbnet1.dal004.pnap.net (216.52.191.19)  3.3%  30  29  60.3  60.2  59.8  60.5
 15.|-- giglinx-60.border1.dal004.pnap.net (216.52.189.46)  3.3%  30  29  59.9  60.2  59.8  61.4
 16.|-- 204.109.62.46  6.7%  30  28  60.1  60.5  60.1  62.7
 17.|-- mambo.koitsu.org (204.109.61.174)  3.3%  30  29  60.7  60.9  60.1  63.4
=== END
src IP: 204.109.61.174 (RootBSD; Dallas, TX)
dst IP: 76.102.14.35 (Comcast; Mountain View, CA)
=== Sat Mar 8 16:22:00 PST 2014 (1394324520)
Start: Sat Mar 8 16:22:00 2014
HOST: mambo.koitsu.org  Loss%  Snt  Rcv  Last  Avg  Best  Wrst
  1.|-- 204.109.61.173  0.0%  30  30  0.5  1.3  0.4  15.0
  2.|-- 204.109.62.45  0.0%  30  30  0.5  0.5  0.3  1.2
  3.|-- border1.ge1-6.giglinx-60.dal004.pnap.net (216.52.189.45)  0.0%  30  30  0.5  0.6  0.4  4.7
  4.|-- core3.pc1-bbnet1.ext1a.dal.pnap.net (216.52.191.41)  0.0%  30  30  0.9  1.0  0.9  1.2
  5.|-- dax-edge-03.inet.qwest.net (67.133.189.93)  0.0%  30  30  0.6  2.0  0.5  22.8
  6.|-- 63-235-82-234.dia.static.qwest.net (63.235.82.234)  0.0%  30  30  1.4  1.3  1.0  1.7
  7.|-- be-13-cr01.dallas.tx.ibone.comcast.net (68.86.82.141)  0.0%  30  30  1.3  2.7  1.0  4.9
  8.|-- he-0-14-0-0-cr01.losangeles.ca.ibone.comcast.net (68.86.85.141)  0.0%  30  30  35.6  33.6  31.8  35.7
  9.|-- he-1-8-0-0-ar01.oakland.ca.sfba.comcast.net (68.86.89.54)  3.3%  30  29  52.8  53.5  51.5  55.5
 10.|-- te-0-4-0-5-ur06.santaclara.ca.sfba.comcast.net (68.86.143.97)  0.0%  30  30  52.1  52.2  51.9  52.3
 11.|-- te-6-0-acr03.santaclara.ca.sfba.comcast.net (68.86.249.66)  6.7%  30  28  53.0  53.0  52.8  53.8
 12.|-- c-76-102-14-35.hsd1.ca.comcast.net (76.102.14.35)  3.3%  30  29  60.4  60.5  59.9  63.8
=== END
--
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
_______________________________________________
Outages mailing list
Outages@outages.org
https://puck.nether.net/mailman/listinfo/outages