SF South Bay: chronic latency/packet loss between Abovenet/Comcast at Great Oaks

There's an issue I've been tracking for a few months now pertaining to a network link between Abovenet and Comcast which appears to become saturated (or impacted negatively in some way) at nearly the same time every night, lasts for numerous hours, then ceases -- on a near-daily basis (no exaggeration).

Latency and packet loss occur during this time, with latency hitting 150ms (sometimes higher) and packet loss ranging from 0.5% to 2.0%. I've been storing periodic traceroutes/mtrs for over a month showing this problem, and have been tracking start/end times as well.

Thankfully I own devices/have connectivity on both ends (src and dst), and thus can provide mtrs/traceroutes from both directions. Analysis so far, done by myself as well as senior network techs at my co-lo provider, confirms this issue is with a link between Abovenet/Comcast, likely within the San Jose Great Oaks POP (which I'm familiar with as part of my job).

I opened up a ticket on DSLR/BBR's Comcast Direct forum (where only Comcast employees can view and respond to tickets) over a month ago. Someone has been viewing it, but nobody has replied except me.

I've since made the issue public, where (of course) the general Internet community does not quite understand how peering arrangements/contracts work (people think that any company who has a contract with Abovenet can report issues, but that is simply not the case; you must be a POC for the transport to report issues with it), nor do they understand how a co-lo provider changing route preferencing can impact the provider financially (based on billing metrics, etc.). My co-lo provider is very strict with their routing policies, and it has to do with financial reasons that are their own business, not mine.

The public thread is here, which also includes start/end times, traceroutes (both directions), and so on. I update it every day when the issue happens, and ~90% of the time edit my posts when the issue ends.

http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-la...

Anyway, all the technical details aside:

Is there anyone on this list who works for Comcast who can contact me off-list and is willing to investigate this and drive it to completion?

An alternative would be for someone to contact me off-list with the name or Email address of someone (or division) who handles issues like this at Comcast. I'd love for Abovenet to get involved, but I have no contractual obligation to them. (If there is an Abovenet individual who is willing to investigate this "pro bono" per se, that would be awesome, but I imagine such is often above one's pay grade).

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

Hi Jeremy,

When the issue was raised a week or two ago there seemed to be a route announcement issue for 72.20.98.67. When your colo provider changed their policy did they update filters with their upstream?

Cheers, -ren, who will confirm there is no congestion with Abovenet on the port in SJC to Comcast.

On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
_______________________________________________ Outages mailing list Outages@outages.org https://puck.nether.net/mailman/listinfo/outages

Hi Ren,

The issue with my co-lo pertaining to route announcements has actually been "dealt with", meaning "this is just how it is". I'm wondering if I can go into details without violating contractual obligations, hmm. Yes, I imagine I can, because it becomes quite obvious if I provide traceroutes from both directions, and that's public knowledge.

It appears that my co-lo (BAIS) doesn't actually adjust route announcements on a per-IP basis, but they internally have a hashing algorithm in place where on a per-IP basis different addresses utilise different network paths. I still have an open ticket with their senior networking engineer about this, who has been somewhat "careful" in what he tells me, but so far I've basically gotten confirmation that this is indeed how they do their load-balancing for customers to balance out network traffic between all of their peering providers (Level 3, Abovenet, Cogent, and 2-3 others).

I can provide those examples (to/from different IPs) if you want to see them, but that is a separate matter. There still seems to be a problem between Abovenet/Comcast. Alternate links/paths through my co-lo (e.g. BAIS/Cogent) show no problems on the ingress or egress path -- the common path seems to be Abovenet/Comcast when there are problems.
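For anyone unfamiliar with the per-IP hashing idea mentioned above, here is a minimal sketch of how such a scheme could behave. This is only an illustration: the actual algorithm BAIS uses is not known, the hash function (SHA-256) and the `pick_upstream` helper are assumptions of mine, and the upstream list contains only the three providers named in this message.

```python
import hashlib
import ipaddress

# Providers named in the thread; the real list is longer ("2-3 others")
# and the real selection logic is unknown. Purely illustrative.
UPSTREAMS = ["Level 3", "Abovenet", "Cogent"]


def pick_upstream(ip: str, upstreams=UPSTREAMS) -> str:
    """Deterministically map one IP address to one upstream (hypothetical)."""
    packed = ipaddress.ip_address(ip).packed
    digest = hashlib.sha256(packed).digest()
    # Use the first four bytes of the digest as an integer bucket selector.
    index = int.from_bytes(digest[:4], "big") % len(upstreams)
    return upstreams[index]


if __name__ == "__main__":
    # Two addresses from the same /19 may land on different upstreams,
    # which is why traceroutes to neighbouring IPs can look different.
    for addr in ("72.20.98.67", "72.20.98.124"):
        print(addr, "->", pick_upstream(addr))
```

The point of the sketch is only that the mapping is stable per address but varies between addresses, so a problem on one upstream affects some IPs in a block and not others.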
This is what's presently happening right now:

Source IP: 67.180.84.87
Dest IP:   72.20.98.124

=== Tue Apr 10 19:09:00 PDT 2012  (1334110140)
HOST: icarus.home.lan    Loss%  Snt  Rcv   Last    Avg   Best   Wrst
  1.|-- 192.168.1.1       0.0%   40   40    0.3    0.6    0.2    1.5
  2.|-- 67.180.84.1       0.0%   40   40   24.5   22.7   10.4   54.0
  3.|-- 68.85.191.253     0.0%   40   40   10.2   11.1    8.4   25.5
  4.|-- 68.86.143.98      0.0%   40   40   15.6   16.4   11.1   34.7
  5.|-- 68.86.91.5        0.0%   40   40   14.3   18.7   12.4   49.7
  6.|-- 68.86.87.182      0.0%   40   40   17.1   19.4   14.4   51.6
  7.|-- 4.71.118.45       0.0%   40   40   14.3   23.7   13.0   77.9
  8.|-- 4.69.152.148      0.0%   40   40   67.5   27.1   13.3  128.0
  9.|-- 4.53.16.18        5.0%   40   38  151.6  153.3  133.3  184.1
 10.|-- 69.163.65.39      2.5%   40   39  176.3  155.8  135.3  198.1
 11.|-- 72.20.98.124      5.0%   40   38  205.3  152.0  129.8  205.3
=== END

Source IP: 72.20.98.124
Dest IP:   67.180.84.87

=== Tue Apr 10 19:09:00 PDT 2012  (1334110140)
HOST: isis.parodius.com  Loss%  Snt  Rcv   Last    Avg   Best   Wrst
  1.|-- 72.20.98.65       0.0%   41   41    0.4    0.4    0.3    0.6
  2.|-- 69.163.64.44      0.0%   40   40    0.4    0.4    0.3    0.5
  3.|-- 69.163.65.49      0.0%   40   40    0.6   10.6    0.4   76.8
  4.|-- 64.124.65.93      0.0%   40   40   65.6    3.6    0.4   65.6
  5.|-- 64.125.28.54      0.0%   40   40    2.8    4.2    0.7   51.7
  6.|-- 64.125.30.126     0.0%   40   40    0.8    1.4    0.7   16.7
  7.|-- 64.125.30.178     0.0%   40   40    1.1    5.8    1.1   65.5
  8.|-- 75.149.228.133    0.0%   40   40  148.7  136.9  117.4  150.2
  9.|-- 68.86.85.65       5.0%   40   38  139.3  135.5  119.9  148.3
 10.|-- 68.86.90.158      2.5%   40   39  141.6  138.1  120.0  149.8
 11.|-- 68.86.143.93      2.5%   40   39  140.3  136.8  120.5  149.8
 12.|-- 68.85.191.250     0.0%   40   40  150.8  144.8  128.7  159.5
 13.|-- 67.180.84.87      0.0%   40   40  146.7  149.6  132.2  173.9
=== END

$ host 64.125.30.178
178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.

$ host 75.149.228.133
133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.

So Ren, if you can investigate this, I would be appreciative of it.
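For those who want to reproduce the reading of the reports above, a rough sketch of how the latency jump could be located automatically follows. It assumes the column layout used in this thread (Loss%, Snt, Rcv, Last, Avg, Best, Wrst); other mtr field orders would need a different pattern. The names `parse_report` and `first_latency_jump` and the 50 ms threshold are my own choices for illustration, not part of mtr. The sample lines are hops 7 to 9 of the second report.

```python
import re

# One hop line in the layout shown in this thread:
# hop.|-- host  Loss%  Snt  Rcv  Last  Avg  Best  Wrst
HOP_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%\s+(\d+)\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_report(text):
    """Return a list of dicts (hop, host, loss, avg) for each hop line."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append({
                "hop": int(m.group(1)),
                "host": m.group(2),
                "loss": float(m.group(3)),
                "avg": float(m.group(7)),  # group 7 is the Avg column
            })
    return hops


def first_latency_jump(hops, threshold_ms=50.0):
    """Return (previous_hop, hop) for the first step where the average
    latency rises by more than threshold_ms, or None if there is none."""
    for prev, cur in zip(hops, hops[1:]):
        if cur["avg"] - prev["avg"] > threshold_ms:
            return prev, cur
    return None


SAMPLE = """\
  7.|-- 64.125.30.178     0.0%   40   40    1.1    5.8    1.1   65.5
  8.|-- 75.149.228.133    0.0%   40   40  148.7  136.9  117.4  150.2
  9.|-- 68.86.85.65       5.0%   40   38  139.3  135.5  119.9  148.3
"""

if __name__ == "__main__":
    jump = first_latency_jump(parse_report(SAMPLE))
    if jump:
        prev, cur = jump
        print(f"jump between {prev['host']} and {cur['host']}")
```

Keep in mind that a jump located this way only shows where the round-trip time gets worse as seen from the sender; when the forward and return paths differ, the delay may actually be on the return path.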
--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:

A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples.

Feel free to put me in my place, but please do so on -discuss.

On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages@jdc.parodius.com> wrote:

I choose not to do DNS resolution in mtr because otherwise the terminal width required to see FQDNs has to be >76 characters, which often upsets mailing list folks. (This, for example, is one of the few lists on which I top-post.)

I'm not so concerned with the packet loss -- for example in the 2nd mtr set I showed, the loss only seems to happen at routers, which is almost certainly the result of ICMP prioritisation.

But the latency is a definite problem and is easily noticeable across SSH, Remote Desktop, and any other TCP service (i.e. the latency shown is not a result of ICMP prioritisation). src/dst IPs on both sides are actual servers/boxes, not routers.

But you're absolutely right -- asymmetric routing is in place here, which means that everyone has to work together, and simultaneously, to really figure out where the problem is. I can only do so much when I have little to no visibility into things (e.g. if I had access to BAIS and Abovenet and Level 3 and Comcast routers I could figure out where the problem is... ;-) )

I'm currently engaged in a conversation with Comcast engineers about this issue. (Seems my DSLR post got proper attention.)

So far the statement is that they've looked at the interface for the Abovenet/Comcast peering point in question, and although it's being used/busy, it's not oversaturated. They also pointed out that the only announcements they see for 72.20.96.0/19 are via Level 3 and Cogent, thus the issue is likely to be on my co-lo provider's side (e.g. the Level 3 <-> BAIS link). route-views also confirms the same thing, as does my place of work (which has peering with Abovenet natively).

I have a ticket open with my co-lo provider to investigate this ordeal.
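To make the announcement check above concrete, here is a small sketch of how one might work out which carriers sit directly upstream of the origin AS in a set of AS paths. Assumptions, loudly: the sample paths are made up (the leading ASNs come from the documentation range), not real route-views output; AS7151 is the origin AS for 72.20.96.0/19 as given elsewhere in this thread; paths are assumed to be plain space-separated ASNs with no AS_SET braces; and `upstreams_of_origin` is a hypothetical helper name.

```python
# ASNs of the carriers named in the thread.
CARRIERS = {3356: "Level 3", 174: "Cogent", 6461: "Abovenet"}
ORIGIN_AS = 7151  # origin of 72.20.96.0/19 per this thread


def upstreams_of_origin(as_paths, origin=ORIGIN_AS):
    """Return the set of ASNs seen immediately before `origin` in each path."""
    upstreams = set()
    for path in as_paths:
        asns = [int(tok) for tok in path.split()]
        # Collapse AS-path prepending (consecutive repeats of the same ASN).
        deduped = [a for i, a in enumerate(asns) if i == 0 or a != asns[i - 1]]
        if len(deduped) >= 2 and deduped[-1] == origin:
            upstreams.add(deduped[-2])
    return upstreams


# Hypothetical AS paths for illustration only.
SAMPLE_PATHS = [
    "64496 3356 7151",
    "64497 174 7151 7151",
    "64498 3356 7151",
]

if __name__ == "__main__":
    seen = upstreams_of_origin(SAMPLE_PATHS)
    for asn, name in sorted(CARRIERS.items()):
        status = "seen" if asn in seen else "not seen"
        print(f"AS{asn} ({name}): {status} as upstream of AS{ORIGIN_AS}")
```

With real data you would feed in the AS paths from your own BGP table or a looking glass; if AS6461 never appears directly before AS7151, then inbound traffic toward the prefix is not arriving via Abovenet from that vantage point.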
If this does turn out to be a problem with their Level 3 link being saturated chronically, then I owe Comcast/Abovenet an apology (welcome to one of the complexities with asymmetric routing!), and I'm going to have to make some decisions with regards to co-location and so on, because the chronic nature of this problem is unacceptable for myself as well as my customers.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:

Following up to my own post (in bad habit):

Can some folks here who have peering with Abovenet (preferably with a full routing table) verify that you see an announcement for 72.20.96.0/19 (AS7151) coming via AS6461 (Abovenet)?

I've confirmed this is the case at my workplace, but I want extra eyes/verification. Thanks.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

On Tue, Apr 10, 2012 at 09:33:04PM -0700, Jeremy Chadwick wrote:
I choose not to do DNS resolution in mtr because otherwise the terminal width required to see FQDNs has to be >76 characters, which often upsets mailing list folks. (This for example is one of the few lists on which I top-post)
I'm not so concerned with the packet loss -- for example in the 2nd mtr set I showed, the loss only seems to happen at routers, which is almost certainly the result of ICMP prioritisation.
But the latency is a definite problem and is easily noticeable across SSH, Remote Desktop, and any other TCP service (i.e. the latency shown is not a result of ICMP prioritisation). src/dst IPs on both sides are actual servers/boxes, not routers.
But you're absolutely right -- asymmetric routing is in place here, which means that everyone has to work together, and simultaneously, to really figure out where the problem is. I can only do so much when I have little to no visibility into things (e.g. if I had access to BAIS and Abovenet and Level 3 and Comcast routers I could figure out where the problem is... ;-) )
I'm currently engaged in a conversation with Comcast engineers about this issue. (Seems my DSLR post got proper attention)
So far the statement is that they've looked at the interface for the Abovenet/Comcast peering point in question, and although it's being used/busy, it's not oversaturated. They also pointed out that the only announcements they see for 72.20.96.0/19 are via Level 3 and Cogent, thus the issue is likely to be on my co-lo providers' side (e.g. the Level 3 <-> BAIS link). route-views also confirms the same thing, as does my place of work (who has peering with Abovenet natively).
I have a ticket open with my co-lo provider to investigate this ordeal.
If this does turn out to be a problem with their Level 3 link being saturated chronically, then I owe Comcast/Abovenet an apology (welcome to one of the complexities with asymmetric routing!), and I'm going to have to make some decisions with regards to co-location and so on, because the chronic nature of this problem is unacceptable for myself as well as my customers.
--
| Jeremy Chadwick                                  jdc at parodius.com |
| Parodius Networking                         http://www.parodius.com/ |
| UNIX Systems Administrator                     Mountain View, CA, US |
| Making life hard for others since 1977.                 PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples.
Feel free to put me in my place, but please do so on -discuss.
On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
Hi Ren,
The issue with my co-lo pertaining to route announcements has actually been "dealt with", meaning "this is just how it is". I'm wondering if I can go into details without violating contractual obligations, hmm. Yes, I imagine I can, because it becomes quite obvious if I provide traceroutes from both directions, and that's public knowledge.
It appears that my co-lo (BAIS) doesn't actually adjust route announcements on a per-IP basis, but they internally have a hashing algorithm in place where on a per-IP basis different addresses utilise different network paths. I still have an open ticket with their senior networking engineer about this, who has been somewhat "careful" in what he tells me, but so far I've basically gotten confirmation that this is indeed how they do their load-balancing for customers to balance out network traffic between all of their peering providers (Level 3, Abovenet, Cogent, and 2-3 others).
I can provide those examples (to/from different IPs) if you want to see them, but that is a separate matter. There still seems to be a problem between Abovenet/Comcast. Alternate links/paths through my co-lo (e.g. BAIS/Cogent) show no problems on the ingress or egress path -- the common path seems to be Abovenet/Comcast when there are problems.
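For readers unfamiliar with per-IP hashing, here is a minimal sketch of the general idea: each address is hashed and the result picks one upstream, so a given IP consistently uses the same path while different IPs spread across providers. The hash function, the upstream list, and the function name are all assumptions for illustration; how BAIS actually implements this has not been disclosed.

```python
import hashlib
import ipaddress

# Illustrative list only; the thread names these three plus "2-3 others".
UPSTREAMS = ["Level 3", "Abovenet", "Cogent"]


def pick_upstream(addr, upstreams=UPSTREAMS):
    """Map an IP address to one upstream, deterministically.

    SHA-256 is used here purely as a stable, well-mixed hash; real
    routers typically use much cheaper hardware hash functions.
    """
    packed = ipaddress.ip_address(addr).packed
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:4], "big") % len(upstreams)
    return upstreams[index]


for ip in ("72.20.98.124", "72.20.98.67"):
    print(ip, "->", pick_upstream(ip))
```

The practical consequence is the one described above: two servers in the same /19 can see very different paths, so a problem on one upstream affects only the addresses that hash onto it.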
This is what's presently happening right now:
Source IP: 67.180.84.87
Dest IP: 72.20.98.124
=== Tue Apr 10 19:09:00 PDT 2012 (1334110140)
HOST: icarus.home.lan    Loss%  Snt  Rcv   Last    Avg   Best   Wrst
 1.|-- 192.168.1.1        0.0%   40   40    0.3    0.6    0.2    1.5
 2.|-- 67.180.84.1        0.0%   40   40   24.5   22.7   10.4   54.0
 3.|-- 68.85.191.253      0.0%   40   40   10.2   11.1    8.4   25.5
 4.|-- 68.86.143.98       0.0%   40   40   15.6   16.4   11.1   34.7
 5.|-- 68.86.91.5         0.0%   40   40   14.3   18.7   12.4   49.7
 6.|-- 68.86.87.182       0.0%   40   40   17.1   19.4   14.4   51.6
 7.|-- 4.71.118.45        0.0%   40   40   14.3   23.7   13.0   77.9
 8.|-- 4.69.152.148       0.0%   40   40   67.5   27.1   13.3  128.0
 9.|-- 4.53.16.18         5.0%   40   38  151.6  153.3  133.3  184.1
10.|-- 69.163.65.39       2.5%   40   39  176.3  155.8  135.3  198.1
11.|-- 72.20.98.124       5.0%   40   38  205.3  152.0  129.8  205.3
=== END
Source IP: 72.20.98.124
Dest IP: 67.180.84.87
=== Tue Apr 10 19:09:00 PDT 2012 (1334110140)
HOST: isis.parodius.com  Loss%  Snt  Rcv   Last    Avg   Best   Wrst
 1.|-- 72.20.98.65        0.0%   41   41    0.4    0.4    0.3    0.6
 2.|-- 69.163.64.44       0.0%   40   40    0.4    0.4    0.3    0.5
 3.|-- 69.163.65.49       0.0%   40   40    0.6   10.6    0.4   76.8
 4.|-- 64.124.65.93       0.0%   40   40   65.6    3.6    0.4   65.6
 5.|-- 64.125.28.54       0.0%   40   40    2.8    4.2    0.7   51.7
 6.|-- 64.125.30.126      0.0%   40   40    0.8    1.4    0.7   16.7
 7.|-- 64.125.30.178      0.0%   40   40    1.1    5.8    1.1   65.5
 8.|-- 75.149.228.133     0.0%   40   40  148.7  136.9  117.4  150.2
 9.|-- 68.86.85.65        5.0%   40   38  139.3  135.5  119.9  148.3
10.|-- 68.86.90.158       2.5%   40   39  141.6  138.1  120.0  149.8
11.|-- 68.86.143.93       2.5%   40   39  140.3  136.8  120.5  149.8
12.|-- 68.85.191.250      0.0%   40   40  150.8  144.8  128.7  159.5
13.|-- 67.180.84.87       0.0%   40   40  146.7  149.6  132.2  173.9
=== END
$ host 64.125.30.178
178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
$ host 75.149.228.133
133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
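To show how the hop of interest can be picked out of an mtr report mechanically, here is a small sketch that parses report lines and finds the adjacent pair of hops with the largest increase in average latency. The regular expression assumes the column layout shown in the reports above (Loss%, Snt, Rcv, Last, Avg, Best, Wrst); the function names are made up for this example. The sample data is an excerpt (hops 5-9) of the report taken from 72.20.98.124.

```python
import re

# One report row: hop number, host, Loss%, Snt, Rcv, Last, Avg, Best, Wrst.
HOP_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%\s+(\d+)\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)

REPORT = """\
HOST: isis.parodius.com  Loss%  Snt  Rcv   Last    Avg   Best   Wrst
 5.|-- 64.125.28.54       0.0%   40   40    2.8    4.2    0.7   51.7
 6.|-- 64.125.30.126      0.0%   40   40    0.8    1.4    0.7   16.7
 7.|-- 64.125.30.178      0.0%   40   40    1.1    5.8    1.1   65.5
 8.|-- 75.149.228.133     0.0%   40   40  148.7  136.9  117.4  150.2
 9.|-- 68.86.85.65        5.0%   40   38  139.3  135.5  119.9  148.3
"""


def parse_report(text):
    """Return a list of dicts for every line that looks like a hop row."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append({
                "hop": int(m.group(1)),
                "host": m.group(2),
                "loss": float(m.group(3)),
                "avg": float(m.group(7)),
            })
    return hops


def biggest_jump(hops):
    """Return the (before, after) pair with the largest rise in Avg."""
    if len(hops) < 2:
        raise ValueError("need at least two hops")
    return max(zip(hops, hops[1:]), key=lambda p: p[1]["avg"] - p[0]["avg"])


before, after = biggest_jump(parse_report(REPORT))
print(before["host"], "->", after["host"])  # 64.125.30.178 -> 75.149.228.133
```

One caveat: with asymmetric routing, the hop where latency first appears in one direction only marks where the round trip starts to include the slow segment. The delay itself may be on the return path, which this kind of analysis cannot see.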
So Ren, if you can investigate this, I would be appreciative of it.
--
| Jeremy Chadwick                                  jdc at parodius.com |
| Parodius Networking                         http://www.parodius.com/ |
| UNIX Systems Administrator                     Mountain View, CA, US |
| Making life hard for others since 1977.                 PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:
Hi Jeremy,
When the issue was raised a week or two ago there seemed to be a route announcement issue for 72.20.98.67. When your colo provider changed their policy did they update filters with their upstream?
Cheers, -ren, who will confirm there is no congestion with Abovenet on the port in SJC to Comcast.
On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
There's an issue I've been tracking for a few months now pertaining to a network link between Abovenet and Comcast which appears to become saturated (or impacted negatively in some way) at nearly the same time every night, and lasts for numerous hours, then ceases -- on a near-daily basis (no exaggeration).
Latency and packet loss occur during this time, with latency hitting 150ms (sometimes higher), with packet loss ranging from 0.5% to 2.0%. I've been storing periodic traceroutes/mtrs for over a month showing this problem, and been tracking start/end times as well.
Thankfully I own devices/have connectivity on both ends (src and dst), and thus can provide mtrs/traceroutes from both directions. Analysis so far, done by myself as well as senior network techs at my co-lo provider, confirms this issue is with a link between Abovenet/Comcast, likely within the San Jose Great Oaks POP (which I'm familiar with as part of my job).
I opened up a ticket on DSLR/BBR's Comcast Direct forum (which only Comcast employees can respond to/view tickets for) over a month ago. Someone has been viewing it, but nobody has replied except me.
I've since made the issue public, where (of course) the general Internet community does not quite understand how peering arrangements/contracts work (people think that any company who has a contract with Abovenet can report issues, but that is simply not the case; you must be a POC for the transport to report issues with it), nor do they understand how a co-lo provider changing route preferencing can impact the provider financially (based on billing metrics, etc.). My co-lo provider is very strict with their routing policies, and it has to do with financial reasons that are their own business, not mine.
The public thread is here, which also includes start/end times, traceroutes (both directions), and so on. I update it every day when the issue happens, and ~90% of the time edit my posts when the issue ends.
http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-la...
Anyway, all the technical details aside:
Is there anyone on this list who works for Comcast who can contact me off-list who is willing to investigate this and drive it to completion?
An alternative would be for someone to contact me off-list with the name or Email address of someone (or division) who handles issues like this at Comcast. I'd love for Abovenet to get involved, but I have no contractual obligation to them. (If there is an Abovenet individual who is willing to investigate this "pro bono" per se, that would be awesome, but I imagine such is often above one's pay grade).
--
| Jeremy Chadwick                                  jdc at parodius.com |
| Parodius Networking                         http://www.parodius.com/ |
| UNIX Systems Administrator                     Mountain View, CA, US |
| Making life hard for others since 1977.                 PGP 4BD6C0CB |
_______________________________________________ Outages mailing list Outages@outages.org https://puck.nether.net/mailman/listinfo/outages

I guess there's no need for anyone to do this. I completely forgot that Abovenet has a looking glass.
They absolutely see a route announcement for 72.20.96.0/19 from AS7151, including from mpr4.sjc7.us.above.net (keep reading):
Per http://lg.above.net/lg.cgi --
Router: mpr4.sjc7.us.above.net
Command: show route protocol bgp table inet.0 72.20.96.0/19 terse exact
inet.0: 404034 destinations, 2133715 routes (403943 active, 108 holddown, 1638 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both
A Destination        P Prf   Metric 1   Metric 2  Next hop         AS path
* 72.20.96.0/19      B 170        200          0 >64.125.27.94     7151 I
                                                  64.125.27.85
Peering point confirmation (egress traceroute run from 72.20.98.124 destined to 67.180.84.87):
traceroute to 67.180.84.87 (67.180.84.87), 64 hops max, 52 byte packets
 1  72.20.98.65 (72.20.98.65)  0.354 ms  0.232 ms  0.362 ms
 2  er1sc2.bayarea.net (69.163.64.44)  0.363 ms  0.258 ms  0.243 ms
 3  er2sc2.bayarea.net (69.163.65.49)  0.489 ms  0.438 ms  *
 4  xe-7-1-0.er1.sjc2.above.net (64.124.65.93)  0.527 ms  0.476 ms  0.488 ms
 5  xe-4-0-0.cr1.sjc2.us.above.net (64.125.28.54)  1.650 ms  0.711 ms  1.087 ms
 6  xe-0-0-0.cr2.sjc2.us.above.net (64.125.30.126)  0.879 ms  0.876 ms  0.735 ms
 7  xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  1.156 ms  1.104 ms  1.121 ms
 8  be-10-403-pe01.11greatoaks.ca.ibone.comcast.net (75.149.228.133)  7.601 ms  11.717 ms  11.968 ms
 9  pos-2-1-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.85.65)  6.686 ms  3.310 ms  3.851 ms
10  pos-0-14-0-0-ar01.sfsutro.ca.sfba.comcast.net (68.86.90.158)  8.716 ms  7.414 ms  8.343 ms
11  te-9-8-ur03.santaclara.ca.sfba.comcast.net (68.86.143.93)  5.722 ms  5.975 ms  5.698 ms
12  68.85.191.250 (68.85.191.250)  10.868 ms  13.714 ms  7.968 ms
13  c-67-180-84-87.hsd1.ca.comcast.net (67.180.84.87)  16.969 ms  48.121 ms  15.588 ms
So what Comcast's "backbone team" told me appears to be incorrect (we're all human), or there are route filters being applied, or they don't get a full routing table from Abovenet -- unknown which. I'm still talking to them about that, but probably won't get an answer until later tomorrow.
I still have a ticket open with my co-lo provider to investigate the Level 3 link they have. That link is just as likely to be the saturation point as the Abovenet/Comcast link is.
Abovenet's LG also offers ping capability, so I should be able to use that as a way to narrow down/confirm if the problem is there or with the Level 3<->BAIS link. Will find out tomorrow...
--
| Jeremy Chadwick                                  jdc at parodius.com |
| Parodius Networking                         http://www.parodius.com/ |
| UNIX Systems Administrator                     Mountain View, CA, US |
| Making life hard for others since 1977.                 PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 10:35:31PM -0700, Jeremy Chadwick wrote:

On Wed, Apr 11, 2012 at 3:12 AM, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
> I guess there's no need for anyone to do this. I completely forgot that Abovenet has a looking glass.
> They absolutely see a route announcement for 72.20.96.0/19 from AS7151, including from mpr4.sjc7.us.above.net (keep reading):
Their web looking glass isn't enough for this purpose. You need to see at least
- what communities are on the route
- what the outbound policy is on the session
- what the inbound policy is on the other side
All you have here is 'route in table'.
> I still have a ticket open with my co-lo provider to investigate the Level 3 link they have.
This is what you should have done in the first place; deal with the people you pay for service and push them to do their job.
CC

Jeremy,
That your collocation provider is
a) load-balancing by hashing traffic between two disparate transit ASes (with very different interprovider connectivity, performance characteristics, ...), on a per-IP basis
b) running its transit ports hot, by its own admission
... does not inspire confidence in its ability to deliver reliable service.
At this juncture, it seems like we've sufficiently beaten this horse and established that there are no known issues between the backbone providers you've called out in this thread, from the information provided to date.
The good news about the Bay Area is that it's a competitive marketplace, with no shortage of competent providers selling collocation and IP. Probably off-topic for discussion on this list, though I'd be happy to recommend some offline if you're coming up short.
Hope this helps,
-a
On Wed, Apr 11, 2012 at 3:12 AM, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
I guess there's no need for anyone to do this. I completely forgot that Abovenet has a looking glass.
They absolutely see a route announcement for 72.20.96.0/19 from AS7151, including from mpr4.sjc7.us.above.net (keep reading):
Per http://lg.above.net/lg.cgi --
Router: mpr4.sjc7.us.above.net Command: show route protocol bgp table inet.0 72.20.96.0/19 terse exact
inet.0: 404034 destinations, 2133715 routes (403943 active, 108 holddown, 1638 hidden) Restart Complete + = Active Route, - = Last Active, * = Both
A Destination P Prf Metric 1 Metric 2 Next hop AS path * 72.20.96.0/19 B 170 200 0 >64.125.27.94 7151 I 64.125.27.85
Peering point confirmation (egress traceroute run from 72.20.98.124 destined to 67.180.84.87):
traceroute to 67.180.84.87 (67.180.84.87), 64 hops max, 52 byte packets 1 72.20.98.65 (72.20.98.65) 0.354 ms 0.232 ms 0.362 ms 2 er1sc2.bayarea.net (69.163.64.44) 0.363 ms 0.258 ms 0.243 ms 3 er2sc2.bayarea.net (69.163.65.49) 0.489 ms 0.438 ms * 4 xe-7-1-0.er1.sjc2.above.net (64.124.65.93) 0.527 ms 0.476 ms 0.488 ms 5 xe-4-0-0.cr1.sjc2.us.above.net (64.125.28.54) 1.650 ms 0.711 ms 1.087 ms 6 xe-0-0-0.cr2.sjc2.us.above.net (64.125.30.126) 0.879 ms 0.876 ms 0.735 ms 7 xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178) 1.156 ms 1.104 ms 1.121 ms 8 be-10-403-pe01.11greatoaks.ca.ibone.comcast.net (75.149.228.133) 7.601 ms 11.717 ms 11.968 ms 9 pos-2-1-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.85.65) 6.686 ms 3.310 ms 3.851 ms 10 pos-0-14-0-0-ar01.sfsutro.ca.sfba.comcast.net (68.86.90.158) 8.716 ms 7.414 ms 8.343 ms 11 te-9-8-ur03.santaclara.ca.sfba.comcast.net (68.86.143.93) 5.722 ms 5.975 ms 5.698 ms 12 68.85.191.250 (68.85.191.250) 10.868 ms 13.714 ms 7.968 ms 13 c-67-180-84-87.hsd1.ca.comcast.net (67.180.84.87) 16.969 ms 48.121 ms 15.588 ms
So what Comcast's "backbone team" told me appears to be incorrect (we're all human), or there are route filters being applied, or they don't get a full routing table from Abovenet -- unknown which. I'm still talking to them about that, but probably won't get an answer until later tomorrow.
I still have a ticket open with my co-lo provider to investigate the Level 3 link they have. That's just as much of a possibility of an saturation point as the Abovenet/Comcast link is.
Abovenet's LG also offers ping capability, so I should be able to use that as a way to narrow down/confirm if the problem is there or with the Level 3<->BAIS link. Will find out tomorrow...
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 10:35:31PM -0700, Jeremy Chadwick wrote:
Following up to my own post (in bad habit):
Can some folks here who have peering with Abovenet (preferably with a full routing table) verify that you see an announcement for 72.20.96.0/19 (AS7151) coming via AS6461 (Abovenet)?
I've confirmed this is the case at my workplace, but I want extra eyes/verification.
Thanks.
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 09:33:04PM -0700, Jeremy Chadwick wrote:
I choose not to do DNS resolution in mtr because otherwise the terminal width required to see FQDNs has to be >76 characters, which often upsets mailing list folks. (This for example is one of the few lists on which I top-post)
I'm not so concerned with the packet loss -- for example in the 2nd mtr set I showed, the loss only seems to happen at routers, which is almost certainly the result of ICMP prioritisation.
But the latency is a definite problem and is easily noticeable across SSH, Remote Desktop, and any other TCP service (i.e. the latency shown is not a result of ICMP prioritisation). src/dst IPs on both sides are actual servers/boxes, not routers.
But you're absolutely right -- asymmetric routing is in place here, which means that everyone has to work together, and simultaneously, to really figure out where the problem is. I can only do so much when I have little to no visibility into things (e.g. if I had access to BAIS and Abovenet and Level 3 and Comcast routers I could figure out where the problem is... ;-) )
I'm currently engaged in a conversation with Comcast engineers about this issue. (Seems my DSLR post got proper attention)
So far the statement is that they've looked at the interface for the Abovenet/Comcast peering point in question, and although it's being used/busy, it's not oversaturated. They also pointed out that the only announcements they see for 72.20.96.0/19 are via Level 3 and Cogent, thus the issue is likely to be on my co-lo providers' side (e.g. the Level 3 <-> BAIS link). route-views also confirms the same thing, as does my place of work (who has peering with Abovenet natively).
I have a ticket open with my co-lo provider to investigate this ordeal.
If this does turn out to be a problem with their Level 3 link being saturated chronically, then I owe Comcast/Abovenet an apology (welcome to one of the complexities with asymmetric routing!), and I'm going to have to make some decisions with regards to co-location and so on, because the chronic nature of this problem is unacceptable for myself as well as my customers.
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples.
Feel free to put me in my place, but please do so on -discuss.
On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
Hi Ren,
The issue with my co-lo pertaining to route announcements has actually been "dealt with", meaning "this is just how it is". I'm wondering if I can go into details without violating contractual obligations, hmm. Yes, I imagine I can, because it becomes quite obvious if I provide traceroutes from both directions, and that's public knowledge.
It appears that my co-lo (BAIS) doesn't actually adjust route announcements on a per-IP basis, but they internally have a hashing algorithm in place where on a per-IP basis different addresses utilise different network paths. I still have an open ticket with their senior networking engineer about this, who has been somewhat "careful" in what he tells me, but so far I've basically gotten confirmation that this is indeed how they do their load-balancing for customers to balance out network traffic between all of their peering providers (Level 3, Abovenet, Cogent, and 2-3 others).
I can provide those examples (to/from different IPs) if you want to see them, but that is a separate matter. There still seems to be a problem between Abovenet/Comcast. Alternate links/paths through my co-lo (e.g. BAIS/Cogent) show no problems on the ingress or egress path -- the common path seems to be Abovenet/Comcast when there are problems.
This is what's presently happening right now:
Source IP: 67.180.84.87 Dest IP: 72.20.98.124
=== Tue Apr 10 19:09:00 PDT 2012 (1334110140) HOST: icarus.home.lan Loss% Snt Rcv Last Avg Best Wrst 1.|-- 192.168.1.1 0.0% 40 40 0.3 0.6 0.2 1.5 2.|-- 67.180.84.1 0.0% 40 40 24.5 22.7 10.4 54.0 3.|-- 68.85.191.253 0.0% 40 40 10.2 11.1 8.4 25.5 4.|-- 68.86.143.98 0.0% 40 40 15.6 16.4 11.1 34.7 5.|-- 68.86.91.5 0.0% 40 40 14.3 18.7 12.4 49.7 6.|-- 68.86.87.182 0.0% 40 40 17.1 19.4 14.4 51.6 7.|-- 4.71.118.45 0.0% 40 40 14.3 23.7 13.0 77.9 8.|-- 4.69.152.148 0.0% 40 40 67.5 27.1 13.3 128.0 9.|-- 4.53.16.18 5.0% 40 38 151.6 153.3 133.3 184.1 10.|-- 69.163.65.39 2.5% 40 39 176.3 155.8 135.3 198.1 11.|-- 72.20.98.124 5.0% 40 38 205.3 152.0 129.8 205.3 === END
Source IP: 72.20.98.124 Dest IP: 67.180.84.87
=== Tue Apr 10 19:09:00 PDT 2012 (1334110140) HOST: isis.parodius.com Loss% Snt Rcv Last Avg Best Wrst 1.|-- 72.20.98.65 0.0% 41 41 0.4 0.4 0.3 0.6 2.|-- 69.163.64.44 0.0% 40 40 0.4 0.4 0.3 0.5 3.|-- 69.163.65.49 0.0% 40 40 0.6 10.6 0.4 76.8 4.|-- 64.124.65.93 0.0% 40 40 65.6 3.6 0.4 65.6 5.|-- 64.125.28.54 0.0% 40 40 2.8 4.2 0.7 51.7 6.|-- 64.125.30.126 0.0% 40 40 0.8 1.4 0.7 16.7 7.|-- 64.125.30.178 0.0% 40 40 1.1 5.8 1.1 65.5 8.|-- 75.149.228.133 0.0% 40 40 148.7 136.9 117.4 150.2 9.|-- 68.86.85.65 5.0% 40 38 139.3 135.5 119.9 148.3 10.|-- 68.86.90.158 2.5% 40 39 141.6 138.1 120.0 149.8 11.|-- 68.86.143.93 2.5% 40 39 140.3 136.8 120.5 149.8 12.|-- 68.85.191.250 0.0% 40 40 150.8 144.8 128.7 159.5 13.|-- 67.180.84.87 0.0% 40 40 146.7 149.6 132.2 173.9 === END
$ host 64.125.30.178 178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
$ host 75.149.228.133 133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
So Ren, if you can investigate this, I would be appreciative of it.
-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 09:55:33PM -0400, Ren Provo wrote:
Hi Jeremy,
When the issue was raised a week or two ago there seemed to be a route announcement issue for 72.20.98.67. When your colo provider changed their policy did they update filters with their upstream?
Cheers, -ren, who will confirm there is no congestion with Abovenet on the port in SJC to Comcast.
On Tue, Apr 10, 2012 at 9:44 PM, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
> There's an issue I've been tracking for a few months now pertaining to a
> network link between Abovenet and Comcast which appears to become
> saturated (or impacted negatively in some way) at nearly the same time
> every night, and lasts for numerous hours, then ceases -- on a
> near-daily basis (no exaggeration).
>
> Latency and packet loss occur during this time, with latency hitting
> 150ms (sometimes higher), with packet loss ranging from 0.5% to 2.0%.
> I've been storing periodic traceroutes/mtrs for over a month showing
> this problem, and been tracking start/end times as well.
>
> Thankfully I own devices/have connectivity on both ends (src and dst,
> thus can provide mtrs/traceroutes from both directions.  Analysis so
> far, done by myself as well as senior network techs at my co-lo
> provider, confirms this issue is with a link between Abovenet/Comcast,
> likely within the San Jose Great Oaks POP (which I'm familiar with as
> part of my job).
>
> I opened up a ticket on DSLR/BBR's Comcast Direct forum (which only
> Comcast employees can respond to/view tickets for) over a month ago.
> Someone has been viewing it, but nobody has replied except me.
>
> I've since made the issue public, where (of course) the general Internet
> community does not quite understand how peering arrangements/contracts
> work (people think that any company who has a contract with Abovenet can
> report issues, but that is simply not the case; you must be a POC for
> the transport to report issues with it), nor do they understand how a
> co-lo provider changing route preferencing can impact the provider
> financially (based on billing metrics, etc.).  My co-lo provider is very
> strict with their routing policies, and it has to do with financial
> reasons that are their own business, not mine.
>
> The public thread is here, which also includes start/end times,
> traceroutes (both directions), and so on.  I update it every day when
> the issue happens, and ~90% of the time edit my posts when the issue
> ends.
>
> http://www.dslreports.com/forum/r27085601-SF-South-Bay-Chronic-IP-network-la...
>
> Anyway, all the technical details aside:
>
> Is there anyone on this list who works for Comcast who can contact me
> off-list who is willing to investigate this and drive it to completion?
>
> An alternative would be for someone to contact me off-list with the name
> or Email address of someone (or division) who handles issues like this
> at Comcast.  I'd love for Abovenet to get involved, but I have no
> contractual obligation to them.  (If there is an Abovenet individual who
> is willing to investigate this "pro bono" per se, that would be
> awesome, but I imagine such is often above one's pay grade).
>
> --
> | Jeremy Chadwick                               jdc at parodius.com |
> | Parodius Networking                     http://www.parodius.com/ |
> | UNIX Systems Administrator                 Mountain View, CA, US |
> | Making life hard for others since 1977.             PGP 4BD6C0CB |
>
> _______________________________________________
> Outages mailing list
> Outages@outages.org
> https://puck.nether.net/mailman/listinfo/outages
Outages mailing list Outages@outages.org https://puck.nether.net/mailman/listinfo/outages

On Wed, Apr 11, 2012 at 01:38:49PM -0400, Adam Rothschild wrote:
That your collocation provider is a) load-balancing by hashing traffic between two disparate transit ASes (with very different interprovider connectivity, performance characteristics, ...), on a per-IP basis b) running its transit ports hot, by its own admission... does not inspire confidence in its ability to deliver reliable service.
At this juncture, it seems like we've sufficiently beaten this horse and established that there are no known issues between the backbone providers you've called out in this thread, from the information provided to date.
The good news about the Bay Area is that it's a competitive marketplace, with no shortage of competent providers selling collocation and IP. Probably off-topic for discussion on this list, though I'd be happy to recommend some offline if you're coming up short.
Adam,

Sorry, but it doesn't help. The recurring latency I've reported is quite real. As I said, I have months of mtrs/traceroutes from both directions showing this problem. Asymmetric routing does not/cannot explain the chronic high latency seen at roughly the same times nearly every day. I wish it was ICMP prioritisation (I really do).

I have done as much work as I can on documenting the recurring nature of the problem, where it's seen (either between Level 3 and AS7151, or between Abovenet and Comcast), when it starts, and when it ends. As I am the customer on *both ends* (src and dst), the fact that I'm getting nowhere is preposterous. Given that I do not have access to Abovenet, Comcast, Level 3, or my co-lo provider's routers, I'm forced to rely on the competency of others.

Nobody has "beaten this horse" -- the horse is still there, blocking the road, its corpse rotting and festering, affecting network traffic. Who hauls it into the road every day from roughly 17:00 to 21:00 PDT is unknown. Instead, all that's happened is folks focusing on the asymmetric aspect of routing, and on my co-lo provider's choice to siphon certain IPs through certain routes.

As for the Bay Area having "no shortage of competent providers selling co-location": let me know when that happens. All I've seen so far is complete and total incompetence on the parts of co-lo providers (not only our current but our previous as well), transit and peering providers, and many other divisions. Honest: do not get me started on this. Please do not.

--
| Jeremy Chadwick                               jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
On Wed, Apr 11, 2012 at 3:12 AM, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
I guess there's no need for anyone to do this.  I completely forgot that Abovenet has a looking glass.
They absolutely see a route announcement for 72.20.96.0/19 from AS7151, including from mpr4.sjc7.us.above.net (keep reading):
Per http://lg.above.net/lg.cgi --
Router: mpr4.sjc7.us.above.net
Command: show route protocol bgp table inet.0 72.20.96.0/19 terse exact

inet.0: 404034 destinations, 2133715 routes (403943 active, 108 holddown, 1638 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

A Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
* 72.20.96.0/19      B 170        200          0 >64.125.27.94    7151 I
                                                  64.125.27.85
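As a quick sanity check on the prefix in the looking-glass output, the sketch below uses Python's standard `ipaddress` module to confirm which of the addresses mentioned in this thread fall inside 72.20.96.0/19. It only tests prefix membership; it says nothing about which peers actually receive or accept the announcement.

```python
import ipaddress

# Prefix announced by AS7151, as shown in the looking-glass output above.
ANNOUNCED = ipaddress.ip_network("72.20.96.0/19")

# The co-lo server, the address Ren asked about, and the Comcast-side host.
for host in ("72.20.98.124", "72.20.98.67", "67.180.84.87"):
    inside = ipaddress.ip_address(host) in ANNOUNCED
    print(f"{host:15} covered by {ANNOUNCED}: {inside}")
```

Both 72.20.98.x addresses are covered by the /19; the Comcast-side address is not, as expected.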
Peering point confirmation (egress traceroute run from 72.20.98.124 destined to 67.180.84.87):
traceroute to 67.180.84.87 (67.180.84.87), 64 hops max, 52 byte packets
 1  72.20.98.65 (72.20.98.65)  0.354 ms  0.232 ms  0.362 ms
 2  er1sc2.bayarea.net (69.163.64.44)  0.363 ms  0.258 ms  0.243 ms
 3  er2sc2.bayarea.net (69.163.65.49)  0.489 ms  0.438 ms *
 4  xe-7-1-0.er1.sjc2.above.net (64.124.65.93)  0.527 ms  0.476 ms  0.488 ms
 5  xe-4-0-0.cr1.sjc2.us.above.net (64.125.28.54)  1.650 ms  0.711 ms  1.087 ms
 6  xe-0-0-0.cr2.sjc2.us.above.net (64.125.30.126)  0.879 ms  0.876 ms  0.735 ms
 7  xe-0-1-0.mpr4.sjc7.us.above.net (64.125.30.178)  1.156 ms  1.104 ms  1.121 ms
 8  be-10-403-pe01.11greatoaks.ca.ibone.comcast.net (75.149.228.133)  7.601 ms  11.717 ms  11.968 ms
 9  pos-2-1-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.85.65)  6.686 ms  3.310 ms  3.851 ms
10  pos-0-14-0-0-ar01.sfsutro.ca.sfba.comcast.net (68.86.90.158)  8.716 ms  7.414 ms  8.343 ms
11  te-9-8-ur03.santaclara.ca.sfba.comcast.net (68.86.143.93)  5.722 ms  5.975 ms  5.698 ms
12  68.85.191.250 (68.85.191.250)  10.868 ms  13.714 ms  7.968 ms
13  c-67-180-84-87.hsd1.ca.comcast.net (67.180.84.87)  16.969 ms  48.121 ms  15.588 ms
So what Comcast's "backbone team" told me appears to be incorrect (we're all human), or there are route filters being applied, or they don't get a full routing table from Abovenet -- unknown which.  I'm still talking to them about that, but probably won't get an answer until later tomorrow.
I still have a ticket open with my co-lo provider to investigate the Level 3 link they have.  That link is just as likely to be a saturation point as the Abovenet/Comcast link is.
Abovenet's LG also offers ping capability, so I should be able to use that as a way to narrow down/confirm if the problem is there or with the Level 3<->BAIS link.  Will find out tomorrow...
--
| Jeremy Chadwick                               jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 10:35:31PM -0700, Jeremy Chadwick wrote:
Following up to my own post (in bad habit):
Can some folks here who have peering with Abovenet (preferably with a full routing table) verify that you see an announcement for 72.20.96.0/19 (AS7151) coming via AS6461 (Abovenet)?
I've confirmed this is the case at my workplace, but I want extra eyes/verification.
Thanks.
--
| Jeremy Chadwick                               jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 09:33:04PM -0700, Jeremy Chadwick wrote:
I choose not to do DNS resolution in mtr because otherwise the terminal width required to see FQDNs has to be >76 characters, which often upsets mailing list folks.  (This for example is one of the few lists on which I top-post)
I'm not so concerned with the packet loss -- for example in the 2nd mtr set I showed, the loss only seems to happen at routers, which is almost certainly the result of ICMP prioritisation.
But the latency is a definite problem and is easily noticeable across SSH, Remote Desktop, and any other TCP service (i.e. the latency shown is not a result of ICMP prioritisation).  src/dst IPs on both sides are actual servers/boxes, not routers.
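The reasoning about loss can be sketched as a simple rule: if intermediate hops report loss but the final hop reports none, the loss is most likely routers deprioritising their own ICMP replies rather than dropped transit traffic. The function name `transit_only_loss` is made up for this illustration, and the rule is a heuristic, not proof.

```python
# Loss% per hop, copied from the second mtr set (isis.parodius.com -> 67.180.84.87).
LOSS_BY_HOP = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 2.5, 2.5, 0.0, 0.0]


def transit_only_loss(loss_by_hop):
    """Return 1-based hop numbers that show loss although the final hop shows none."""
    if not loss_by_hop or loss_by_hop[-1] > 0.0:
        return []  # loss reaches the destination, so it cannot be dismissed this way
    return [n for n, loss in enumerate(loss_by_hop[:-1], start=1) if loss > 0.0]


print(transit_only_loss(LOSS_BY_HOP))  # [9, 10, 11]
```

The first mtr set (icarus.home.lan to 72.20.98.124) ends with 5.0% loss at the destination itself, so the same rule returns an empty list there and that loss cannot be written off as ICMP prioritisation.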
But you're absolutely right -- asymmetric routing is in place here, which means that everyone has to work together, and simultaneously, to really figure out where the problem is.  I can only do so much when I have little to no visibility into things (e.g. if I had access to BAIS and Abovenet and Level 3 and Comcast routers I could figure out where the problem is... ;-) )
I'm currently engaged in a conversation with Comcast engineers about this issue.  (Seems my DSLR post got proper attention)
So far the statement is that they've looked at the interface for the Abovenet/Comcast peering point in question, and although it's being used/busy, it's not oversaturated.  They also pointed out that the only announcements they see for 72.20.96.0/19 are via Level 3 and Cogent, thus the issue is likely to be on my co-lo provider's side (e.g. the Level 3 <-> BAIS link).  route-views also confirms the same thing, as does my place of work (who has peering with Abovenet natively).
I have a ticket open with my co-lo provider to investigate this ordeal.
If this does turn out to be a problem with their Level 3 link being saturated chronically, then I owe Comcast/Abovenet an apology (welcome to one of the complexities with asymmetric routing!), and I'm going to have to make some decisions with regards to co-location and so on, because the chronic nature of this problem is unacceptable for myself as well as my customers.
--
| Jeremy Chadwick                               jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
On Tue, Apr 10, 2012 at 08:50:00PM -0700, Kevin Blackham wrote:
A little PTR would help see more clearly, but I believe I see asymmetry here. Also interesting is the ultimate hop at 0% loss in one of those samples.
Feel free to put me in my place, but please do so on -discuss.
On Apr 10, 2012, at 19:12, Jeremy Chadwick <outages@jdc.parodius.com> wrote:
Hi Ren,
The issue with my co-lo pertaining to route announcements has actually been "dealt with", meaning "this is just how it is".  I'm wondering if I can go into details without violating contractual obligations, hmm. Yes, I imagine I can, because it becomes quite obvious if I provide traceroutes from both directions, and that's public knowledge.
It appears that my co-lo (BAIS) doesn't actually adjust route announcements on a per-IP basis, but they internally have a hashing algorithm in place where on a per-IP basis different addresses utilise different network paths.  I still have an open ticket with their senior networking engineer about this, who has been somewhat "careful" in what he tells me, but so far I've basically gotten confirmation that this is indeed how they do their load-balancing for customers to balance out network traffic between all of their peering providers (Level 3, Abovenet, Cogent, and 2-3 others).
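To make the per-IP hashing idea concrete, here is a purely illustrative sketch. BAIS has not disclosed its algorithm, so the CRC32 hash, the modulo step, and the function name `pick_upstream` are all assumptions; the only point is that a deterministic hash of the address pins each IP to one upstream, which is why two neighbouring addresses can take entirely different paths.

```python
import ipaddress
import zlib

# Transit providers named in the thread; BAIS reportedly has "2-3 others" as well.
UPSTREAMS = ["Level 3", "Abovenet", "Cogent"]


def pick_upstream(ip, upstreams=UPSTREAMS):
    """Map an IP address to one upstream, always the same one for the same address."""
    key = ipaddress.ip_address(ip).packed  # 4 bytes for IPv4
    # Assumed hash: CRC32 of the packed address, reduced modulo the provider count.
    return upstreams[zlib.crc32(key) % len(upstreams)]


for ip in ("72.20.98.124", "72.20.98.67"):
    print(ip, "->", pick_upstream(ip))
```

Under any scheme of this kind, a problem on one upstream affects only the addresses that hash to it, which matches the observation that some source/destination pairs see the latency and others do not.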
I can provide those examples (to/from different IPs) if you want to see them, but that is a separate matter.  There still seems to be a problem between Abovenet/Comcast.  Alternate links/paths through my co-lo (e.g. BAIS/Cogent) show no problems on the ingress or egress path -- the common path seems to be Abovenet/Comcast when there are problems.
This is what's presently happening right now:
Source IP: 67.180.84.87 Dest IP: 72.20.98.124
=== Tue Apr 10 19:09:00 PDT 2012 (1334110140)
HOST: icarus.home.lan            Loss%   Snt   Rcv  Last   Avg  Best  Wrst
 1.|-- 192.168.1.1                0.0%    40    40   0.3   0.6   0.2   1.5
 2.|-- 67.180.84.1                0.0%    40    40  24.5  22.7  10.4  54.0
 3.|-- 68.85.191.253              0.0%    40    40  10.2  11.1   8.4  25.5
 4.|-- 68.86.143.98               0.0%    40    40  15.6  16.4  11.1  34.7
 5.|-- 68.86.91.5                 0.0%    40    40  14.3  18.7  12.4  49.7
 6.|-- 68.86.87.182               0.0%    40    40  17.1  19.4  14.4  51.6
 7.|-- 4.71.118.45                0.0%    40    40  14.3  23.7  13.0  77.9
 8.|-- 4.69.152.148               0.0%    40    40  67.5  27.1  13.3 128.0
 9.|-- 4.53.16.18                 5.0%    40    38 151.6 153.3 133.3 184.1
10.|-- 69.163.65.39               2.5%    40    39 176.3 155.8 135.3 198.1
11.|-- 72.20.98.124               5.0%    40    38 205.3 152.0 129.8 205.3
=== END
Source IP: 72.20.98.124 Dest IP: 67.180.84.87
=== Tue Apr 10 19:09:00 PDT 2012 (1334110140)
HOST: isis.parodius.com          Loss%   Snt   Rcv  Last   Avg  Best  Wrst
 1.|-- 72.20.98.65                0.0%    41    41   0.4   0.4   0.3   0.6
 2.|-- 69.163.64.44               0.0%    40    40   0.4   0.4   0.3   0.5
 3.|-- 69.163.65.49               0.0%    40    40   0.6  10.6   0.4  76.8
 4.|-- 64.124.65.93               0.0%    40    40  65.6   3.6   0.4  65.6
 5.|-- 64.125.28.54               0.0%    40    40   2.8   4.2   0.7  51.7
 6.|-- 64.125.30.126              0.0%    40    40   0.8   1.4   0.7  16.7
 7.|-- 64.125.30.178              0.0%    40    40   1.1   5.8   1.1  65.5
 8.|-- 75.149.228.133             0.0%    40    40 148.7 136.9 117.4 150.2
 9.|-- 68.86.85.65                5.0%    40    38 139.3 135.5 119.9 148.3
10.|-- 68.86.90.158               2.5%    40    39 141.6 138.1 120.0 149.8
11.|-- 68.86.143.93               2.5%    40    39 140.3 136.8 120.5 149.8
12.|-- 68.85.191.250              0.0%    40    40 150.8 144.8 128.7 159.5
13.|-- 67.180.84.87               0.0%    40    40 146.7 149.6 132.2 173.9
=== END
$ host 64.125.30.178
178.30.125.64.in-addr.arpa domain name pointer xe-0-1-0.mpr4.sjc7.us.above.net.
$ host 75.149.228.133
133.228.149.75.in-addr.arpa domain name pointer be-10-403-pe01.11greatoaks.ca.ibone.comcast.net.
So Ren, if you can investigate this, I would be appreciative of it.
--
| Jeremy Chadwick                               jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

> Sorry, but it doesn't help. The recurring latency I've reported is quite real.

You are correct. Comcast has lots of intentional capacity issues. This is not news.

http://lmgtfy.com/?q=comcast+backdoor+santa

--
Steve Rubin  ser@layer42.net
Layer42 Networks  http://www.layer42.net/
Expect More from A Web Solutions Provider  (408)450-5742
participants (6)
- Adam Rothschild
- Craig Cooter
- Jeremy Chadwick
- Kevin Blackham
- Ren Provo
- Steve Rubin