
Since approx. 4am CT, we've been seeing 15-30% dropped packets for traffic going from our Chicago Level3 connection to Amazon's EAST-1 data center. Our San Francisco office (also using Level3) isn't experiencing a problem. I've re-routed this traffic to our backup Internet connection (Airlogic) for now, and that traffic is getting through just fine. Anyone else seeing this problem? Ticket already opened with Level3, waiting to hear back. Thank you.

Here's the route my traffic is taking (time outs at the bottom are normal, ICMP gets blocked from that point on).

Tracing route to zioreports.com [75.101.163.221]
over a maximum of 30 hops:
1 1 ms 1 ms 1 ms ge-6-2-107.car4.Chicago1.Level3.net [4.71.102.161]
2 * * * Request timed out.
3 16 ms 22 ms 25 ms ae-5-5.ebr2.Chicago2.Level3.net [4.69.140.194]
4 19 ms 24 ms 25 ms ae-6-6.ebr2.Washington12.Level3.net [4.69.148.145]
5 * * * Request timed out.
6 16 ms 18 ms 23 ms ae-82-82.csw3.Washington1.Level3.net [4.69.134.154]
7 16 ms 16 ms 17 ms ae-3-80.edge2.Washington1.Level3.net [4.69.149.142]
8 17 ms 17 ms 16 ms AMAZON.COM.edge2.Washington1.Level3.net [4.79.22.74]
9 * * * Request timed out.
10 17 ms 18 ms 17 ms 72.21.222.139
11 18 ms 18 ms 18 ms 216.182.224.17
12 * * * Request timed out.
13 * * * Request timed out.
14 * * * Request timed out.
15 * * * Request timed out.
16 * * * Request timed out.
17 * * * Request timed out.
18 * * ^C

Here are the results of a TCPing to port 443:

2012:09:27 09:50:02 Probing 75.101.163.221:443/tcp - Port is open - time=20.926ms
2012:09:27 09:50:03 Probing 75.101.163.221:443/tcp - Port is open - time=31.168ms
2012:09:27 09:50:05 Probing 75.101.163.221:443/tcp - Socket is not connected (10057) - time=2012.341ms
2012:09:27 09:50:07 Probing 75.101.163.221:443/tcp - Port is open - time=31.358ms
2012:09:27 09:50:08 Probing 75.101.163.221:443/tcp - Socket is not connected (10057) - time=2012.448ms
2012:09:27 09:50:10 Probing 75.101.163.221:443/tcp - Port is open - time=30.985ms
2012:09:27 09:50:12 Probing 75.101.163.221:443/tcp - Port is open - time=31.224ms
2012:09:27 09:50:13 Probing 75.101.163.221:443/tcp - Port is open - time=32.065ms
2012:09:27 09:50:15 Probing 75.101.163.221:443/tcp - Socket is not connected (10057) - time=2012.375ms
2012:09:27 09:50:17 Probing 75.101.163.221:443/tcp - Port is open - time=31.115ms

Ping statistics for 75.101.163.221:443
10 probes sent.
7 successful, 3 failed.
Approximate trip times in milli-seconds (successful connections only): Minimum = 20.926ms, Maximum = 32.065ms, Average = 20.884ms

----
George Sarlas
Manager, IT Operations
iRhythm Technologies, Inc.
2 Marriott Dr.
Lincolnshire, IL 60069
email: gsarlas@irhythmtech.com
phone: 224-543-4253
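
As a cross-check on the TCPing output above, here is a minimal Python sketch that parses the ten probe lines and recomputes the loss rate and the mean time of the successful probes. The regular expression and the `summarize` helper are illustrative assumptions, not part of the tcping tool. Run against the lines above, it gives 30% loss, which sits at the top of the reported 15-30% range. It also gives a mean of roughly 29.8 ms over the seven successful probes; the tool's "Average = 20.884ms" appears to be the same sum divided by all ten probes rather than by the seven that succeeded.

```python
import re

# Probe lines copied verbatim from the TCPing output in the message above.
TCPING_OUTPUT = """\
2012:09:27 09:50:02 Probing 75.101.163.221:443/tcp - Port is open - time=20.926ms
2012:09:27 09:50:03 Probing 75.101.163.221:443/tcp - Port is open - time=31.168ms
2012:09:27 09:50:05 Probing 75.101.163.221:443/tcp - Socket is not connected (10057) - time=2012.341ms
2012:09:27 09:50:07 Probing 75.101.163.221:443/tcp - Port is open - time=31.358ms
2012:09:27 09:50:08 Probing 75.101.163.221:443/tcp - Socket is not connected (10057) - time=2012.448ms
2012:09:27 09:50:10 Probing 75.101.163.221:443/tcp - Port is open - time=30.985ms
2012:09:27 09:50:12 Probing 75.101.163.221:443/tcp - Port is open - time=31.224ms
2012:09:27 09:50:13 Probing 75.101.163.221:443/tcp - Port is open - time=32.065ms
2012:09:27 09:50:15 Probing 75.101.163.221:443/tcp - Socket is not connected (10057) - time=2012.375ms
2012:09:27 09:50:17 Probing 75.101.163.221:443/tcp - Port is open - time=31.115ms
"""

# Hypothetical pattern for one probe line: target, status text, elapsed time.
LINE_RE = re.compile(r"Probing \S+ - (?P<status>.+?) - time=(?P<ms>[\d.]+)ms")


def summarize(output):
    """Count probes and average the successful connect times."""
    ok_times = []
    failed = 0
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip anything that is not a probe line
        if match.group("status") == "Port is open":
            ok_times.append(float(match.group("ms")))
        else:
            failed += 1
    sent = len(ok_times) + failed
    return {
        "sent": sent,
        "failed": failed,
        "loss_pct": 100.0 * failed / sent if sent else 0.0,
        "mean_ok_ms": sum(ok_times) / len(ok_times) if ok_times else None,
    }


summary = summarize(TCPING_OUTPUT)
print(summary)
```

Dividing by the number of successful probes only is a deliberate choice here: the failed probes report the connect timeout, not a round-trip time.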

George,

For the future: you need to provide traceroutes from both directions. Most routing these days is asymmetric. Heavily covered here:

http://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N4...

But now that you've "re-routed traffic" (I assume you simply denounced and de-peered with your AS3356 peering point? Or did you physically down an interface?), the reverse-path traceroute (from 75.101.163.221 to wherever you did the original TCP ping from) won't necessarily be the same as when you were being impacted.

TL;DR -- always provide traceroutes from both directions, and always perform this *before* making any routing/peering changes.

--
| Jeremy Chadwick                                jdc@koitsu.org |
| UNIX Systems Administrator             http://jdc.koitsu.org/ |
| Mountain View, CA, US                                         |
| Making life hard for others since 1977.          PGP 4BD6C0CB |

On Thu, Sep 27, 2012 at 06:50:34PM +0000, Sarlas, George wrote:
Since approx. 4am CT, we've been seeing 15-30% dropped packets for traffic going from our Chicago Level3 connection to Amazon's EAST-1 data center. Our San Francisco office (also using Level3) isn't experiencing a problem. I've re-routed this traffic to our backup Internet connection (Airlogic) for now, and that traffic is getting through just fine. Anyone else seeing this problem? Ticket already opened with Level3, waiting to hear back. Thank you.
_______________________________________________ Outages mailing list Outages@outages.org https://puck.nether.net/mailman/listinfo/outages

Jeremy,

Thank you for the traceroute link. It was very informative. To answer your re-route question, I manually entered some routes on my firewall to redirect traffic to my backup ISP's line.

That said, our service seems to be back to normal. While I haven't gotten an update on the trouble ticket I have opened with my ISP, another person on this mailing list had reported that his (possibly related) issue had been resolved (a Level3 network issue in Washington). So I tested again, and now I'm up and running.

Thanks to everyone for their input.

-george

-----Original Message-----
From: Jeremy Chadwick [mailto:jdc@koitsu.org]
Sent: Thursday, September 27, 2012 2:06 PM
To: Sarlas, George
Cc: outages@outages.org
Subject: Re: [outages] Chicago-area Level3 to Amazon AWS EAST-1

George,

For the future: you need to provide traceroutes from both directions. Most routing these days is asymmetric. Heavily covered here:

http://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N4...

But now that you've "re-routed traffic" (I assume you simply denounced and de-peered with your AS3356 peering point? Or did you physically down an interface?), the reverse-path traceroute (from 75.101.163.221 to wherever you did the original TCP ping from) won't necessarily be the same as when you were being impacted.

TL;DR -- always provide traceroutes from both directions, and always perform this *before* making any routing/peering changes.

--
| Jeremy Chadwick                                jdc@koitsu.org |
| UNIX Systems Administrator             http://jdc.koitsu.org/ |
| Mountain View, CA, US                                         |
| Making life hard for others since 1977.          PGP 4BD6C0CB |

On Thu, Sep 27, 2012 at 06:50:34PM +0000, Sarlas, George wrote:
Since approx. 4am CT, we've been seeing 15-30% dropped packets for traffic going from our Chicago Level3 connection to Amazon's EAST-1 data center. Our San Francisco office (also using Level3) isn't experiencing a problem. I've re-routed this traffic to our backup Internet connection (Airlogic) for now, and that traffic is getting through just fine. Anyone else seeing this problem? Ticket already opened with Level3, waiting to hear back. Thank you.

On Thu, Sep 27, 2012 at 06:50:34PM +0000, Sarlas, George wrote:
Since approx. 4am CT, we've been seeing 15-30% dropped packets for traffic going from our Chicago Level3 connection to Amazon's EAST-1 data center. Our San Francisco office (also using Level3) isn't experiencing a problem. I've re-routed this traffic to our backup Internet connection (Airlogic) for now, and that traffic is getting through just fine. Anyone else seeing this problem? Ticket already opened with Level3, waiting to hear back. Thank you.
We've got a ticket open with Level3 for what appears to be the same issue. Using test cases, we've narrowed it down to a problem in Level3's network in Washington. We suspect an individual link in a bundle is blackholing traffic. Level3 has escalated our ticket to the team that handles those things, but we haven't heard anything back yet.

--
Brandon Ewing (nicotine@warningg.com)
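
To illustrate why one bad link in a bundle would show up as partial loss rather than a full outage, here is a minimal Python sketch, under loudly stated assumptions. It assumes traffic is spread across bundle members by hashing each flow's addresses and ports, so every new TCP connection (with a new source port) may land on a different member. The hash (`sha256` over a formatted string), the member count of 4, the index of the bad member, and the source address 192.0.2.10 (a documentation address) are all hypothetical; real routers use vendor-specific hash functions and inputs, and nothing in this thread states how large the affected bundle is.

```python
import hashlib


def pick_link(src_ip, src_port, dst_ip, dst_port, n_links):
    """Map a flow to a bundle member by hashing its addresses and ports.

    Hypothetical hash for illustration only; not any vendor's algorithm.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def simulate(n_links=4, bad_link=2, n_flows=1000):
    """Return the fraction of flows that hash onto the blackholing member."""
    failed = 0
    for i in range(n_flows):
        src_port = 49152 + i  # each new connection uses a different source port
        link = pick_link("192.0.2.10", src_port, "75.101.163.221", 443, n_links)
        if link == bad_link:
            failed += 1
    return failed / n_flows


loss = simulate()
print(f"{loss:.1%} of simulated flows hit the bad member")
```

With a reasonably uniform hash, the expected failure rate is about one in `n_links`, so an assumed 4-member bundle with one dead member lands in the same range as the 15-30% loss reported at the top of the thread. It would also explain why repeated probes to the same address and port alternate between success and timeout.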

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi,

On 09/27/2012 11:50 AM, Sarlas, George wrote:
Since approx. 4am CT, we've been seeing 15-30% dropped packets for traffic going from our Chicago Level3 connection to Amazon's EAST-1 data center. Our San Francisco office (also using Level3) isn't experiencing a problem. I've re-routed this traffic to our backup Internet connection (Airlogic) for now, and that traffic is getting through just fine. Anyone else seeing this problem? Ticket already opened with Level3, waiting to hear back. Thank you.
- -----------------------

Dunno if this helps or muddies the water.

I'm seeing an average of 122ms ("http response time") across all 14 networks (comcast, att, google, verizon, cox, qwest, charter, opendns, cablevision, bellsouth, twc, level3, etc.) for Amazon EC2 US EAST 1A. Narrowing down to LEVEL3, I'm seeing an average of 180ms for http response time. However, http connect time seems to be around 400ms, compared to att, cox, verizon and qwest, which are at about 300ms.

This doesn't prove anything, but I thought I'd bring it up based on what I see from an external monitor. Sorry, no raw data.

regards,
/virendra

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iF4EAREIAAYFAlBkssIACgkQ3HuimOHfh+GrtgD+J3YR21qAomCZic63lZt8JtXV
sClFfgffT4/bvXBm3B0A+gP9fSweidrsz+5aMLLYx8yAXoY7AZlQZ1K2N2jx58tD
=T3LA
-----END PGP SIGNATURE-----