Long latency in receiving email on this list/Other observations about today's Google Incident

I am looking through the emails from the outages list about today's Google incident, and many of the posts carry original time stamps well before the time they arrived. I received my copies of mailing list posts over an hour after they were sent. For example, I posted a message today at 12:11 PM EDT, and I received my copy from the list at 1:20 PM. We were well into our troubleshooting with Verizon and Level 3 when I saw the first post on this list. Is there anything we can do to speed the delivery of the emails from this list? We would have had a better incident response had we known sooner that this was a global issue. And I was still receiving hour-old trouble reports long after the problem was resolved.

I also called two of our three upstream providers (Level 3 and Verizon). Neither had status messages on their web sites or for their dispatch folks. We had all of the relevant information (account #, circuit ID, etc.), but they insisted on going through a long ticket process (Have you verified power? What are your IPs? What IPs are you trying to reach? What are your contact hours? What type of equipment do you have? etc.) instead of just taking the call. At Verizon (50 Mbit circuit), I was on hold for about 15 minutes before I was even able to place a trouble ticket.

I also saw speculation on the list that it was a Level 3 problem, and in fact we could not reach Google with any reliability on our DS3 with Level 3 (on Level 3 IP space). But our Comcast line uses Level 3 for transport, and we had no issues reaching Google services from Comcast.

Bob Roswell
System Source
broswell@syssrc.com
(410) 771-5544 ext 4336

Bob Roswell wrote:
I am looking through the emails from the outages list about today’s Google incident, and many of the posts carry original time stamps well before the time they arrived. I received my copies of mailing list posts over an hour after they were sent. For example, I posted a message today at 12:11 PM EDT, and I received my copy from the list at 1:20 PM. We were well into our troubleshooting with Verizon and Level 3 when I saw the first post on this list. Is there anything we can do to speed the delivery of the emails from this list? We would have had a better incident response had we known sooner that this was a global issue. And I was still receiving hour-old trouble reports long after the problem was resolved.
I also called two of our three upstream providers (Level 3 and Verizon). Neither had status messages on their web sites or for their dispatch folks. We had all of the relevant information (account #, circuit ID, etc.), but they insisted on going through a long ticket process (Have you verified power? What are your IPs? What IPs are you trying to reach? What are your contact hours? What type of equipment do you have? etc.) instead of just taking the call. At Verizon (50 Mbit circuit), I was on hold for about 15 minutes before I was even able to place a trouble ticket.
I also saw speculation on the list that it was a Level 3 problem, and in fact we could not reach Google with any reliability on our DS3 with Level 3 (on Level 3 IP space). But our Comcast line uses Level 3 for transport, and we had no issues reaching Google services from Comcast.
I wouldn't fault any of them for going through the long process, since the problem was purely a Google issue. Why would their dispatch folks even care? Why would they put status messages up just because Google fails? It's not like this is the first time it has ever happened, and it will happen again. It's sad that we assume Google can never be at fault and that it must be our transit providers.

As far as deliveries, yeah, I notice some posts to the list show up hours later than others. I've never tried to gather any data about it, though.

~Seth
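Seth's remark that some posts arrive hours later than others can be checked against the headers of each delivered copy: every relaying server prepends a Received header that ends in "; date-time" (RFC 5321, section 4.4), so comparing those stamps with the Date header shows where a message sat. The following is a minimal sketch in Python, not anything the list itself runs; it assumes the relays' clocks are roughly correct, that Received headers appear newest-first as is usual, and that every date carries an explicit UTC offset. The function name `hop_delays` is hypothetical.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def hop_delays(raw_message: str):
    """Hypothetical helper: measure where a message spent its time in transit.

    Returns (total_seconds, per_hop_seconds). Assumes the Date header and every
    Received header carry a date with an explicit UTC offset, and that Received
    headers are in the usual newest-first order.
    """
    msg = message_from_string(raw_message)
    sent = parsedate_to_datetime(msg["Date"])  # when the author's client stamped it

    stamps = []
    # Walk Received headers oldest-first, i.e. in the order the hops happened.
    for header in reversed(msg.get_all("Received", [])):
        # RFC 5321 trace lines end with "; date-time"; keep the text after the last ";".
        _, _, date_part = header.rpartition(";")
        stamps.append(parsedate_to_datetime(date_part.strip()))

    points = [sent] + stamps
    # Seconds between consecutive stamps (clock skew can make a hop negative).
    per_hop = [(later - earlier).total_seconds()
               for earlier, later in zip(points, points[1:])]
    total = (points[-1] - sent).total_seconds()
    return total, per_hop
```

Applied to a copy like Bob's (posted at 12:11 PM, delivered at 1:20 PM), the total would come out near 4,140 seconds, and the per-hop list would show whether that time was spent at the list server or further downstream. Collecting these figures over many posts would give the kind of data Seth mentions never having gathered.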

On Fri, May 15, 2009 at 1:09 AM, Seth Mattinen <sethm@rollernet.us> wrote:
As far as deliveries, yeah, I notice some posts to the list show up hours later than others. Never tried to gather any data about it, though.
Forgive me if this seems like plugging; I am just passing along information. Yesterday, when it all happened, we covered it in real time at the Internet Storm Center at http://isc.sans.org. Since Gmail was down, email may not have been a good way to pass information; yesterday, for instance, I used a combination of the website and Twitter. This turned out to be extremely effective: we were able to isolate exactly where the outages were and who was working and who wasn't, and we were able to pass the reason for the outage to our readers about an hour before Google said anything. It was a wild roller coaster, as the tweets came in about 50 at a time, resulting in many hundreds (if not over a thousand) of tweets, and sources of information.

-- 
joel esler | Sourcefire | gtalk: jesler@sourcefire.com | 302-223-5974 | http://twitter.com/joelesler

Seth Mattinen wrote:
As far as deliveries, yeah, I notice some posts to the list show up hours later than others. Never tried to gather any data about it, though.
When I saw this topic come up, I chuckled to myself: "Y'know, it was not that long ago that we marveled that I could get an email from Sausalito to College Park and an answer back in less time than the same messages would take by US Mail." When it got down to a round trip in the same day...

-- 
Requiescas in pace o email
Ex turpi causa non oritur actio
Eppure si rinfresca

Two identifying characteristics of System Administrators:
Infallibility, and the ability to learn from their mistakes.

ICBM Targeting Information: http://tinyurl.com/4sqczs http://tinyurl.com/7tp8ml

On Fri, 15 May 2009 09:19:23 CDT, Larry Sheldon said:
When I saw this topic come up, I chuckled to myself: "Y'know, it was not that long ago that we marveled that I could get an email from Sausalito to College Park and an answer back in less time than the same messages would take by US Mail."
bangpaths FTW! ;)
Participants (5):
- Bob Roswell
- Joel Esler
- Larry Sheldon
- Seth Mattinen
- Valdis.Kletnieks@vt.edu