Linode outage in Newark, NJ starting at 3:30 pm

FYI, http://status.linode.com/
We started getting alerts at roughly 3:30.
- Kyle
The information in this message is for the intended recipient(s) only and may be the proprietary and/or confidential property of Litle & Co., LLC, and thus protected from disclosure. If you are not the intended recipient(s), or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any use, dissemination, distribution or copying of this communication is prohibited. If you have received this communication in error, please notify Litle & Co. immediately by replying to this message and then promptly deleting it and your reply permanently from your computer.

Likely related: WPEngine reports network connectivity issues in their Newark, NJ datacenter. We noticed at roughly 12:30p PDT. They have no ETA.
--j
On Aug 31, 2012, at 12:46 PM, Smith, Kyle wrote:
FYI,
We started getting alerts at roughly ~3:30
- Kyle
_______________________________________________ Outages mailing list Outages@outages.org https://puck.nether.net/mailman/listinfo/outages

My Linode systems are coming back up now.
--
Paul Norton
Systems Administrator
Neoverve - http://www.neoverve.com
On Aug 31, 2012, at 12:55 PM, Jim Meyer wrote:
Likely related: WPEngine reports network connectivity issues in their Newark, NJ datacenter.
We noticed at roughly 12:30p PDT. They have no ETA.

3:52pm (EDT): NAC has informed us of a power issue affecting at least some portion of the datacenter. As soon as power is restored we are poised to execute our recovery procedures to all affected systems.
On Fri, Aug 31, 2012 at 3:46 PM, Smith, Kyle <ksmith@litle.com> wrote:
FYI,
We started getting alerts at roughly ~3:30
- Kyle

This is the incident report from NAC:
Subsequent to the utility power failure at our Cedar Knolls facility the generator that powers systems A and D did not start automatically. Manual intervention was required to get generator power running. Unfortunately before the generator was manually engaged customers on these systems experienced a complete power loss. We are still investigating why the generator did not engage automatically. Utility power has been restored, and transfer back to utility was successful.
On Fri, Aug 31, 2012 at 3:59 PM, Jack Carrozzo <jack@crepinc.com> wrote:
3:52pm (EDT): NAC has informed us of a power issue affecting at least some portion of the datacenter. As soon as power is restored we are poised to execute our recovery procedures to all affected systems.
-- Sadiq S O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Is it common practice to have an array of batteries in between the colo and the generators to give you time to start the generators?
----- Original Message -----
From: "Sadiq Saif" <sadiq@asininetech.com>
To: "Jack Carrozzo" <jack@crepinc.com>
Cc: outages@outages.org
Sent: Friday, August 31, 2012 2:59:16 PM
Subject: Re: [outages] Linode outage in Newark, NJ starting at 3:30 pm
This is the incident report from NAC:
Subsequent to the utility power failure at our Cedar Knolls facility the generator that powers systems A and D did not start automatically. Manual intervention was required to get generator power running. Unfortunately before the generator was manually engaged customers on these systems experienced a complete power loss. We are still investigating why the generator did not engage automatically. Utility power has been restored, and transfer back to utility was successful.
On Fri, Aug 31, 2012 at 3:59 PM, Jack Carrozzo <jack@crepinc.com> wrote:
3:52pm (EDT): NAC has informed us of a power issue affecting at least some portion of the datacenter. As soon as power is restored we are poised to execute our recovery procedures to all affected systems.

It is common to have not only strings of batteries but an ATS, or automatic transfer switch. In this case, it sounds like the fault was on the ATS side of the plant based on the IR below.
- LB
-----Original Message-----
From: outages-bounces@outages.org [mailto:outages-bounces@outages.org] On Behalf Of reza a
Sent: Friday, August 31, 2012 3:29 PM
To: outages@outages.org
Subject: Re: [outages] Linode outage in Newark, NJ starting at 3:30 pm
Is it common practice to have an array of batteries in between the colo and the generators to give you time to start the generators?
----- Original Message -----
From: "Sadiq Saif" <sadiq@asininetech.com>
To: "Jack Carrozzo" <jack@crepinc.com>
Cc: outages@outages.org
Sent: Friday, August 31, 2012 2:59:16 PM
Subject: Re: [outages] Linode outage in Newark, NJ starting at 3:30 pm
This is the incident report from NAC:
Subsequent to the utility power failure at our Cedar Knolls facility the generator that powers systems A and D did not start automatically. Manual intervention was required to get generator power running. Unfortunately before the generator was manually engaged customers on these systems experienced a complete power loss. We are still investigating why the generator did not engage automatically. Utility power has been restored, and transfer back to utility was successful.
On Fri, Aug 31, 2012 at 3:59 PM, Jack Carrozzo <jack@crepinc.com> wrote:
3:52pm (EDT): NAC has informed us of a power issue affecting at least some portion of the datacenter. As soon as power is restored we are poised to execute our recovery procedures to all affected systems.

It sounds to me like they possibly switched over to battery successfully, but battery power did not last while they got the generator functioning. Most large datacenters I have experience with have 30-45 minutes of battery. When you have generator issues, it can sometimes take half an hour or more to get something functioning.
We had an issue in our small datacenter about 2 years ago where the ATS didn't properly detect the power failure (long story, bad electrician). As it occurred at about 2am (we are not 24/7 in building), it took me nearly 30 minutes to get to the office thanks to snow. I barely beat the battery, as everything was under 10% remaining. I have had customers in the same situation where their generator didn't start properly for one reason or another. Even with staff in house, it has still taken 20-25 minutes to get generators started.
To me this stresses the importance of planned generator testing. Testing the generator, battery systems and ATS systems is critical. This is why, about once a month, I flip the mains to the datacenter (planned, of course) to make sure everything is still fully functional. A planned 30 second test alerts us to any possible physical issues, and by planning these we find out about possible issues sooner rather than later. If an ATS fails during a test, we ride on battery for a few minutes and I go back to utility with no harm done.
Blake
Cisco, Microsoft, Adtran and VMware Certification Information available upon request.
-----Original Message-----
From: outages-bounces@outages.org [mailto:outages-bounces@outages.org] On Behalf Of Lonnie Bozeman
Sent: Friday, August 31, 2012 4:53 PM
To: reza a; outages@outages.org
Subject: Re: [outages] Linode outage in Newark, NJ starting at 3:30 pm
It is common to have not only strings of batteries but an ATS, or automatic transfer switch. In this case, it sounds like the fault was on the ATS side of the plant based on the IR below.
- LB

On 8/31/12 3:28 PM, reza a wrote:
Is it common practice to have an array of batteries in between the colo and the generators to give you time to start the generators?
Yes, of course, but usually with big facilities that runtime is in minutes, not hours. Or seconds for the flywheel types. And when big breakers or big engines decide to act up they do so quite stubbornly. The larger the system, the longer it takes to fix a fault, even if you know exactly what's wrong. In any case, even if there are hours and hours of battery runtime, what does it matter if the air handlers are off?
~Seth
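
The battery and generator figures quoted in this thread can be put side by side with a little arithmetic. The Python sketch below is illustrative only: the function names are made up for this example, and the second function assumes the battery discharges linearly at constant load, which real battery strings do not do exactly. It also ignores cooling, which matters as soon as the air handlers are off.

```python
def battery_margin_minutes(runtime_min, generator_start_min):
    """Minutes of battery left when the generator takes the load.

    A negative result means the load was dropped before the generator started.
    """
    return runtime_min - generator_start_min


def implied_runtime_minutes(elapsed_min, remaining_fraction):
    """Total runtime implied by the charge left after elapsed_min.

    ASSUMPTION: linear discharge at constant load (a simplification).
    """
    if not 0 <= remaining_fraction < 1:
        raise ValueError("remaining_fraction must be in [0, 1)")
    return elapsed_min / (1.0 - remaining_fraction)


# Figures quoted in the thread: 30-45 minutes of battery,
# 20-25 minutes to get a stubborn generator started by hand.
for runtime in (30, 45):
    for start in (20, 25):
        margin = battery_margin_minutes(runtime, start)
        print(f"runtime {runtime} min, manual start {start} min -> margin {margin} min")

# The 2am anecdote: about 30 minutes elapsed with roughly 10% remaining.
print(round(implied_runtime_minutes(30, 0.10), 1))
```

Under the linear assumption, 30 minutes elapsed with 10% remaining implies a total runtime of about 33 minutes, so a 30 minute battery leaves only about 5 minutes of margin against a 25 minute manual generator start.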

Didn't something similar with NAC happen back in April?
http://en.wikipedia.org/wiki/Broadband_Reports#Current_Event
I'm noticing a common trend here.
--
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
On Fri, Aug 31, 2012 at 03:59:05PM -0400, Jack Carrozzo wrote:
3:52pm (EDT): NAC has informed us of a power issue affecting at least some portion of the datacenter. As soon as power is restored we are poised to execute our recovery procedures to all affected systems.

Got this alert from BGPmon:
====================================================================
Withdraw of Prefix (Code: 97)
====================================================================
Your prefix:          2600:3c03::/32:
Prefix Description:   Linode
Update time:          2012-08-31 19:23 (UTC)
Detected by #peers:   45
Detected prefix:      2600:3c03::/32
mtr on my end is dying at my ISP's router.
On Fri, Aug 31, 2012 at 3:46 PM, Smith, Kyle <ksmith@litle.com> wrote:
FYI,
We started getting alerts at roughly ~3:30
- Kyle
-- Sadiq S O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
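
The times reported in this thread are given in three different zones, so a small conversion helps when lining them up. The Python sketch below is illustrative only: the variable names are made up, the fixed UTC-4 and UTC-7 offsets are the daylight-time offsets in effect on Aug 31, 2012, and reading the "roughly 3:30" alert time as EDT is an assumption based on the 3:46 PM timestamp of that mail.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets valid for Aug 31, 2012 (US daylight time in effect).
UTC = timezone.utc
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")

# Times as reported in the thread.
bgp_withdraw = datetime(2012, 8, 31, 19, 23, tzinfo=UTC)    # BGPmon "Update time"
first_alerts = datetime(2012, 8, 31, 15, 30, tzinfo=EDT)    # "roughly 3:30", ASSUMED to be EDT
wpengine_seen = datetime(2012, 8, 31, 12, 30, tzinfo=PDT)   # "roughly 12:30p PDT"


def minutes_after(event, reference):
    """Minutes by which event follows reference (negative if it precedes it)."""
    return (event - reference).total_seconds() / 60


print("withdraw in EDT:", bgp_withdraw.astimezone(EDT).strftime("%H:%M"))
print("withdraw in PDT:", bgp_withdraw.astimezone(PDT).strftime("%H:%M"))
for label, ts in (("first alerts", first_alerts), ("WPEngine noticed", wpengine_seen)):
    print(f"{label}: {minutes_after(ts, bgp_withdraw):.0f} min after the withdraw")
```

The withdraw at 19:23 UTC corresponds to 15:23 EDT and 12:23 PDT, about 7 minutes before both of the "roughly" reports, which is consistent with a single event.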

Traces:
From Comcast:
 4  te-1-1-0-1-ar01.oakland.ca.sfba.comcast.net (69.139.198.94)  34.169 ms
    te-1-1-0-0-ar01.oakland.ca.sfba.comcast.net (69.139.198.86)  31.282 ms
    te-1-1-0-5-ar01.oakland.ca.sfba.comcast.net (68.86.143.98)  36.911 ms
 5  he-2-15-0-0-cr01.sacramento.ca.ibone.comcast.net (68.86.91.225)  16.984 ms  22.581 ms  26.142 ms
 6  pos-1-12-0-0-cr01.losangeles.ca.ibone.comcast.net (68.86.86.6)  25.206 ms  35.012 ms  33.332 ms
 7  pos-0-14-0-0-cr01.dallas.tx.ibone.comcast.net (68.86.85.142)  62.252 ms  73.916 ms  59.768 ms
 8  pos-0-15-0-0-cr01.atlanta.ga.ibone.comcast.net (68.86.85.150)  85.666 ms  79.472 ms  91.008 ms
 9  he-0-9-0-0-cr01.ashburn.va.ibone.comcast.net (68.86.89.173)  97.512 ms  95.388 ms  99.624 ms
10  pos-0-8-0-0-cr01.newyork.ny.ibone.comcast.net (68.86.87.206)  107.661 ms  103.159 ms  103.111 ms
11  173.167.58.26 (173.167.58.26)  99.149 ms  117.406 ms  104.509 ms
12  0.e1-2.tbr1.mmu.nac.net (209.123.10.118)  96.875 ms  102.912 ms  98.129 ms
13  vlan801.esd1.mmu.nac.net (209.123.10.10)  95.907 ms  144.164 ms  102.986 ms
14  207.99.53.42 (207.99.53.42)  111.610 ms  362.702 ms  121.671 ms
15  * * *
From Level 3:
 4  vlan80.csw3.SanJose1.Level3.net (4.69.152.190)  2.080 ms  2.078 ms  2.076 ms
 5  ae-81-81.ebr1.SanJose1.Level3.net (4.69.153.9)  2.172 ms  2.693 ms  2.694 ms
 6  ae-2-2.ebr2.NewYork1.Level3.net (4.69.135.186)  70.711 ms  69.966 ms  69.339 ms
 7  ae-92-92.csw4.NewYork1.Level3.net (4.69.148.46)  69.507 ms  80.956 ms
    ae-82-82.csw3.NewYork1.Level3.net (4.69.148.42)  69.201 ms
 8  ae-91-91.ebr1.NewYork1.Level3.net (4.69.134.77)  69.349 ms
    ae-71-71.ebr1.NewYork1.Level3.net (4.69.134.69)  69.213 ms
    ae-61-61.ebr1.NewYork1.Level3.net (4.69.134.65)  69.622 ms
 9  ae-2-2.ebr1.Newark1.Level3.net (4.69.132.98)  70.839 ms  72.025 ms  70.831 ms
10  ae-11-51.car1.Newark1.Level3.net (4.69.156.5)  118.665 ms  118.662 ms  118.602 ms
11  NETCCESS.car1.Newark1.Level3.net (4.26.16.186)  72.655 ms  71.536 ms  71.751 ms
12  0.e3-3.tbr2.mmu.nac.net (209.123.11.77)  74.982 ms  74.777 ms  71.611 ms
13  vlan805.esd1.mmu.nac.net (209.123.10.34)  71.551 ms  71.933 ms  71.048 ms
14  207.99.53.42 (207.99.53.42)  93.972 ms  93.961 ms  93.954 ms
15  * * *
--j
On Aug 31, 2012, at 1:00 PM, Sadiq Saif wrote:
Got this alert from BGPmon:
====================================================================
Withdraw of Prefix (Code: 97)
====================================================================
Your prefix:          2600:3c03::/32:
Prefix Description:   Linode
Update time:          2012-08-31 19:23 (UTC)
Detected by #peers:   45
Detected prefix:      2600:3c03::/32
mtr on my end is dying at my ISP's router.
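
Both traces above stop answering after the same hop inside NAC's network. A rough way to pull the last responding hop out of traceroute-style text is sketched below in Python. The regular expression and the function name are assumptions made for this example, not part of any traceroute tool, and the sample is the tail of the Level 3 trace from this thread. Continuation lines for multi-path hops (those without a hop number) are simply skipped.

```python
import re

# Matches lines such as: "14  207.99.53.42 (207.99.53.42)  93.972 ms ..."
# Lines like "15  * * *" have no "(ip)" part and therefore do not match.
HOP_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)",
    re.MULTILINE,
)


def last_responding_hop(trace_text):
    """Return (hop_number, hostname, ip) of the last hop that answered, or None."""
    last = None
    for match in HOP_RE.finditer(trace_text):
        last = (int(match.group(1)), match.group(2), match.group(3))
    return last


LEVEL3_TAIL = """\
12  0.e3-3.tbr2.mmu.nac.net (209.123.11.77)  74.982 ms  74.777 ms  71.611 ms
13  vlan805.esd1.mmu.nac.net (209.123.10.34)  71.551 ms  71.933 ms  71.048 ms
14  207.99.53.42 (207.99.53.42)  93.972 ms  93.961 ms  93.954 ms
15  * * *
"""

print(last_responding_hop(LEVEL3_TAIL))
```

For this sample the last responding hop is hop 14, 207.99.53.42, which is also where the Comcast trace ends.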

The status page on Linode indicates it's a power issue. Power has been restored and they're working to bring all associated nodes back online.
-----Original Message-----
From: outages-bounces@outages.org [mailto:outages-bounces@outages.org] On Behalf Of Sadiq Saif
Sent: Friday, August 31, 2012 1:00 PM
To: Smith, Kyle
Cc: outages@outages.org
Subject: Re: [outages] Linode outage in Newark, NJ starting at 3:30 pm
Got this alert from BGPmon:
====================================================================
Withdraw of Prefix (Code: 97)
====================================================================
Your prefix:          2600:3c03::/32:
Prefix Description:   Linode
Update time:          2012-08-31 19:23 (UTC)
Detected by #peers:   45
Detected prefix:      2600:3c03::/32
mtr on my end is dying at my ISP's router.
On Fri, Aug 31, 2012 at 3:46 PM, Smith, Kyle <ksmith@litle.com> wrote:
FYI,
We started getting alerts at roughly ~3:30
- Kyle
participants (11)
- Blake Pfankuch
- Darren Schreiber
- Jack Carrozzo
- Jeremy Chadwick
- Jim Meyer
- Lonnie Bozeman
- Paul Norton
- reza a
- Sadiq Saif
- Seth Mattinen
- Smith, Kyle