EDGE: Anyone seeing 100% CPU on Fortigate edge routers?

Got a client whose FGT60D -- two of them; they have a hot spare -- is jamming to 100% continuous CPU full time, enough to impinge capacity sharply on their 90 Mb/s Road Runner line to Five9.
Since their primary and hot-spare boxes behave identically, and neither ever did before, I infer a pattern-file pooch-screwing at Fortigate, and am looking for corroborating reports.
Anybody?
Cheers, -- jra
-- Jay R. Ashworth Baylink jra@baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://www.bcp38.info 2000 Land Rover DII St Petersburg FL USA BCP38: Ask For It By Name! +1 727 647 1274

On 01/15/2015 09:44 AM, Jay Ashworth via Outages wrote:
Got a client whose FGT60D -- two of them; they have a hot spare -- is jamming to 100% continuous CPU full time, enough to impinge capacity sharply on their 90 Mb/s Road Runner line to Five9.
Since their primary and hot-spare boxes behave identically, and neither ever did before, I infer a pattern-file pooch-screwing at Fortigate, and am looking for corroborating reports.
We have them in the shop but haven't tested them yet. If UTM is on, check which options are enabled and turn off the ones they don't need. I've worked on blades where enabling all of App Control spiked the CPU. We ended up disabling some of the options, and that brought the CPU down. --John
Anybody?
Cheers, -- jra

FOLO: About 5 or 6 minutes after I posted this, the problem -- after being rock solid since about 0930EST this morning, "magically" went away. :-) Still interested in any reports from earlier. Cheers, -- jra

Any chance they were under attack (DDoS)?
On Jan 15, 2015, at 12:07 PM, Jay Ashworth via Outages <outages@outages.org> wrote:
FOLO: About 5 or 6 minutes after I posted this, the problem -- after being rock solid since about 0930EST this morning, "magically" went away. :-) Still interested in any reports from earlier. Cheers, -- jra

----- Original Message -----
From: "Peter E" <peeip989@gmail.com>
Any chance they were under attack (DDoS)?
That had been my first thought, but Road Runner support said they didn't see any appreciable incoming traffic on the link; they were my first call. Cheers, -- jra

We have a few dozen out there in the field and haven't seen anything today.
If you see the issue again, get on the unit and run 'diag sys top' to find the process that is burning CPU. In the past we have seen ssl, imd (instant messaging intercept daemon), cache, etc. cause issues. That will at least point you in the right direction.
You can then run 'diag sys kill <signal> <pid>' to restart the process. Google the process name before doing that to make sure it's not something critical.
On 01/15/2015 11:07 AM, Jay Ashworth via Outages wrote:
FOLO: About 5 or 6 minutes after I posted this, the problem -- after being rock solid since about 0930EST this morning, "magically" went away.
:-)
Still interested in any reports from earlier.
Cheers, -- jra
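A minimal sketch of the "find the process that is burning CPU" step, assuming the output of 'diag sys top' has already been captured as text. The column layout (name, PID, state, optional '<' flag, CPU %, memory %), the sample text, the 50% threshold, and all function names are assumptions for illustration, not taken from this thread or from Fortinet documentation.

```python
import re
from typing import List, NamedTuple


class ProcSample(NamedTuple):
    name: str
    pid: int
    state: str
    cpu: float
    mem: float


# Assumed line shape: <name> <pid> <state> [<] <cpu%> <mem%>
_LINE = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+([A-Z])\s+(?:<\s+)?"
    r"(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$"
)


def parse_top(text: str) -> List[ProcSample]:
    """Return one ProcSample per line that matches the assumed layout."""
    procs = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:  # header lines do not match and are skipped
            procs.append(
                ProcSample(m.group(1), int(m.group(2)), m.group(3),
                           float(m.group(4)), float(m.group(5)))
            )
    return procs


def busiest(text: str, threshold: float = 50.0) -> List[ProcSample]:
    """Processes at or above the CPU threshold, highest first."""
    hot = [p for p in parse_top(text) if p.cpu >= threshold]
    return sorted(hot, key=lambda p: p.cpu, reverse=True)


# Hypothetical capture, invented for this example.
SAMPLE = """\
Run Time:  12 days, 3 hours and 41 minutes
98U, 0N, 2S, 0I; 1839T, 1210F
      scanunitd      112      R      96.4     3.1
      ipsengine       87      S <     1.9     5.7
         newcli      903      R       0.5     0.9
"""

for proc in busiest(SAMPLE):
    print(f"{proc.name} (pid {proc.pid}) at {proc.cpu}% CPU")
```

The script only reports; whether to signal the process is left to the operator, in line with the advice above to look up the process name first.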

Can we please move this thread to outages-discuss? Thank you. Regards, /virendra
On Jan 15, 2015, at 15:47, Andy Brezinsky via Outages <outages@outages.org> wrote:
We have a few dozen out there in the field and haven't seen anything today.
If you see the issue again, get on the unit and run 'diag sys top' to find the process that is burning CPU. In the past we have seen ssl, imd (instant messaging intercept daemon), cache, etc. cause issues. That will at least point you in the right direction.
You can then run 'diag sys kill <signal> <pid>' to restart the process. Google the process name before doing that to make sure it's not something critical.
On 01/15/2015 11:07 AM, Jay Ashworth via Outages wrote: FOLO: About 5 or 6 minutes after I posted this, the problem -- after being rock solid since about 0930EST this morning, "magically" went away.
:-)
Still interested in any reports from earlier.
Cheers, -- jra

----- Original Message -----
From: "Andy Brezinsky via Outages" <outages@outages.org>
If you see the issue again, get on the unit and run 'diag sys top' to find the process that is burning CPU. In the past we have seen ssl, imd (instant messaging intercept daemon), cache, etc. cause issues. That will at least point you in the right direction.
I had; the first time it was ipsmonitor or ipsengine (I forget which); the second and subsequent times, it was scanunitd. Both times, it settled down *just* before support got me off hold; it was slightly maddening. Cheers, -- jra

On 15 Jan 2015, at 23:44, Jay Ashworth via Outages wrote:
Since their primary and hot-spare boxes behave identically, and neither ever did before, I infer a pattern-file pooch-screwing at Fortigate, and am looking for corroborating reports.
Is it possible they're being DDoSed? Stateful devices don't generally react well to DDoS attacks: <https://app.box.com/s/a3oqqlgwe15j8svojvzl> ----------------------------------- Roland Dobbins <rdobbins@arbor.net>

----- Original Message -----
From: "Roland Dobbins via Outages" <outages@outages.org>
On 15 Jan 2015, at 23:44, Jay Ashworth via Outages wrote:
Since their primary and hot-spare boxes behave identically, and neither ever did before, I infer a pattern-file pooch-screwing at Fortigate, and am looking for corroborating reports.
Is it possible they're being DDoSed? Stateful devices don't generally react well to DDoS attacks:
If they did it was a *very* targeted attack, because Road Runner's support guy said they didn't see any appreciable amount of inbound traffic at that time. Cheers, -- jra

On 16 Jan 2015, at 3:38, Jay Ashworth wrote:
If they did it was a *very* targeted attack, because Road Runner's support guy said they didn't see any appreciable amount of inbound traffic at that time.
We've seen hardware load-balancers rated at 10 Gb/s taken down with only 60 kpps of HOIC for 60 s (and require 45 min to reboot), so high throughput/bandwidth isn't really necessary; stateful devices make it a lot easier to DDoS a given target with far less traffic than would otherwise be required.
Just a thought - it might be worth having a gander at whatever telemetry is available.
----------------------------------- Roland Dobbins <rdobbins@arbor.net>
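A rough worked figure for the "far less traffic" point above. The thread gives no packet size, so this sketch assumes 1500 bytes per packet (a full standard Ethernet frame payload) as a generous upper bound; real HOIC requests would likely be smaller, which only lowers the result. The function name is hypothetical.

```python
def pps_to_bps(packets_per_second: int, bytes_per_packet: int) -> int:
    """Convert a packet rate to bits per second for a fixed packet size."""
    return packets_per_second * bytes_per_packet * 8


RATED_BPS = 10 * 10**9        # device rated at 10 Gb/s (from the message above)
ATTACK_PPS = 60_000           # 60 kpps (from the message above)
ASSUMED_PACKET_BYTES = 1500   # assumption: upper bound, not from the thread

attack_bps = pps_to_bps(ATTACK_PPS, ASSUMED_PACKET_BYTES)
share = attack_bps / RATED_BPS

# Prints: 720 Mb/s, 7.2% of rated throughput
print(f"{attack_bps / 1e6:.0f} Mb/s, {share:.1%} of rated throughput")
```

Even under that worst-case packet size, the attack uses well under a tenth of the rated bandwidth, which is consistent with the point that state exhaustion, not link saturation, is what takes such devices down.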

Roland Dobbins via Outages wrote on 1/15/2015 2:42 PM:
On 16 Jan 2015, at 3:38, Jay Ashworth wrote:
If they did it was a *very* targeted attack, because Road Runner's support guy said they didn't see any appreciable amount of inbound traffic at that time.
We've seen hardware load-balancers rated at 10 Gb/s taken down with only 60 kpps of HOIC for 60 s (and require 45 min to reboot), so high throughput/bandwidth isn't really necessary; stateful devices make it a lot easier to DDoS a given target with far less traffic than would otherwise be required.
Just a thought - it might be worth having a gander at whatever telemetry is available.
Similarly, stateless devices can often be overwhelmed when faced with unexpected traffic types. For instance, a 7600 Sup720 can become unresponsive due to a few Mbps of IP traffic with IP options hitting an ACL that punts the traffic to the CPU. --Blake

On 16 Jan 2015, at 5:17, Blake Hudson via Outages wrote:
For instance, a 7600 Sup720 can become unresponsive due to a few Mbps of IP traffic with IP options hitting an ACL that punts the traffic to the CPU.
Yes, obsolete hardware generally has fewer TCAM resources than more modern hardware, and fewer self-protection mechanisms. There are ways to choke input queues, cause traffic to be punted, etc., even on more modern hardware (although more modern hardware has various self-protection mechanisms which can be used to ameliorate the effects of such traffic). And even on older hardware, there are some tricks one can use to limit this particular set of attack surfaces.
But stateless filtering in front of servers isn't *conceptually* flawed; stateful filtering in front of servers *is* conceptually flawed.
Any further discussion of this topic in the context of the outages community should probably take place on outages-discuss.
----------------------------------- Roland Dobbins <rdobbins@arbor.net>

On Thu, Jan 15, 2015 at 11:43 AM, Roland Dobbins via Outages <outages@outages.org> wrote:
Stateful devices don't generally react well to DDoS attacks:
Did anyone else read that in a Sean Connery voice? https://www.youtube.com/watch?v=kt0sH1C_XBc -A
Participants (8): Aaron C. de Bruyn, Andy Brezinsky, Blake Hudson, Jay Ashworth, John Schiel, Peter E, Roland Dobbins, virendra rode