Deployment Architecture

How can we detect excessive overlapping alerts?

ddrillic
Ultra Champion

We reach situations in which application teams set their alerts at the top of the hour and when we (the Splunk team) catch it, it might be too late.

Is there a way to produce a report which lists the run times and detect excessive usage times?

0 Karma
1 Solution

skoelpin
SplunkTrust
SplunkTrust

Yeah, you can use the internal index for this. You should explicitly add savedsearch_name for this

index=_internal savedsearch_name=*
| timechart max(run_time) AS run_time by savedsearch_name

View solution in original post

skoelpin
SplunkTrust
SplunkTrust

Yeah, you can use the internal index for this. You should explicitly add savedsearch_name for this

index=_internal savedsearch_name=*
| timechart max(run_time) AS run_time by savedsearch_name

ddrillic
Ultra Champion

Thank you @skoelpin.

I changed the max to sum and we can see -

alt text

We can see that at each quarter of the hour we have peak usage.
Can we find out from _internal how many searches were skipped?

0 Karma

skoelpin
SplunkTrust
SplunkTrust

Yes, you sure can!

index=_internal sourcetype=scheduled status=skipped NOT "_ACCELERATE*"
| timechart count by savedsearch_name
0 Karma

ddrillic
Ultra Champion

Just ran -

index=_internal sourcetype=scheduler status=skipped NOT "_ACCELERATE*"
 | timechart count

It shows -

alt text

0 Karma

ddrillic
Ultra Champion

The totals for an hour are -

alt text

0 Karma

skoelpin
SplunkTrust
SplunkTrust

Yeah, you have a problem with skips at 4am. You should trend this over time by using timewrap to see if there's a pattern. Most likely, other searches are competing for resources and they run long and cause skips. You can fix this by changing search priroty away from 0 to auto.

You can split by savedsearch_name or get a total over a span of time by adding span=1h. We use this search to alert us and cut a ticket when we start skipping. Skips are unacceptable for us

ddrillic
Ultra Champion

Much appreciated @skoelpin.

0 Karma
Get Updates on the Splunk Community!

Get More Out of Your Security Practice With a SIEM

Get More Out of Your Security Practice With a SIEMWednesday, July 31, 2024  |  11AM PT / 2PM ETREGISTER ...

New This Month - SLO Capabilities, APM Advanced Filtering & Usage Analytics Plus ...

More for SLO Management We’re continuing to expand the built-in SLO management experience in Splunk ...

Enterprise Security Content Update (ESCU) | New Releases

In June, the Splunk Threat Research Team had 2 releases of new security content via the Enterprise Security ...