Alerting

How to search a trending error count to alert when an application pool is more than 2 standard deviations from the normal?

daniel333
Builder

All,

I want an alert to fire any time an application pool's error count is more than, say, 2 standard deviations from normal. We have about 100 application pools.

I am guessing the logic would look something like this?

 tag=java tag=problem | 
stats count by app_pool |
where count > [somelogic 2std * somesplunkcommand I dont know]
0 Karma

woodcock
Esteemed Legend

Try this:

tag=java tag=problem | bucket _time span=1h | stats count BY _time app_pool | eventstats avg(count) AS avg stdev(count) AS stdev BY app_pool | where count > avg + (2 * stdev)
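
For the alert itself, probably only the most recent hour matters. Here is a minimal sketch, assuming a 7-day lookback of complete hours (the earliest/latest values and the latest_hour field name are my assumptions; adjust to taste). It compares each hourly count with that app_pool's average plus 2 standard deviations and keeps only the newest bucket:

```spl
tag=java tag=problem earliest=-7d@h latest=@h
| bucket _time span=1h
| stats count BY _time app_pool
| eventstats avg(count) AS avg stdev(count) AS stdev BY app_pool
| eventstats max(_time) AS latest_hour
| where _time == latest_hour AND count > avg + (2 * stdev)
```

Hours in which an app_pool logged no events produce no row here, so avg and stdev are computed over the non-empty hours only.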

tkwaller
Builder

YES. Why I didn't get that I'll never know. I tried bucketing, but apparently not together with eventstats.
Thanks for the help, as always.
Todd

0 Karma

woodcock
Esteemed Legend

The problem was chart vs. stats: chart (and timechart) creates one column per app_pool instead of rows. Don't forget to click Accept.

0 Karma

woodcock
Esteemed Legend

Try this:

tag=java tag=problem | stats count by app_pool | eventstats stdev(count) AS stdev | where count > (2 * stdev)

tkwaller
Builder

Any updates?

0 Karma

tkwaller
Builder

Well, this KINDA works. When I run it, it gives one stdev across ALL app_pools, but what we need is the stdev for EACH app_pool.

For example this is the output using this search:
#  app_pool  count  stdev
1  aaa       14576  10478.310567
2  abb       342    10478.310567
3  acc       45     10478.310567
4  add       1824   10478.310567

What we are trying to achieve is something like this:
#  app_pool  count  stdev
1  aaa       14576  its stdev
2  abb       342    its stdev
3  acc       45     its stdev
4  add       1824   its stdev

then we can use:
where count > (2 * stdev)
to alert on.

I tried something like:
| eventstats stdev(count) AS stdev by app_pool

but that returns a stdev of 0 for all app_pools

0 Karma

woodcock
Esteemed Legend

Back it up. To do a stdev, you need a series of numbers, so you have to have a count of something. Unless your raw data has counts (which clearly it does not, since you are using count instead of sum), we must do a count first; that is why I wrote it the way that I did. We could use timechart to generate a series of counts per app_pool, say hourly, from which we could then do a stdev per app_pool, but we MUST have a series of numbers FIRST, and only you can specify the necessary parameters. As an example, here is a solution for hourly timecharting:

 tag=java tag=problem | timechart span=1h count BY app_pool | eventstats stdev(count) AS stdev BY app_pool | where count > (2 * stdev)
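
To illustrate the "series of numbers" point, here is a run-anywhere sketch (the pool names and counts are made up for illustration, and n is just a helper field). Each app_pool gets three rows, so eventstats has something to take a stdev over. With only one row per app_pool, there is no spread to measure:

```spl
| makeresults count=6
| streamstats count AS n
| eval app_pool=if(n<=3, "aaa", "abb")
| eval count=case(n==1, 10, n==2, 12, n==3, 50, n==4, 5, n==5, 5, n==6, 5)
| eventstats avg(count) AS avg stdev(count) AS stdev BY app_pool
| table app_pool count avg stdev
```

Here abb has three identical counts, so its stdev is 0, while the varying counts for aaa give a non-zero stdev.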
0 Karma

tkwaller
Builder

I understand.
I DID try the timechart command before posting here, BUT I couldn't get it to work. The timechart search above does not work either; it returns 0 results. I'm guessing that it's not returning a stdev: when I removed the "| where count > (2 * stdev)" portion, it returned counts but no stdev:

_time             aaa   abb  acc  add
2016-04-28 09:00  5377  728  174  28790
2016-04-28 10:00  4303  584  29   18686

I confirmed this by only running:
tag=java tag=problem | timechart span=1h count by app_pool
and it returns the same results.

So it seems that it's counting properly but not calculating the stdev after counting.
This search:
tag=java tag=problem | timechart span=1h count by app_pool | eventstats stdev(count) AS stdev by app_pool

returns the same results as this search:
tag=java tag=problem | timechart span=1h count by app_pool

0 Karma
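
timechart ... BY app_pool returns one column per app_pool (aaa, abb, ...) and no field named count or app_pool, which would explain why the eventstats adds nothing. Here is a hedged sketch of one way around it, using untable to turn the columns back into rows. The limit=0 is my addition, so that timechart does not fold the roughly 100 pools into OTHER:

```spl
tag=java tag=problem
| timechart span=1h limit=0 count BY app_pool
| untable _time app_pool count
| eventstats avg(count) AS avg stdev(count) AS stdev BY app_pool
| where count > avg + (2 * stdev)
```

Unlike the stats version, timechart fills empty hours with 0, so quiet hours count toward each app_pool's avg and stdev.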