All,
I want to have an alert fire any time an application pool is more than, say, 2 standard deviations from normal. We have about 100 application pools.
I am guessing the logic would look something like this?
tag=java tag=problem |
stats count by app_pool |
where count > [somelogic 2std * somesplunkcommand I dont know]
Try this:
tag=java tag=problem | bucket _time span=1h | stats count BY _time app_pool | eventstats stdev(count) AS stdev BY app_pool | where count > (2 * stdev)
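A possible refinement, since the question asks about being 2 standard deviations from the normal: the search above compares each hourly count to 2 * stdev alone, not to the pool's average. The sketch below assumes "normal" means each app_pool's mean hourly count over the search window. The field name avg is just an alias chosen here, and this has not been run against the poster's data.

```spl
tag=java tag=problem
| bucket _time span=1h
| stats count BY _time app_pool
| eventstats avg(count) AS avg stdev(count) AS stdev BY app_pool
| where abs(count - avg) > (2 * stdev)
```

Using abs() flags hours that are unusually quiet as well as unusually busy. If only spikes matter, the last line could be where count > avg + (2 * stdev) instead.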
YES. Why I didn't get that I'll never know. I tried bucketing, but apparently not with eventstats.
Thanks for the help, as always.
Todd
The problem was chart vs. stats, and creating columns instead of rows. Don't forget to click Accept.
Try this:
tag=java tag=problem | stats count by app_pool | eventstats stdev(count) AS stdev | where count > (2 * stdev)
any updates?
Well, this KINDA works. When I run it, it gives one stdev for ALL app_pools, but what we need is the stdev for EACH app_pool.
For example this is the output using this search:
app_pool  count  stdev
aaa       14576  10478.310567
abb       342    10478.310567
acc       45     10478.310567
add       1824   10478.310567
What we are trying to achieve is something like this:
app_pool  count  stdev
aaa       14576  its stdev
abb       342    its stdev
acc       45     its stdev
add       1824   its stdev
then we can use:
where count > (2 * stdev)
to alert on.
I tried something like:
| eventstats stdev(count) AS stdev by app_pool
but that returns a stdev of 0 for all app_pools
Back it up. To do a stdev, you need a series of numbers, so you have to have a count of something. Unless your raw data has counts (which clearly it does not, since you are using count instead of sum), we must do a count first; that is why I wrote it the way that I did. We could use timechart to generate a series of counts per app_pool, say hourly, from which we could then do a stdev per app_pool, but we MUST have a series of numbers FIRST, and only you can specify the necessary parameters. As an example, here is a solution for hourly timecharting:
tag=java tag=problem | timechart span=1h count BY app_pool | eventstats stdev(count) AS stdev BY app_pool | where count > (2 * stdev)
I understand.
I DID try this with the timechart command before posting here, BUT I couldn't get it to work. The one above does not work either; it returns 0 results. I'm guessing that it's not returning a stdev: when I removed the "| where count > (2 * stdev)" portion, it returned counts but no stdev:
_time             aaa   abb  acc  add
2016-04-28 09:00  5377  728  174  28790
2016-04-28 10:00  4303  584  29   18686
I confirmed this by only running:
tag=java tag=problem | timechart span=1h count by app_pool
and it returns the same results.
So it seems that it's counting properly but not calculating the stdev after counting.
This search:
tag=java tag=problem | timechart span=1h count by app_pool | eventstats stdev(count) AS stdev by app_pool
returns the same results as this search:
tag=java tag=problem | timechart span=1h count by app_pool
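A likely explanation for the identical results: timechart with a BY clause produces one column per app_pool (aaa, abb, ...), so after it there is no field named count and no field named app_pool, and eventstats has nothing to compute. One way to keep timechart, sketched here and not tested on this data, is to turn the columns back into rows with untable before the eventstats. The limit=0 is an assumption added because timechart keeps only the top 10 series by default and there are about 100 app pools.

```spl
tag=java tag=problem
| timechart span=1h limit=0 count BY app_pool
| untable _time app_pool count
| eventstats stdev(count) AS stdev BY app_pool
| where count > (2 * stdev)
```

After untable, each row has _time, app_pool, and count, which is the same shape the bucket plus stats version produces, so the eventstats and where clauses behave as intended.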