Re: How to search a trending error count to alert ...

daniel333 · 03-28-2016

All,

I want an alert to fire any time an application pool is more than, say, 2 standard deviations from normal. We have about 100 application pools.

I am guessing the logic would look something like this?

 tag=java tag=problem | 
stats count by app_pool |
where count > [somelogic 2std * somesplunkcommand I dont know]

woodcock · 05-11-2016

Try this:

tag=java tag=problem | bucket _time span=1h | stats count BY _time app_pool | eventstats avg(count) AS avg stdev(count) AS stdev BY app_pool | where count > avg + (2 * stdev)
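
For anyone who wants to sanity-check the logic outside Splunk, here is a rough Python sketch of the same pipeline: count events per hour and app_pool, compute a standard deviation per app_pool over those hourly counts, then filter. The function names and the sample data are made up for illustration. The threshold used here is the per-pool mean plus two standard deviations, which is one reading of "2 standard deviations from normal" and an assumption on the sketch's part.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def hourly_counts(events):
    """Count events per (hour, app_pool), like bucket span=1h + stats count BY _time app_pool."""
    counts = defaultdict(int)
    for ts, pool in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, pool)] += 1
    return dict(counts)


def flag_outliers(counts, k=2.0):
    """Return (hour, app_pool, count) rows whose count exceeds mean + k * stdev for that pool."""
    series = defaultdict(list)
    for (_hour, pool), n in counts.items():
        series[pool].append(n)
    flagged = []
    for (hour, pool), n in sorted(counts.items()):
        values = series[pool]
        if len(values) < 2:
            continue  # sample stdev is undefined for a single bucket
        if n > mean(values) + k * stdev(values):
            flagged.append((hour, pool, n))
    return flagged


# Made-up sample data: ten hours, "aaa" spikes in the last hour, "abb" is flat.
base = datetime(2016, 4, 28, 0, 0)
events = []
for h in range(10):
    burst = 100 if h == 9 else 10
    events += [(base + timedelta(hours=h, seconds=i), "aaa") for i in range(burst)]
    events += [(base + timedelta(hours=h), "abb")] * 5

flagged = flag_outliers(hourly_counts(events))
print(flagged)
```

With only a handful of hourly buckets, a single spike inflates the standard deviation enough that it may never cross the threshold, so a longer search window is probably needed before the alert is useful.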

tkwaller · 05-12-2016

YES. Why I didn't get that I'll never know. I tried bucketing, but apparently not together with eventstats.
Thanks for the help, as always.
Todd

woodcock · 05-12-2016

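The problem was chart (timechart) vs. stats: timechart creates one column per app_pool instead of one row per _time and app_pool, which leaves eventstats no count field to work with. Don't forget to click Accept.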

woodcock · 03-28-2016

Try this:

tag=java tag=problem | stats count by app_pool | eventstats stdev(count) AS stdev | where count > (2 * stdev)

tkwaller · 05-10-2016

Any updates?

tkwaller · 04-28-2016

Well, this KINDA works. When it runs, it gives one stdev across ALL app_pools, but what we need is the stdev for EACH app_pool.

For example, this is the output of that search:
   app_pool  count  stdev
1  aaa       14576  10478.310567
2  abb         342  10478.310567
3  acc          45  10478.310567
4  add        1824  10478.310567

What we are trying to achieve is something like this:
   app_pool  count  stdev
1  aaa       14576  its stdev
2  abb         342  its stdev
3  acc          45  its stdev
4  add        1824  its stdev

then we can use:
where count > (2 * stdev)
to alert on.

I tried something like:
| eventstats stdev(count) AS stdev by app_pool

but that returns a stdev of 0 for all app_pools
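
A small Python sketch may help show why the per-app_pool stdev comes out as 0 here. After `stats count by app_pool` there is exactly one number per app_pool, and a single number has no spread. The totals below are the ones from the output above; the hourly values and the helper name `sample_stdev` are hypothetical, and returning 0.0 for a single value is an assumption chosen to mirror the 0 that the search reported.

```python
from statistics import stdev


def sample_stdev(values):
    """Sample standard deviation; 0.0 for a single value (assumption, mirrors the 0 seen in the search)."""
    return stdev(values) if len(values) > 1 else 0.0


# One total per app_pool, as produced by `stats count by app_pool`.
totals = {"aaa": 14576, "abb": 342, "acc": 45, "add": 1824}

# Grouping by app_pool leaves one number per group, so every group has zero spread.
per_pool = {pool: sample_stdev([n]) for pool, n in totals.items()}

# Without the grouping, the four totals form one series with a single shared stdev.
overall = sample_stdev(list(totals.values()))

# With several counts per app_pool (hypothetical hourly values), a per-pool stdev becomes meaningful.
hourly = {"aaa": [610, 580, 655], "abb": [14, 15, 13]}
per_pool_hourly = {pool: sample_stdev(v) for pool, v in hourly.items()}

print(per_pool, overall, per_pool_hourly)
```

In other words, the grouping itself is fine; what is missing is more than one count per app_pool to take the deviation over.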

woodcock · 04-28-2016

Back it up. To do a stdev, you need a series of numbers, so you have to have a count of something. Unless your raw data has counts (which clearly it does not, since you are using count instead of sum), we must do a count first; that is why I wrote it the way that I did. We could use timechart to generate a series of counts per app_pool, say hourly, from which we could then do a stdev per app_pool, but we MUST have a series of numbers FIRST, and only you can specify the necessary parameters. As an example, here is a solution for hourly timecharting:

 tag=java tag=problem | timechart span=1h count BY app_pool | eventstats stdev(count) AS stdev BY app_pool | where count > (2 * stdev)

tkwaller · 04-28-2016

I understand.
I DID try this with the timechart command before posting here, BUT I couldn't get it to work. The one above does not work either; it returns 0 results. I'm guessing it's not returning a stdev: when I removed the "| where count > (2 * stdev)" portion, it returned counts but no stdev:

_time aaa abb acc add
2016-04-28 09:00 5377 728 174 28790
2016-04-28 10:00 4303 584 29 18686

I confirmed this by only running:
tag=java tag=problem | timechart span=1h count by app_pool
and it returns the same results.

So it seems that it's counting properly but not calculating the stdev after counting.
This search:
tag=java tag=problem | timechart span=1h count by app_pool| eventstats stdev(count) AS stdev by app_pool

returns the same results as this search:
tag=java tag=problem | timechart span=1h count by app_pool
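
A likely explanation, sketched in Python: the timechart output is "wide", with one column per app_pool and no field actually named count or app_pool, so a later step that groups by app_pool and aggregates count has nothing to operate on. Reshaping to one row per _time and app_pool restores those fields. The two rows below are the ones from the output above; the function name `to_long` is made up for this sketch. In SPL, the `untable` command is usually what does this reshaping, though it is not used anywhere in this thread.

```python
def to_long(wide_rows, time_field="_time"):
    """Turn one-column-per-app_pool rows into one row per (_time, app_pool) with a count field."""
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            if key != time_field:
                long_rows.append({time_field: row[time_field], "app_pool": key, "count": value})
    return long_rows


# Wide rows, shaped like the timechart output: no "count" or "app_pool" key exists.
wide = [
    {"_time": "2016-04-28 09:00", "aaa": 5377, "abb": 728, "acc": 174, "add": 28790},
    {"_time": "2016-04-28 10:00", "aaa": 4303, "abb": 584, "acc": 29, "add": 18686},
]

long_rows = to_long(wide)
for row in long_rows:
    print(row)
```

Once the data is in the long shape, a per-app_pool standard deviation over the count field is well defined.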
