Solved: calculate avg value over time - alert if 200% increase

sonicZ · ‎05-31-2013

Hi,

I am trying to track a value on a backend server and alert if a certain operation spikes to greater than 200% of its average value per 5 minutes. I'm not sure how to do the alert part unless I enter a static value like this and alert on the eval "high" value.

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" source="/app/logs/vipservices/vipservices.log" earliest=-5m | timechart span=5m count by host | eval BE_spike = if( count > 2000, "high", "normal")

What's the best way to schedule an alert if the OPERATION=Validate average spikes higher than 200% of the previous values over time?

lguinn2 · ‎05-31-2013

Try this:

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" 
source="/app/logs/vipservices/vipservices.log" earliest=-5m 
| stats count as Last5Minutes by host
| join host [ search index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" 
    source="/app/logs/vipservices/vipservices.log" earliest=-30d latest=-5m
    | bucket span=5m _time
    | stats count by host 
    | stats avg(count) as Average by host ]
| where Last5Minutes > Average
| table host Last5Minutes Average

And set the alert to trigger when the number of results is greater than zero.

Test it by removing the where command. Also, I updated this after I realized that the original (using timechart) wasn't working properly.

dwaddle · ‎05-31-2013

There is a video of a presentation by Jesse Trucks at a recent Splunk Live that covers almost exactly this topic. Watch it at https://vimeo.com/66779015

sonicZ · ‎07-01-2013

I have another request on this answer: what if I want to run the same query but compare the last 5 minutes against the last 12 or 24 hours? I am messing around with spans and dividing the avg(count)... Math is hard 🙂
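
One way to approach the 24-hour comparison, offered as a sketch rather than a tested answer: widen only the subsearch window and divide the total count by the number of 5-minute intervals in it. The window from -24h to -5m is 1435 minutes, which is 287 five-minute intervals (for 12 hours it would be 143). The field name Total and the `2 * Average` threshold are assumptions added for illustration.

```spl
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-5m
| stats count as Last5Minutes by host
| join host [ search index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
    source="/app/logs/vipservices/vipservices.log" earliest=-24h latest=-5m
    | stats count as Total by host
    | eval Average = Total / 287 ]
| where Last5Minutes > 2 * Average
| table host Last5Minutes Average
```

Dividing the total by a fixed interval count also treats 5-minute periods with no events as zeros, which averaging per-bucket counts would not do.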

sonicZ · ‎06-12-2013

Awesome, this works. Thanks again.

lguinn2 · ‎06-08-2013

All I did was use eval to create a new field called orig_host in the first search. You could also use rename.

lguinn2 · ‎06-08-2013

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" source="/app/logs/vipservices/vipservices.log" earliest=-5m | stats count as Last5Minutes by host | eval orig_host = host | join orig_host [ search index=summary_vip orig_host=ship*be* OR orig_host=van*be* OP="Validate" source="VIP Operations by Host Summary Index Search 5 Min" earliest=-15m latest=-5m | bucket span=5m _time | stats count by orig_host | stats avg(count) as Average by orig_host ] | eval doubleAVG=(2*Average) | where Last5Minutes > doubleAVG | table orig_host Last5Minutes Average doubleAVG
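
The rename alternative mentioned in this thread might look like the sketch below. It leaves the summary-index aggregation exactly as posted and only renames orig_host to host as the last step of the subsearch, so the join can use host directly. This is an untested variant, and it assumes the summary events carry orig_host as described in the thread.

```spl
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-5m
| stats count as Last5Minutes by host
| join host [ search index=summary_vip orig_host=ship*be* OR orig_host=van*be* OP="Validate"
    source="VIP Operations by Host Summary Index Search 5 Min" earliest=-15m latest=-5m
    | bucket span=5m _time
    | stats count by orig_host
    | stats avg(count) as Average by orig_host
    | rename orig_host as host ]
| eval doubleAVG = 2 * Average
| where Last5Minutes > doubleAVG
| table host Last5Minutes Average doubleAVG
```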

sonicZ · ‎06-04-2013

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" 
source="/app/logs/vipservices/vipservices.log" earliest=-5m 
| stats count as Last5Minutes by host
| join host, orig_host 
[ search index=summary_vip  orig_host=ship*be* OR orig_host=van*be* OP="Validate" 
source="VIP Operations by Host Summary Index Search 5 Min" earliest=-15m latest=-5m 
| bucket span=5m _time  
| stats count by orig_host  
| stats avg(count) as Average by orig_host ] 
| eval doubleAVG=(2*Average) 
| where Last5Minutes > doubleAVG
| table orig_host Last5Minutes Average doubleAVG

sonicZ · ‎06-04-2013

Hi Lisa,
So based on your answer I think I am getting close... I already have a previous saved search gathering some Validate operations in a summary index.
The problem is that I can't join on host, because the summary index stores hosts as orig_host while the regular vip index uses host. Know any workarounds?

sonicZ · ‎06-03-2013

Lisa, I tried using the subsearch you posted above with a lower earliest value:
"earliest=-1h" returns average values of around 10k per host on the 8 hosts
"earliest=-4h" returns average values of around 40-70k per host on the 8 hosts
"earliest=-6h" returns average values of around 70-90k per host on the 8 hosts

The earliest=-30d would take way too long to finish.
If I just want to run it every 5 minutes, should I use earliest=-10m latest=-5m? And how would I make the "where Last5Minutes > Average" alert only when 200% of the average is reached?
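
The averages reported here grow with the search window, which suggests that the subsearch is returning the total count for the window rather than a per-5-minute average: `stats count by host` drops `_time`, so the 5-minute buckets are merged before the average is taken. A sketch of a possible fix, assuming that is the cause: keep `_time` in the first stats so each bucket is counted separately, and compare against twice the average for the 200% threshold. The one-hour baseline (-65m to -5m, twelve 5-minute intervals) is an arbitrary choice for illustration.

```spl
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-5m
| stats count as Last5Minutes by host
| join host [ search index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
    source="/app/logs/vipservices/vipservices.log" earliest=-65m latest=-5m
    | bucket span=5m _time
    | stats count by host _time
    | stats avg(count) as Average by host ]
| where Last5Minutes > 2 * Average
| table host Last5Minutes Average
```

Because bucket aligns to clock boundaries, the first and last buckets of the baseline may be only partly filled, which pulls the average down slightly. Buckets with no events produce no row, so the average covers non-empty buckets only.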

sonicZ · ‎06-03-2013

Lisa, this definitely returns results: 8k or so within the last 5 minutes.

lguinn2 · ‎05-31-2013

What does this return?

index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" source="/app/logs/vipservices/vipservices.log" earliest=-5m

And note that I have updated my answer above!

sonicZ · ‎05-31-2013

Lisa, thanks. I could not get the above search to return results. I tried lowering the window to earliest=-1h but I'm still not getting results, even with just the subsearch.

I'll try again Monday.

sonicZ · ‎05-31-2013

index="vip" host=ship*be* OR host=van*be* operation=Validate source="/app/logs/vipservices/vipservices.log" | timechart count span=1m | streamstats window=20 avg(count) as avgCount | fields _time avgCount

Or

index="vip" host=ship*be* OR host=van*be* operation=Validate source="/app/logs/vipservices/vipservices.log" | timechart span=1m avg(count) as avgcount |  bucket _time span=1m
| stats count by _time
| stats avg(count) as AverageCount | streamstats avg(AverageCount) as Strm_AverageCount

Getting the averages, but failing to compare to previous values over time.
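
One way the comparison to previous values could be expressed with streamstats, as an untested sketch: count events per host in 5-minute buckets, then compute for each bucket the average of the preceding buckets for the same host and keep rows that exceed twice that average. The field names Events and avgPrev, the window of 12 buckets, and the one-hour search range are assumptions; OPERATION is written as in the original question.

```spl
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-65m latest=-5m
| bucket span=5m _time
| stats count as Events by _time host
| sort 0 _time
| streamstats current=f global=f window=12 avg(Events) as avgPrev by host
| where Events > 2 * avgPrev
| table _time host Events avgPrev
```

Here current=f excludes the current bucket from its own average, and global=f keeps a separate window per host. The first bucket for each host has no previous values, so it is dropped by the where clause.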
