Hi,
I am trying to track a value on a backend server and alert if a certain operation spikes to more than 200% of its average value per 5 minutes. I'm not sure how to do the alert part except by entering a static value like this and alerting on the eval "high" value:
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" source="/app/logs/vipservices/vipservices.log" earliest=-5m | stats count by host | eval BE_spike = if(count > 2000, "high", "normal")
What's the best way to schedule an alert that fires if the OPERATION=Validate average spikes higher than 200% of its previous values over time?
Try this:
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-5m
| stats count as Last5Minutes by host
| join host [ search index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-30d latest=-5m
| bucket span=5m _time
| stats count by host _time
| stats avg(count) as Average by host ]
| where Last5Minutes > Average
| table host Last5Minutes Average
And set the alert to trigger when the number of results is greater than zero.
Test it by removing the where command, so you can see the values for every host. Also, I updated this after I realized that the original (using timechart) wasn't working properly.
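The subsearch is meant to produce, for each host, the average number of Validate events per 5-minute interval: count events per host per 5-minute bucket, then average those per-bucket counts per host. Here is a minimal Python sketch of that two-step aggregation, assuming events are simple (epoch_seconds, host) pairs; the function name and the sample data are hypothetical and not part of Splunk.

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # 5-minute buckets, like "bucket span=5m _time"

def avg_per_bucket(events):
    """events: iterable of (epoch_seconds, host) tuples (hypothetical format).
    Returns {host: average count per non-empty 5-minute bucket}."""
    # Step 1: count events per (host, bucket), like "stats count by host _time"
    counts = defaultdict(int)
    for ts, host in events:
        bucket = ts - ts % BUCKET_SECONDS  # floor the timestamp to its bucket start
        counts[(host, bucket)] += 1
    # Step 2: average the per-bucket counts per host, like "stats avg(count) by host"
    totals = defaultdict(int)
    buckets = defaultdict(int)
    for (host, _bucket), n in counts.items():
        totals[host] += n
        buckets[host] += 1
    return {host: totals[host] / buckets[host] for host in totals}

# Host "a" has 4 events in the first bucket and 2 in the second; "b" has 1.
events = [(0, "a"), (10, "a"), (20, "a"), (30, "a"), (300, "a"), (310, "a"), (5, "b")]
print(avg_per_bucket(events))  # {'a': 3.0, 'b': 1.0}
```

As with stats in Splunk, a bucket with no events produces no row here, so intervals where a host was completely idle are not counted as zeros and its average comes out somewhat higher.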
There is a video of a presentation by Jesse Trucks at a recent Splunk Live that covers almost exactly this topic. Watch it at https://vimeo.com/66779015
I have another question about this answer: what if I want to run the same query but compare the last 5 minutes against the last 12 or 24 hours? I'm messing around with spans and dividing the avg(count)... Math is hard 🙂
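On the math: a 5-minute bucket fits 12 times into an hour, so 12 hours is 144 buckets and 24 hours is 288. If the subsearch already averages per-bucket counts, changing earliest=-30d to earliest=-12h or earliest=-24h gives a per-5-minute average directly, with no extra division. If you only have a total count for the longer window, divide it by the number of buckets. A small Python sketch of that arithmetic follows; the helper names are hypothetical, and note that this version divides by every bucket in the window, including empty ones, so it can come out lower than averaging only the buckets that had events.

```python
def avg_per_5m(total_count, hours, bucket_minutes=5):
    """Average events per bucket when total_count events fell within `hours`.
    Divides by every bucket in the window, including empty ones."""
    buckets = hours * 60 // bucket_minutes  # 12h -> 144 buckets, 24h -> 288
    return total_count / buckets

def is_spike(last_5m, total_count, hours, factor=2.0):
    """True if the last 5-minute count exceeds factor x the historical average."""
    return last_5m > factor * avg_per_5m(total_count, hours)

print(avg_per_5m(288_000, 24))       # 1000.0
print(is_spike(2_500, 288_000, 24))  # True: 2500 > 2 * 1000
print(is_spike(1_500, 144_000, 12))  # False: 1500 <= 2 * 1000
```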
Awesome, this works. Thanks again!
All I did was use eval to create a new field called orig_host in the first search. You could also use rename.
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-5m
| stats count as Last5Minutes by host
| eval orig_host = host
| join orig_host
[ search index=summary_vip orig_host=ship*be* OR orig_host=van*be* OP="Validate"
source="VIP Operations by Host Summary Index Search 5 Min" earliest=-15m latest=-5m
| bucket span=5m _time
| stats count by _time orig_host
| stats avg(count) as Average by orig_host ]
| eval doubleAVG=(2*Average)
| where Last5Minutes > doubleAVG
| table orig_host Last5Minutes Average doubleAVG
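The search above gives both sides the same key (orig_host), joins on it, and keeps only hosts whose last-5-minute count exceeds twice the summary-index average. A hedged Python sketch of that join-and-threshold step, using plain dicts in place of search results; the function name and the host names in the usage example are hypothetical.

```python
def spikes(last5_by_host, avg_by_orig_host, factor=2.0):
    """last5_by_host: {host: count in the last 5 minutes} from the raw index.
    avg_by_orig_host: {orig_host: average per 5 minutes} from the summary index.
    Returns one row per host whose count exceeds factor x its average."""
    rows = []
    for host, last5 in last5_by_host.items():
        orig_host = host  # like "eval orig_host = host": give both sides the same key
        average = avg_by_orig_host.get(orig_host)
        if average is None:
            continue  # join is an inner join by default, so unmatched hosts drop out
        double_avg = factor * average  # like "eval doubleAVG=(2*Average)"
        if last5 > double_avg:         # like "where Last5Minutes > doubleAVG"
            rows.append({"orig_host": orig_host, "Last5Minutes": last5,
                         "Average": average, "doubleAVG": double_avg})
    return rows

print(spikes({"ship01be": 900, "van01be": 300, "ship02be": 50},
             {"ship01be": 400, "van01be": 200}))
# [{'orig_host': 'ship01be', 'Last5Minutes': 900, 'Average': 400, 'doubleAVG': 800.0}]
```

With a 5-minute alert schedule, any returned row corresponds to the "number of results is greater than zero" trigger condition.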
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate" source="/app/logs/vipservices/vipservices.log" earliest=-5m
| stats count as Last5Minutes by host
| join host, orig_host [ search index=summary_vip orig_host=ship*be* OR orig_host=van*be* OP="Validate" source="VIP Operations by Host Summary Index Search 5 Min" earliest=-15m latest=-5m
| bucket span=5m _time
| stats count by orig_host
| stats avg(count) as Average by orig_host ]
| eval doubleAVG=(2*Average)
| where Last5Minutes > doubleAVG
| table orig_host Last5Minutes Average doubleAVG
Hi Lisa,
So, based on your answer, I think I'm getting close... I already have a saved search that collects some Validate operations into a summary index.
The problem is that I can't join orig_host to host, because the summary index stores the host as orig_host while the regular vip index uses host. Do you know any workarounds?
Lisa, I tried the subsearch you posted above with shorter time ranges:
"earliest=-1h" returns averages of around 10k per host across the 8 hosts
"earliest=-4h" returns averages of around 40-70k per host
"earliest=-6h" returns averages of around 70-90k per host
Using earliest=-30d would take far too long to finish.
If I just want to run it every 5 minutes, should I use earliest=-10m latest=-5m? And how would I make "where Last5Minutes > Average" alert only when 200% of the average is reached?
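The averages above grow roughly in step with the time range, which is what you would expect if each host ends up with a single count for the whole window instead of one count per 5-minute bucket: the "average" of one number is just the total. That reading is an inference from the numbers, not something confirmed in the thread. A Python sketch of the two behaviors on synthetic data (a made-up, steady 100 events per 5 minutes); the function names are hypothetical.

```python
def per_bucket_counts(rate_per_bucket, hours):
    """Synthetic data: the same count in every 5-minute bucket of the window."""
    return [rate_per_bucket] * (hours * 12)

def avg_total_only(counts):
    # Collapsing to one count per host (like "stats count by host") and then
    # averaging that single value just returns the window total.
    single_row = [sum(counts)]
    return sum(single_row) / len(single_row)

def avg_over_buckets(counts):
    # Averaging one count per 5-minute bucket stays stable as the window grows.
    return sum(counts) / len(counts)

for hours in (1, 4, 6):
    counts = per_bucket_counts(100, hours)
    print(hours, avg_total_only(counts), avg_over_buckets(counts))
# 1 1200.0 100.0
# 4 4800.0 100.0
# 6 7200.0 100.0
```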
Lisa, this definitely returns results: about 8k events within the last 5 minutes (earliest=-5m).
What does this return?
index="vip" host=ship*be* OR host=van*be* OPERATION="Validate"
source="/app/logs/vipservices/vipservices.log" earliest=-5m
And note that I have updated my answer above!
Lisa, thanks. I could not get the above search to return results. I tried changing it to earliest=-1h, but I'm still not getting results, even from the subsearch.
I'll try again Monday.
index="vip" host=ship*be* OR host=van*be* OPERATION=Validate source="/app/logs/vipservices/vipservices.log" | timechart count span=1m | streamstats window=20 avg(count) as avgCount | fields _time avgCount
Or
index="vip" host=ship*be* OR host=van*be* OPERATION=Validate source="/app/logs/vipservices/vipservices.log" | timechart span=1m avg(count) as avgcount | bucket _time span=1m | stats count by _time | stats avg(count) as AverageCount | streamstats avg(AverageCount) as Strm_AverageCount
This gets me the averages, but I'm failing to compare them to previous values over time.
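One way to compare each minute against what came before it is to average only the previous N minutes, leaving the current minute out, and flag any minute whose count exceeds twice that average. In SPL, streamstats has a current option (current=false leaves the current event out of the calculation), but treat any exact search built on it as something to verify. Here is a Python sketch of the logic on a list of per-minute counts; the window of 20 mirrors the streamstats example above, while the function name and the sample counts are made up.

```python
from collections import deque

def flag_spikes(counts, window=20, factor=2.0):
    """For each per-minute count, compare it to the average of up to `window`
    previous counts (the current value is excluded). Returns
    (count, prev_avg, is_spike) tuples; prev_avg is None when there is no history."""
    history = deque(maxlen=window)  # keeps only the most recent `window` counts
    results = []
    for count in counts:
        if history:
            prev_avg = sum(history) / len(history)
            results.append((count, prev_avg, count > factor * prev_avg))
        else:
            results.append((count, None, False))
        history.append(count)  # appended after comparing, so it never averages itself
    return results

for row in flag_spikes([100, 110, 90, 100, 250], window=3):
    print(row)
# (100, None, False)
# (110, 100.0, False)
# (90, 105.0, False)
# (100, 100.0, False)
# (250, 100.0, True)
```

Excluding the current value matters: if the spike is included in its own baseline, it pulls the average up and makes the 200% threshold harder to reach.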