Hi,
I'm looking for a way to alert on anomalies in historical event logs.
Let's say I have a log source that generates events. On day 1, between 09:30 and 10:00, it generates a total of 105 events; on day 2, between 09:30 and 10:00, it generates 110 events; and so on.
I want to write a search that runs every 30 minutes and finds a deviation (50%, for example) from the average count for the same 30-minute period over the last 2 days.
Example:
At 10:00, the search looks at the events from the past 2 days (the value 2 should be configurable) and calculates the average:
Yesterday, between 09:30 and 10:00: 105 events generated
The day before, between 09:30 and 10:00: 110 events generated
The average is (105+110)/2 = 107.5
If the event count for today's 09:30 - 10:00 period is greater than 161.25 (107.5 x 1.5), it generates an alert, since the 50% deviation threshold is exceeded.
(Using version 6.2.1)
Any idea?
Thanks,
This may help. I cobbled it together (we use it against our email index) from bits and pieces of posts on the Splunk Answers forum (and I believe some of lguinn's magic may be in here):
(It shows where the number of events increased: the increase is more than 100% over the previous hour and the event count is > 10.)
index=email _index_earliest=-h@h _index_latest=@h Subject="xxx" SenderAddress="xxx@xxx.edu"
| timechart span=1h partial=false count
| delta count as difference
| eval difference=coalesce(difference,0)
| eval percentDifference =round(abs(difference/(count - difference))*100)
| where (difference > 1 AND percentDifference > 100)
| where count > 10
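For anyone trying to follow the arithmetic in that search, here is a rough Python sketch of what the delta / percentDifference / where steps compute. It is not SPL and not the poster's code; the function name flag_spikes and the sample counts are made up for illustration. Note that difference/(count - difference) is simply the change divided by the previous bucket's count.

```python
def flag_spikes(counts, min_pct=100, min_count=10):
    """Return (index, count, pct) for buckets that jump versus the previous bucket.

    Hypothetical helper mirroring the search's arithmetic; the defaults copy
    the thresholds used in its two `where` clauses.
    """
    flagged = []
    for i in range(1, len(counts)):
        prev, cur = counts[i - 1], counts[i]
        difference = cur - prev  # what `delta count as difference` produces
        if prev == 0:
            # count - difference is 0 here, so the percentage is undefined;
            # this sketch skips the bucket (assumption: the search drops it too)
            continue
        pct = round(abs(difference / prev) * 100)
        if difference > 1 and pct > min_pct and cur > min_count:
            flagged.append((i, cur, pct))
    return flagged


hourly = [12, 14, 40, 38, 9]  # made-up hourly counts
print(flag_spikes(hourly))    # [(2, 40, 186)]
```

Only the jump from 14 to 40 is flagged: the change is 26, and 26/14 is about 186%.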
This should work
earliest=-49h latest=-30m yoursearchcriteria
| eval endday2=relative_time(now(),"-49h+30m")
| eval startday1=relative_time(now(),"-25h")
| eval endday1=relative_time(now(),"-25h+30m")
| eval starttoday=relative_time(now(),"-1h")
| eval Day = case(_time <= endday2,"day2",
_time <= endday1 AND _time >= startday1,"day1",
_time >= starttoday,"today",
1==1,"eliminate")
| where Day!="eliminate"
| stats count(eval(Day="day1")) as day1, count(eval(Day="day2")) as day2, count(eval(Day="today")) as today
| eval avg=(day1+day2)/2
| where today > avg * 1.5
This seems like an ugly way to do what you want, but it was the first thing that crossed my mind. It isn't very flexible either.
Note that I have included a 30-minute lag in the calculations - without a lag, your numbers may not be accurate, as it often takes a minute or so for events to be detected on the remote server, forwarded, parsed and indexed.
You might want to look at the timewrap app - it's free and designed to help with these sorts of comparisons.
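In case it helps to check the numbers, the comparison the search makes can be sketched in a few lines of Python. This is only the arithmetic, not SPL; the function name exceeds_baseline and the "today" counts are invented for illustration, and the history list can hold any number of previous days rather than exactly two.

```python
def exceeds_baseline(history, today, deviation=0.5):
    """Compare today's count for a time slot against the average of previous days.

    history   -- counts for the same 30-minute slot on earlier days
    today     -- count for the slot being checked
    deviation -- allowed fraction above the average (0.5 means 50%)
    Returns (alert, average, threshold). Hypothetical helper for illustration.
    """
    if not history:
        raise ValueError("need at least one previous day")
    avg = sum(history) / len(history)
    threshold = avg * (1 + deviation)
    return today > threshold, avg, threshold


# Counts from the question: 105 and 110 events on the two previous days.
print(exceeds_baseline([105, 110], 170))  # (True, 107.5, 161.25)
print(exceeds_baseline([105, 110], 160))  # (False, 107.5, 161.25)
```

With the question's numbers the average is 107.5, so the 50% threshold sits at 161.25: a count of 160 would not alert, while 170 would.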
When I run it, it gives "Error in 'eval' command: The arguments to the 'if' function are invalid."
It should be case, not if! Sorry, I have edited the answer...
@Iguinn
I'm trying to use the Machine Learning Toolkit. In which assistant should I use the code you posted above?