I'm trying to create some monitoring alerts for when errors increase by more than a certain amount compared to their usual level. I've got it working to compare yesterday to today, but I'd like to compare the daily average of a given period to today for more accurate results. This is proving a little too tricky for me, and any help would be greatly appreciated! Here is my current search:
index="reseller" sourcetype="oneclick_error_log" Sitename="*" | bucket _time span="d" | stats count AS oneclick_errors by Sitename, _time | delta oneclick_errors as change | eval change_percent=change/(oneclick_errors-change)*100 | sort Sitename | where _time>=relative_time(now(),"-d") AND change_percent > 25
Note: I have the alert set to run at midnight so there is a complete dataset for comparison.
This is the search I ended up using to solve my problem (there was too much data to use a subsearch/eval). The change_percent still needs some work, but I've got my data how I want it.
index="reseller" sourcetype="oneclick_error_log" Sitename="*" earliest=-8d | stats count as weekly_total_errors, count(eval(if( _time>relative_time(now(),"-d"),"x",NULL))) as todays_errors by Sitename | eval weekly_total_errors = weekly_total_errors - todays_errors | eval weekly_avg = weekly_total_errors/7 | eval change = todays_errors-weekly_avg | eval change_percent = (change/weekly_avg)*100
Another option
| multisearch
    [ search index="reseller" sourcetype="oneclick_error_log" Sitename="*" earliest=-8d@d latest=@d | eval type="weeklyAvg" ]
    [ search index="reseller" sourcetype="oneclick_error_log" Sitename="*" earliest=@d | eval type="today" ]
| bucket span=1d _time
| chart count over Sitename by type
| eval weeklyAvg = round(weeklyAvg / 7, 2)
| eval change_percent = round((today - weeklyAvg) * 100 / weeklyAvg, 2)
| where change_percent > 25
Unfortunately the limit on how much data subsearches can process makes this a solution I can't use. Thanks so much for the response, though!
There is also an app, Timewrap, that allows you to do this easily.
I've played with timewrap - it's super awesome and powerful, but I couldn't figure out how to configure it to compare larger windows of time to smaller windows (last week vs. today). If I wanted to compare today's results to the same day last week, it would be perfect.
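For reference, that same-day-last-week comparison might look something like this (a sketch, assuming the timewrap search command rather than the app's UI, and collapsing across sites for simplicity):

index="reseller" sourcetype="oneclick_error_log" Sitename="*" earliest=-7d@d
| timechart span=1d count as errors
| timewrap 1week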
How do you refine the where clause so that it not only checks change_percent > 25 but also, for example, weeklyAvg > 100? I've tried "where change_percent > 25 AND weeklyAvg > 100" in my example, but what happens is that during the first parsing phase I see the results of the query (before the where statement) being populated in the table from the stats command. As soon as it gets to the where statement, though, the long list of entries gets reduced to just a few (where a lot more is clearly expected).
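For what it's worth, where silently drops any row where the expression isn't true, including rows where either field is null (for example, a site that logged nothing today), which can shrink the list more than expected. A sketch that zero-fills the missing cells first, replacing the tail of the multisearch example above:

| fillnull value=0 today weeklyAvg
| eval weeklyAvg = round(weeklyAvg / 7, 2)
| eval change_percent = round((today - weeklyAvg) * 100 / weeklyAvg, 2)
| where change_percent > 25 AND weeklyAvg > 100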
One way, but perhaps not the best, is by using a subsearch and eval. Here's an example:
earliest=@d sourcetype=access_combined
| eval
[ search earliest=-1d@d latest=@d sourcetype=access_combined
| stats avg(bytes) as avg_bytes
| return avg_bytes
]
| table _time, avg_bytes, bytes
In this example, the eval command looks a little strange. But remember, subsearches are a textual construct. By the time the subsearch finishes, the search inside of [ and ] will have been textually replaced by the results of the subsearch - in this case, avg_bytes=<some_number>. This happens before the eval even "sees" it; all eval "sees" is | eval avg_bytes=1234567.
This is probably not the best-performing way of solving this problem; it could be improved by a summary index or by accelerating the subsearch.
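To turn the example into an alert-style filter, a hypothetical follow-on (not part of the original answer; the 1.25 threshold is made up) could compare each event against the returned average:

earliest=@d sourcetype=access_combined
| eval
    [ search earliest=-1d@d latest=@d sourcetype=access_combined
      | stats avg(bytes) as avg_bytes
      | return avg_bytes ]
| where bytes > avg_bytes * 1.25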
Turns out I'm working with way too much data for subsearches. I came up with a different solution that I'll post in a separate comment.