I am looking for suggestions as to how best to implement an alerting request made by my users.
Summary
A query is run to count the number of events. The time weighted difference (in percentage) between one period of the next will be used to trigger the alert if the threshold is met.
Query
I have a query which I am already using on an existing dashboard. I am using TSTATS to count events, then a REX to group the events based on the first 3 characters for a field (DRA).
| tstats prestats=t count where index="foo" sourcetype="bar*" DRA=C* by DRA, _time
| rex field=DRA "^(?<pfx>\D\S{2}).*"
| timechart span=5m count by pfx useother=f limit=0 usenull=f
The groups and the number of these 'groups' will vary, but the result will be similar to the below :
_time | C27 | C31 | C33 |
2022-10-12 13:00:00 | 116 | 2 | 70 |
2022-10-12 13:05:00 | 287 | 3 | 20 |
2022-10-12 13:10:00 | 383 | 6 | 45 |
2022-10-12 13:15:00 | 259 | 7 | 41 |
I suspect the maximum number of DRA codes that we will see will be 25, although I can break this up into different queries and play with some timing and priorities with the running of the searches.
Goal
The goal is to alert when any percentage changes from one period to the next by more than a set percentage. So, for example, in the above, I might want an alert at 13:05 that 'C33' had changed by ~72% from the previous period.
I Have Tried
Using a mix of streamstats, eval and trendline statements, have the following which will alert for a single 'C' code.
| tstats count as count where index="foo" sourcetype="bar" DRA=C27* by _time span=5m
| timechart sum(count) as total_count span=5min
| streamstats current=f window=1 last(total_count) as prev_count
| eval percentage_errors=round(abs(prev_count/total_count)*100,1)
| fillnull value=000
| trendline wma5(percentage_errors) AS trend_percentage
| eval trend_percentage=round(trend_percentage,1)
| fillnull value=0 trend_percentage
| table _time, total_count, prev_count, percentage_errors, trend_percentage
Problems and Concerns
Any suggestions or comments on my line of thinking are appreciated.
| untable _time pfx count
| streamstats window=1 current=f global=f sum(count) as previous by pfx
| eval diff = abs(count - previous)
| eval percentage_change = round(100 * diff / previous, 2)
| untable _time pfx count
| streamstats window=1 current=f global=f sum(count) as previous by pfx
| eval diff = abs(count - previous)
| eval percentage_change = round(100 * diff / previous, 2)
@ITWhisperer - Brilliant, untable. I am not sure I fully get it and need to read into it more, but I am surprised that I haven't come across this at least a dozen times by now. Thanks.
I have added in the trendline to get trending on the percentage_change, and after some rearranging of the SPL, I have come up with the following which appears to work correctly:
| tstats prestats=t count where index="foo" sourcetype="bar" DRA=C* by DRA, _time
| rex field=DRA "^(?<pfx>\D\S{2}).*"
| timechart span=5m count by pfx useother=f limit=0 usenull=f
| trendline wma5(percentage_change) AS trend_percentage
| eval trend_percentage=round(trend_percentage,1)
| untable _time pfx count
| streamstats window=1 current=f global=f sum(count) as previous by pfx
| eval diff = abs(count - previous)
| eval percentage_change = round(100 * diff / previous, 2)
| table _time, pfx, count, previous, diff, percentage_change
Now to add some final touches -- I may add a streamstats count as marker by pfx to be sure I capture the last/latest event (however this may miss a pfx if it exists in one time period and not another, but I will play with this to see what happens).
Thanks.