I have a report that returns
method Avg(timing) perc90(timing)
that I would like to create as a baseline each week.
index=some_index sourcetype=some_perf method="Oracle*" | stats avg(timing) as Average perc90(timing) as "90th Perc" perc95(timing) as "95th Perc" max(timing) as MAX count by method
Then I would like to create an alert that runs say every half hour or hour that whenever the avg or per90 for a particular method deviates from the base by say 20% I would get a alert that we might have a problem in the database.
I was thinking I would create a report of last weeks results as a lookup table and then do comparisons between each, but I'm not sure how.
we have several customers that depend on the performance of our application and I'd like to be proactive if there is a sudden slowdown in the database
Try something like this
Search 1: Runs weekly and updates your lookup table to store weekly (last week I'm assuming) baseline of the average and perc90 of the timing field.
index=some_index sourcetype=some_perf method="Oracle*" earliest=-1w@w latest=@w| stats avg(timing) as LastWeekAverage perc90(timing) as LastWeekPercentile90 by method
| outputlookup method_timing_baseline.csv
Cron: 55 00 * * 1
(every monday 00:55AM)
Now you can run your alert search reference this lookup to compare and alert.
index=some_index sourcetype=some_perf method="Oracle*" earliest=-35m@m latest=-5m@m | stats avg(timing) as Average perc90(timing) as "90th Perc" perc95(timing) as "95th Perc" max(timing) as MAX count by method | lookup method_timing_baseline.csv method OUTPUT LastWeekAverage LastWeekPercentile90 | where Average >1.2*LastWeekAverage OR '90th Perc'>1.2*LastWeekPercentile90
Cron : 5,35 * * * *
(every 30 mins on 5 and 35 minute of hour)
Try something like this
Search 1: Runs weekly and updates your lookup table to store weekly (last week I'm assuming) baseline of the average and perc90 of the timing field.
index=some_index sourcetype=some_perf method="Oracle*" earliest=-1w@w latest=@w| stats avg(timing) as LastWeekAverage perc90(timing) as LastWeekPercentile90 by method
| outputlookup method_timing_baseline.csv
Cron: 55 00 * * 1
(every monday 00:55AM)
Now you can run your alert search reference this lookup to compare and alert.
index=some_index sourcetype=some_perf method="Oracle*" earliest=-35m@m latest=-5m@m | stats avg(timing) as Average perc90(timing) as "90th Perc" perc95(timing) as "95th Perc" max(timing) as MAX count by method | lookup method_timing_baseline.csv method OUTPUT LastWeekAverage LastWeekPercentile90 | where Average >1.2*LastWeekAverage OR '90th Perc'>1.2*LastWeekPercentile90
Cron : 5,35 * * * *
(every 30 mins on 5 and 35 minute of hour)
exactly what I was looking for as a starting point, thanks alot
Is your search given here supposed to get Average and 90th Percentile for entire week or just last week same hour as current hour today?
my plan was to get the average for last week, because performance should be comparable within certain parameters +/- 10-or-20%. Really I'm looking for an alert when things have gone real bad and we'll start missing our SLAs