I have around 300 KPIs whose variation over time needs to be monitored. The deviation and count are not uniform across the 300 KPIs. The requirement is that Splunk should set a threshold for each KPI by itself. I'm aware that this capability is available in ITSI, but ITSI is out of scope for me.
Is there a way to achieve that? Also, I would like to know the best way to set up visualization for such a large number of KPIs.
@MousumiChowdhury, for adaptive thresholding, I would say you need two things:
1) Machine Learning Toolkit (https://splunkbase.splunk.com/app/2890/) for setting up outlier/standard deviation thresholds. You can start off with trivial statistical thresholds, for example an hourly 2nd standard deviation for every hour of the week, based on historical data from the last 1-2 years.
2) Ample historic data (which implies summary indexing or an accelerated data model so that historical searches return results fast).
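To make the "hourly 2nd standard deviation for every hour of the week" idea concrete, here is a minimal Python sketch of the arithmetic only. It is not SPL and not how the Machine Learning Toolkit does it; the function names, the `(kpi, timestamp, count)` input shape, and the sample data are all assumptions for illustration. It uses the population standard deviation so a bucket with a single sample does not raise an error.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to 0..167 (0 = Monday 00:00, 167 = Sunday 23:00)."""
    return ts.weekday() * 24 + ts.hour


def build_thresholds(samples, k=2.0):
    """Compute mean +/- k standard deviations per KPI and hour of week.

    samples: iterable of (kpi, datetime, hourly_count) tuples (hypothetical shape).
    Returns {(kpi, hour_of_week): (lower, upper)}.
    """
    buckets = defaultdict(list)
    for kpi, ts, count in samples:
        buckets[(kpi, hour_of_week(ts))].append(count)

    thresholds = {}
    for key, counts in buckets.items():
        mu = mean(counts)
        sigma = pstdev(counts)  # population std dev; works with one sample
        thresholds[key] = (mu - k * sigma, mu + k * sigma)
    return thresholds


# Made-up history: three Mondays at 09:00 for one KPI.
history = [
    ("logins", datetime(2024, 1, 1, 9), 100),
    ("logins", datetime(2024, 1, 8, 9), 110),
    ("logins", datetime(2024, 1, 15, 9), 90),
]
bands = build_thresholds(history)
lower, upper = bands[("logins", 9)]
```

Because every KPI gets its own band per hour of the week, KPIs with very different volumes and variability do not need a shared threshold.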
When viewing KPIs in a single place, you should determine whether you need to see all 300 at the same time or whether they can be broken out by type of service, type of KPI, type of server, etc. That way you keep the capability to monitor everything without loading all of them at the same time. (I have not used ITSI, but I think even ITSI by default shows you 50 KPIs in a single place.)
There is no scope to group the KPIs. I need to display only the top KPIs where there is a significant deviation, and I can't figure out the search to do that. My use case is: I have to compare the count of a KPI at a certain hour of the current day with the average count of that KPI for the same hour over the past 30 days or so, and calculate the deviation. I hope my use case is understandable.
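The comparison described above can be sketched in Python as follows. This only illustrates the calculation, not a Splunk search; the function name, the dictionary inputs, and the sample numbers are hypothetical, and percent deviation from the 30-day average is just one possible definition of "deviation".

```python
from statistics import mean


def top_deviations(history, current, top_n=10):
    """Rank KPIs by how far the current hour is from its historical average.

    history: {kpi: [count for the same hour on each past day]} (hypothetical shape)
    current: {kpi: count for that hour today}
    Returns [(kpi, percent_deviation)], largest absolute deviation first.
    """
    rows = []
    for kpi, today in current.items():
        past = history.get(kpi)
        if not past:
            continue  # no baseline for this KPI yet
        baseline = mean(past)
        if baseline == 0:
            continue  # percent deviation is undefined for a zero baseline
        pct = (today - baseline) / baseline * 100.0
        rows.append((kpi, pct))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows[:top_n]


# Made-up data: counts for one hour of the day across three past days.
history = {
    "orders": [100, 100, 100],
    "logins": [200, 220, 180],
    "errors": [10, 10, 10],
}
current = {"orders": 105, "logins": 100, "errors": 30}
ranked = top_deviations(history, current, top_n=2)
```

Keeping only the top N rows is what lets a single panel show the KPIs that moved most, instead of all 300.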
@MousumiChowdhury, summary indexing all 300 KPIs may be the best option here.
It sounds like a combination of relative_time and pushing the results to a summary index would do the trick.
You would need one scheduled search that looks at the past 30 days to determine a normal baseline, and another scheduled search that pushes the hourly counts into the summary index. You can then craft an alert query that flags counts more than a chosen number of standard deviations from the mean.
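As a rough illustration of that two-step flow (baseline first, then an alert check), here is a Python sketch of the logic. It stands in for the scheduled searches rather than reproducing them; the row shape `(kpi, hour_of_day, count)`, the function names, and the 2-sigma default are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(summary_rows):
    """Step 1: mean and sample std dev per (kpi, hour_of_day) from summary rows.

    summary_rows: iterable of (kpi, hour_of_day, count), e.g. about 30 days of data.
    Keys with fewer than two samples are skipped because stdev needs two points.
    """
    grouped = defaultdict(list)
    for kpi, hour, count in summary_rows:
        grouped[(kpi, hour)].append(count)
    return {
        key: (mean(counts), stdev(counts))
        for key, counts in grouped.items()
        if len(counts) >= 2
    }


def find_alerts(baseline, latest_rows, max_sigma=2.0):
    """Step 2: flag rows more than max_sigma standard deviations from the mean."""
    alerts = []
    for kpi, hour, count in latest_rows:
        stats = baseline.get((kpi, hour))
        if stats is None:
            continue  # no baseline for this KPI and hour
        mu, sigma = stats
        if sigma == 0:
            continue  # flat history; z-score is undefined
        z = (count - mu) / sigma
        if abs(z) > max_sigma:
            alerts.append((kpi, hour, round(z, 2)))
    return alerts


# Made-up summary data: five days of 09:00 counts for one KPI.
summary = [("logins", 9, c) for c in (100, 110, 90, 105, 95)]
baseline = build_baseline(summary)
alerts = find_alerts(baseline, [("logins", 9, 140), ("logins", 9, 102)])
```

Separating the baseline from the check mirrors the two scheduled searches: the expensive 30-day pass runs rarely, while the hourly check only reads precomputed statistics.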