Archive

pulling historical data for a complex scenario

Engager

I have a requirement to compare historical data and calculate standard deviation. Calculation of standard deviation was pretty clear on splunk site by using function stats stdev(number_of_data_records). But I am wondering how do we pull the data. The requirement is something like I get the data for a configured time (say configured time is 10min) and compare with the data 1 week back on the same day same time and also how many weeks it needs to go back should be configurable (say we pull out 2 weeks or 5 weeks, this should be configurable). Any ideas how can we accomplish this pulling the records in splunk

0 Karma

SplunkTrust
SplunkTrust

The first thing you do when talking about stdev is determine what "thing" you are trying to understand the variability of.

One possible thing you can be measuring is "across six weeks, how variable are the values of the 'configured time'?".

Another possible thing you can be measuring is "across each week (or day) of the last six weeks, how variable are the values of the 'configured time'?"

Another possible thing you can be measuring is "across each week (or day) of the last six weeks, how variable is the AVERAGE value of the 'configured time'?"

Those are different questions.

Once you know the question you are asking, you then create an extract query that creates one record per each reading of the thing you are trying to measure the variability of.

So, in questions 1 and 2 above, you create one record per configuration event, with the value of the configured time. In question 3, you create one record for each day, with the average value of configured time.

(your query that selects each event) earliest=-6w@d latest=@d
| stats avg(configured_time) as ConfTime_Avg, stdev(configured_time) as ConfTime_Stdev

(your query that selects each event) earliest=-6w@d latest=@d
| bin _time span=1w
| stats avg(configured_time) as ConfTime_Avg, stdev(configured_time) as ConfTime_Stdev by _time

(your query that selects each event) earliest=-6w@d latest=@d
| bin _time span=1w
| stats avg(configured_time) as ConfTime_Avg by _time
| eventstats avg(ConfTime_Avg) as ConfTime_AvgAvg, stdev(ConfTime_Avg) as ConfTime_AvgStdev

Going back to your question, you need to understand what they are trying to achieve with the comparison.

Let's suppose they want to know, "How does the average "configured time" of this Friday morning (before 9:00 AM UTC) compare to the average "configured times" of all Friday mornings for the past X weeks, in terms of the standard deviation for those X weeks of the average times of those periods?" In other words, how average (or conversely, how distinctive) was the average time this week?

This is probably not the most elegant way to code it, but it answers the question above.

(your query that selects each event) earliest=-6w@d latest=@d
| eval dayofweek=strftime(_time,"%A %H:%M")
| where (dayofweek>="Friday 00:00") AND  (dayofweek<"Friday 09:00") 
| bin _time span=1d
| stats avg(configured_time) as ConfTime_Avg by _time
| stats latest(_time) as _time, latest(ConfTime_Avg) as ConfTime_Avg, avg(ConfTime_Avg) as ConfTime_AvgAvg, stdev(ConfTime_Avg) as ConfTime_AvgStdev
| eval ZScore=(ConfTime_Avg-ConfTime_AvgAvg)/ConfTime_AvgStdev

So, to answer your question again. First, you create individual events for whatever you are trying to determine the variability of, for the entire period that you are going to be comparing to. Then, you compare the individual item to that average and standard deviation, to determine how much it deviates from the average.

0 Karma