Getting Data In

pulling historical data for a complex scenario

rakeshwest
Engager

I have a requirement to compare historical data and calculate standard deviation. Calculation of standard deviation was pretty clear on splunk site by using function stats stdev(number_of_data_records). But I am wondering how do we pull the data. The requirement is something like I get the data for a configured time (say configured time is 10min) and compare with the data 1 week back on the same day same time and also how many weeks it needs to go back should be configurable (say we pull out 2 weeks or 5 weeks, this should be configurable). Any ideas how can we accomplish this pulling the records in splunk

0 Karma

DalJeanis
Legend

The first thing you do when talking about stdev is determine what "thing" you are trying to understand the variability of.

One possible thing you can be measuring is "across six weeks, how variable are the values of the 'configured time'?".

Another possible thing you can be measuring is "across each week (or day) of the last six weeks, how variable are the values of the 'configured time'?"

Another possible thing you can be measuring is "across each week (or day) of the last six weeks, how variable is the AVERAGE value of the 'configured time'?"

Those are different questions.

Once you know the question you are asking, you then create an extract query that creates one record per each reading of the thing you are trying to measure the variability of.

So, in questions 1 and 2 above, you create one record per configuration event, with the value of the configured time. In question 3, you create one record for each day, with the average value of configured time.

(your query that selects each event) earliest=-6w@d latest=@d
| stats avg(configured_time) as ConfTime_Avg, stdev(configured_time) as ConfTime_Stdev

(your query that selects each event) earliest=-6w@d latest=@d
| bin _time span=1w
| stats avg(configured_time) as ConfTime_Avg, stdev(configured_time) as ConfTime_Stdev by _time

(your query that selects each event) earliest=-6w@d latest=@d
| bin _time span=1w
| stats avg(configured_time) as ConfTime_Avg by _time
| eventstats avg(ConfTime_Avg) as ConfTime_AvgAvg, stdev(ConfTime_Avg) as ConfTime_AvgStdev

Going back to your question, you need to understand what they are trying to achieve with the comparison.

Let's suppose they want to know, "How does the average "configured time" of this Friday morning (before 9:00 AM UTC) compare to the average "configured times" of all Friday mornings for the past X weeks, in terms of the standard deviation for those X weeks of the average times of those periods?" In other words, how average (or conversely, how distinctive) was the average time this week?

This is probably not the most elegant way to code it, but it answers the question above.

(your query that selects each event) earliest=-6w@d latest=@d
| eval dayofweek=strftime(_time,"%A %H:%M")
| where (dayofweek>="Friday 00:00") AND  (dayofweek<"Friday 09:00") 
| bin _time span=1d
| stats avg(configured_time) as ConfTime_Avg by _time
| stats latest(_time) as _time, latest(ConfTime_Avg) as ConfTime_Avg, avg(ConfTime_Avg) as ConfTime_AvgAvg, stdev(ConfTime_Avg) as ConfTime_AvgStdev
| eval ZScore=(ConfTime_Avg-ConfTime_AvgAvg)/ConfTime_AvgStdev

So, to answer your question again. First, you create individual events for whatever you are trying to determine the variability of, for the entire period that you are going to be comparing to. Then, you compare the individual item to that average and standard deviation, to determine how much it deviates from the average.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...

SplunkTrust Application Period is Officially OPEN!

It's that time, folks! The application/nomination period for the 2026-2027 SplunkTrust is officially open. If ...