Hello,
I'm looking to create a timechart query that timewraps weekly, over x number of weeks, showing the count of a given field at a variable span within a given time window on a certain day of the week, with upper and lower bounds as overlay lines that are averaged for that hour across all the weeks in the timechart.
I know that sounds super generic, so for example, the chart could show me the
-average http_response_time every 5 minutes
-from 2-3pm on every Tuesday
-timewrapped weekly over the last 10 weeks
-with upper and lower bounds, say 25th and 75th percentile overlays, where each overlay is the average response time for that hour averaged across all weeks, so I can check trends over the entire period
Here is what I have come up with so far, using the eventgen app to generate test data:
index=main /booking
| eval hour=tonumber(strftime(_time,"%H"))
| eval year=tonumber(strftime(_time,"%Y"))
| eval month=tonumber(strftime(_time,"%m"))
| eval dayOfWeek=strftime(_time, "%A")
| where dayOfWeek="Tuesday" AND hour=14
| timechart avg(http_response_time) span=1hour
| timewrap 1day series=relative
| append
[ search index=main /booking
| fields http_response_time
| timechart avg(http_response_time) as SecondAverage
stdev(http_response_time) as StandardDeviation
perc75(http_response_time) as upper_control_limit
perc25(http_response_time) as lower_control_limit span=1d
| eval lowerBound=(SecondAverage-StandardDeviation), upperBound=(SecondAverage+StandardDeviation)
| fields _time lowerBound upperBound upper_control_limit lower_control_limit SecondAverage]
| eval hour=strftime(_time,"%H")
| table hour *
Part of the issue with the above is that the overlays display next to the averages rather than on top of them, like this:
https://imgur.com/a/YxregbJ
any help is appreciated,
Thanks!
I've built this exact solution and it's been in production for over a year. There are a ton of complexities and gotchas here. Is this for a single entity or for multiple entities? If the latter, I'd strongly suggest you push this data to a summary index for faster queries and better usability. As for your bounds, you will need to add some evals which calculate the offset from the baseline. I will share my SPL tomorrow.
@skoelpin thanks! Were you able to find your SPL to share?
Here ya go. Your base search will need a transformational command to pipe into this; I used stats, as you can see. This is splitting by entities (represented here by customer) and looks at a sliver of time in the past. Say it's noon on Tuesday: it will grab the last 5 noons on a Tuesday in the last 5 weeks and average them together to create the baseline. You then create upper and lower bounds by selecting a confidence interval (I used 90% below). I have this as a daily search which populates a summary index, and another scheduled search which runs every 10 minutes and overlays this with the actual counts, for short feedback loops to measure accuracy. This will also alert when the actual count goes below the lowest threshold. Much faster time to value in ITSI.
index=....
| stats count by customer, _time
| eval w=case(
    (_time>relative_time(now(), "+1d@d-5w-30m") AND _time<=relative_time(now(), "+1d@d-5w+1d+30m")), 5,
    (_time>relative_time(now(), "+1d@d-4w-30m") AND _time<=relative_time(now(), "+1d@d-4w+1d+30m")), 4,
    (_time>relative_time(now(), "+1d@d-3w-30m") AND _time<=relative_time(now(), "+1d@d-3w+1d+30m")), 3,
    (_time>relative_time(now(), "+1d@d-2w-30m") AND _time<=relative_time(now(), "+1d@d-2w+1d+30m")), 2,
    (_time>relative_time(now(), "+1d@d-1w-30m") AND _time<=relative_time(now(), "+1d@d-1w+1d+30m")), 1)
| eval shift=case(isnotnull(w),"+"+w+"w-30m,+"+w+"w-20m,+"+w+"w-10m,+"+w+"w-0m,+"+w+"w+10m,+"+w+"w+20m,+"+w+"w+30m,")
| where isnotnull(shift)
| makemv delim="," shift
| mvexpand shift
| eval time=relative_time(_time,shift)
| eventstats avg(count) AS pred by time, customer
| eval upper=if(count>pred,count,pred)
| eval lower=if(count<pred,count,pred)
| eval lower=if(lower=0,"",lower)
| eventstats avg(count) AS pred, stdev(count) as pred_stdev by time, customer
| eval upper=if(upper>pred+1*pred_stdev,pred_stdev*0.5+pred,upper)
| eval lower=if(lower<pred-1*pred_stdev, pred-pred_stdev*0.5, lower)
| stats avg(count) AS pred, stdev(upper) AS ustdev, stdev(lower) AS lstdev, stdev(count) as stdev by time, customer
| eval low=pred-lstdev*(sqrt(1/(1-90/100)))
| eval low=if(low<0, 1, low)
| eval high=pred+ustdev*(sqrt(1/(1-90/100)))
| eval _time=time
| timechart span=10m useother=false limit=0 cont=false min(pred) as pred , min(low) as low, min(high) as high by customer
| makecontinuous _time
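A note on the bound math in the `low`/`high` evals: the multiplier `sqrt(1/(1-90/100))` is the Chebyshev bound factor. Chebyshev's inequality says at most `1/k^2` of any distribution falls more than `k` standard deviations from the mean, so for at least 90% coverage you solve `1/k^2 = 1 - 0.90`, giving `k = sqrt(10) ≈ 3.16`. If you want to try other confidence levels, a hedged sketch (the `conf` field name is just an illustration, not from the original search) would be to replace those two evals with:

```spl
| eval conf=90
| eval k=sqrt(1/(1-conf/100))
| eval low=pred-lstdev*k
| eval low=if(low<0, 1, low)
| eval high=pred+ustdev*k
```

With `conf=95` the multiplier becomes `sqrt(20) ≈ 4.47`, so the band widens; because Chebyshev makes no distributional assumptions, these bounds are conservative compared to Gaussian z-scores.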
@gabenav11 did this answer your question? If so, can you please accept it? It took a decent amount of time to grab this SPL, sanitize it, and post it.
Yes, thank you very much! I clicked accept on the answer you gave, but I cannot click accept on a response. Do you see that it says accept?
Hey @ppablo_splunk can you help out @gabenav11 ? He's unable to accept the answer
Oh, there we go, I accepted it now. I thought that "[checkmark] accepted" meant that it was accepted, but actually that meant I had not yet accepted, and "[checkmark] unaccept" means it's accepted lol