Splunk Search

Timechart with Timewrap and upper and lower bounds

Explorer

Hello,

I'm looking to create a query that is a timechart that timewraps every week, for x number of weeks, showing the count of a given field on a variable span within a given set of time on a certain day of the week with upper and lower bounds as overlay lines that are averaged for that hour across all the weeks in the time chart.

I know that sounds super generic, so for example, the chart could say show me the
-average httpresponsetime every 5 minutes
-from 2-3pm on every Tuesday
-timewrapped weekly over the last 10 weeks
-with upper and lower bounds, say 25th and 75th percentile overlays, where the overlays are the average response time for that hour averaged out across all weeks, to check trends over the entire period

Here is what I have come up with so far, using the eventgen app to generate test data:

index=main /booking 
| eval hour=tonumber(strftime(_time,"%H")) 
| eval year=tonumber(strftime(_time,"%Y")) 
| eval month=tonumber(strftime(_time,"%m")) 
| eval dayOfWeek=strftime(_time, "%A") 
| where (and here can day day and hour to whatever)
| timechart avg(http_response_time) span=1hour
| timewrap 1day series=relative
| append 
    [ search index=main /booking 
    | fields http_response_time 
    | timechart avg(http_response_time) as SecondAverage 
    stdev(http_response_time) as StandardDeviation
    perc75(http_response_time) as upper_control_limit
    perc25(http_response_time) as lower_control_limit span=1d
    | eval lowerBound=(SecondAverage-StandardDeviation), upperBound=(SecondAverage+StandardDeviation) 
    | fields _time lowerBound upperBound upper_control_limit lower_control_limit SecondAverage]
| eval hour=strftime(_time,"%H") 
| table hour *

part of the issue with the above is the overlay displays next to the averages, like this:
https://imgur.com/a/YxregbJ

any help is appreciated,

Thanks!

0 Karma
1 Solution

SplunkTrust
SplunkTrust

I've built this exact solution and it's been in production for over a year. There's a ton of complexities and gotcha's here. Is this for a single entity or for multiple entities? If the ladder, i'd strongly suggest you push this data to a summary index for faster queries and better usability. As for your bounds, you will need to add some eval's which calculate the offset from the baseline. I will share my SPL tomorrow

View solution in original post

SplunkTrust
SplunkTrust

I've built this exact solution and it's been in production for over a year. There's a ton of complexities and gotcha's here. Is this for a single entity or for multiple entities? If the ladder, i'd strongly suggest you push this data to a summary index for faster queries and better usability. As for your bounds, you will need to add some eval's which calculate the offset from the baseline. I will share my SPL tomorrow

View solution in original post

Explorer

@skoelpin thanks! were you able to find your spl to share?

0 Karma

SplunkTrust
SplunkTrust

Here ya go. Your base search is going to need a transformational command to pipe into this, I used stats as you can see.. This is spliting by entities represent customer and looks at a sliver in time in the past. Say if its noon on Tuesday, it will grab the last 5 noon's @ Tuesday in the last 5 weeks and average them together to create the baseline. You then create an upper and lower bounds by selecting a confidence interval (I used 90% below). I have this as a daily search which populates a summary index and have another scheduled search which runs every 10 minutes that overlays this with the actual counts for short feedback loops to measure accuracy. This will also alert when the actual count goes below the lowest threshold. Much faster time to value in ITSI

index=....
| stats count by customer, _time 
| eval w=case( (_time>relative_time(now(), "+1d@d-5w-30m") AND _time<=relative_time(now(), "+1d@d-5w+1d+30m")), 5, (_time>relative_time(now(), "+1d@d-4w-30m") AND _time<=relative_time(now(), "+1d@d-4w+1d+30m")), 4, (_time>relative_time(now(), "+1d@d-3w-30m") AND _time<=relative_time(now(), "+1d@d-3w+1d+30m")), 3, (_time>relative_time(now(), "+1d@d-2w-30m") AND _time<=relative_time(now(), "+1d@d-2w+1d+30m")), 2, (_time>relative_time(now(), "+1d@d-1w-30m") AND _time<=relative_time(now(), "+1d@d-1w+1d+30m")), 1) 
| eval shift=case(isnotnull(w),"+"+w+"w-30m,+"+w+"w-20m,+"+w+"w-10m,+"+w+"w-0m,+"+w+"w+10m,+"+w+"w+20m,+"+w+"w+30m,") 
| where isnotnull(shift) 
| makemv delim="," shift 
| mvexpand shift 
| eval time=relative_time(_time,shift) 
| eventstats avg(count) AS pred by time, customer 
| eval upper=if(count>pred,count,pred) 
| eval lower=if(count<pred,count,pred) 
| eval lower=if(lower=0,"",lower) 
| eventstats avg(count) AS pred, stdev(count) as pred_stdev, by time, customer 
| eval upper=if(upper>pred+1*pred_stdev,pred_stdev*0.5+pred,upper) 
| eval lower=if(lower <pred-1*pred_stdev,pred_stdev*0.5+pred, lower) 
| stats avg(count) AS pred, stdev(upper) AS ustdev, stdev(lower) AS lstdev stdev(count) as stdev by time, customer 
| eval low=pred-lstdev*(sqrt(1/(1-90/100))) 
| eval low=if(low<0, 1, low) 
| eval high=pred+ustdev*(sqrt(1/(1-90/100))) 
| eval _time=time 
| timechart span=10m useother=false limit=0 cont=false min(pred) as pred , min(low) as low, min(high) as high by customer 
| makecontinuous _time 
0 Karma

SplunkTrust
SplunkTrust

@gabenav11 did this answer your question? If so can you please accept it? It took a decent amount of time to grab this SPL, sanitize it, and post it..

0 Karma

Explorer

yes, thank you very much! I clicked accept on the answer you gave? but i cannot click accept on a response? do you see that it says accept?

0 Karma

SplunkTrust
SplunkTrust

Hey @ppablo_splunk can you help out @gabenav11 ? He's unable to accept the answer

0 Karma

Explorer

oh there we go i accepted now, i thought that "[checkmark] accepted" meant that it was accepted but actually that meant that i had not yet accepted and "[checkmark] unaccept" means it's accepted lol