
How would I dynamically adjust earliest/latest of subsearches based on main search?

daviduslan
Path Finder

Hello,

I have a query that runs 5 searches: a recent search, plus four subsearches over the exact same data from 1-4 weeks ago, and then compares the 4-week average to the recent data. The subsearches join on weekday and hour, so each one needs to be an exact multiple of 7 days prior to the original earliest/latest. Here's a hard-coded example:

index="foo" sourcetype="bar" earliest="11/05/2014:12:00:00" latest="11/11/2014:12:00:00"
| bucket _time span=1h  
| stats count as current_joins by _time   
| eval joiner = strftime(_time,"%w:%H")   
| join joiner [ search index="foo" sourcetype="bar" earliest="10/29/2014:12:00:00" latest="11/04/2014:12:00:00"
| lookup siteid_lookup siteid as site OUTPUT name
| fields _time site 
| bucket _time span=1h   
| stats count as week_ago_joins by _time    
| eval joiner = strftime(_time,"%w:%H")    
| fields joiner, week_ago_joins ]  
| join joiner [ search index="foo" sourcetype="bar" earliest="10/22/2014:12:00:00" latest="10/28/2014:12:00:00"
| lookup siteid_lookup siteid as site OUTPUT name
| fields _time site 
| bucket _time span=1h   
| stats count as two_week_ago_joins by _time    
| eval joiner = strftime(_time,"%w:%H")    
| fields joiner, two_week_ago_joins ]  
| join joiner [ search index="foo" sourcetype="bar" earliest="10/15/2014:12:00:00" latest="10/21/2014:12:00:00"
| lookup siteid_lookup siteid as site OUTPUT name
| fields _time site 
| bucket _time span=1h   
| stats count as three_week_ago_joins by _time    
| eval joiner = strftime(_time,"%w:%H")    
| fields joiner, three_week_ago_joins ]  
| join joiner [ search index="foo" sourcetype="bar" earliest="10/08/2014:12:00:00" latest="10/14/2014:12:00:00"
| lookup siteid_lookup siteid as site OUTPUT name
| fields _time site 
| bucket _time span=1h   
| stats count as four_week_ago_joins by _time    
| eval joiner = strftime(_time,"%w:%H")    
| fields joiner, four_week_ago_joins ]  
| eval four_week_combined_avg = round((four_week_ago_joins+three_week_ago_joins+two_week_ago_joins+week_ago_joins)/4,2)
| eval change_pct = round(((current_joins-four_week_combined_avg)/four_week_combined_avg)*100,2)  

I've found this analysis very useful for tracking business metrics, especially around release management. I'm trying to figure out a way to create a dashboard where you only need to select an earliest and/or latest time and the subsearches adjust accordingly. Any advice would be greatly appreciated, thanks!


daviduslan
Path Finder

Thanks all for the help/patience!

I worked out a solution with a coworker using gentimes. It goes something like this (tokens included to show how it interacts with the dashboard I created):

Dash options look like this: http://puu.sh/dAqeR/a22c568323.png (search truncated)

index="foo" sourcetype="bar" earliest="$earliest_input$" latest="$latest_input$"
| bucket _time span=1h
| stats count as current_joins by _time
| eval joiner = strftime(_time,"%w:%H")
| join joiner [ search index="foo" sourcetype="bar"
[ | gentimes start="$earliest_input$" end="$latest_input$" increment="$increment$" | eval earliest=starttime-(86400*7) | eval latest=endtime-(86400*7) | table earliest, latest | format "" "" "" "" "" "" ]
.....
etc.

The only downside is that the increment needs to be exactly the amount of time between the start and end for this to be accurate. FWIW, I ran the report on 5 weeks of data using this method, got the desired results, and the search took 47 sec with diskUsage at 1196032.
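
To see what that inner gentimes subsearch actually hands to the outer search, you can run it on its own with the tokens filled in; a minimal sketch, using the hard-coded 11/05-11/11 window from the original question:

| gentimes start=11/05/2014 end=11/11/2014 increment=6d
| eval earliest = starttime - (86400*7)
| eval latest = endtime - (86400*7)
| table earliest, latest
| format "" "" "" "" "" ""

The six empty strings tell format to drop its usual parentheses and AND/OR keywords, so the subsearch returns a single search field of the form earliest=<epoch> latest=<epoch>, which the outer search then applies as its time bounds.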


lguinn2
Legend

Wow - this search might take forever to run! I'd do it this way:

index="foo" sourcetype="bar" earliest=-5w@d latest=@w@d
| bucket _time span=1h  
| eval week = case(_time < relative_time(now(),"-4w@d"),4,
                   _time < relative_time(now(),"-3w@d"),3,
                   _time < relative_time(now(),"-2w@d"),2,
                   _time < relative_time(now(),"-1w@d"),1,
                   1==1,0)
| eval dow_hour = strftime(_time,"%w:%H")
| stats count(eval(week==0)) as current count(eval(week==1)) as week1 count(eval(week==2)) as week2 count(eval(week==3)) as week3 count(eval(week==4)) as week4 by dow_hour
| eval four_week_combined_avg = round((week1+week2+week3+week4)/4,2)
| eval change_pct = round(((current-four_week_combined_avg)/four_week_combined_avg)*100,2)

I took out the lookup, since you weren't actually using it anyway.

Now to the actual problem: you want to allow someone to choose the start date for the analysis. The only good way that I know to do this is to use the time range picker in the dashboard. The catch is that you must trust the user to pick an earliest time at least 5 weeks before the latest time, or else the results won't make much sense. But given that, your search could work like this:

 index="foo" sourcetype="bar"
| addinfo
| bucket _time span=1h  
| eval week = case(_time < relative_time(info_min_time,"+1w@d"),4,
                   _time < relative_time(info_min_time,"+2w@d"),3,
                   _time < relative_time(info_min_time,"+3w@d"),2,
                   _time < relative_time(info_min_time,"+4w@d"),1,
                   _time < relative_time(info_min_time,"+5w@d"),0,
                   1==1,-1)
| where week > -1
| eval dow_hour = strftime(_time,"%w:%H")
| stats count(eval(week==0)) as current count(eval(week==1)) as week1 count(eval(week==2)) as week2 count(eval(week==3)) as week3 count(eval(week==4)) as week4 by dow_hour
| eval four_week_combined_avg = round((week1+week2+week3+week4)/4,2)
| eval change_pct = round(((current-four_week_combined_avg)/four_week_combined_avg)*100,2)

martin_mueller
SplunkTrust
SplunkTrust

Here's a different thought: Create a time-based lookup that tells you when each version was introduced, that way each event gets tagged with the correct version. Then you can group by that easily, no need to express the version switches in the search.
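
For reference, the time-based side of such a lookup lives in transforms.conf; a minimal sketch, assuming a hypothetical releases.csv with release_time and version columns (stanza and file names here are made up, not from the thread):

# transforms.conf -- hypothetical stanza for a time-based lookup
[release_versions]
filename    = releases.csv
time_field  = release_time
time_format = %m/%d/%Y:%H:%M:%S

Roughly speaking, each event then matches the lookup row whose release_time most recently precedes the event's _time (subject to the min/max offset settings), so every event gets tagged with the version that was live when it occurred and you can group by it directly.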

daviduslan
Path Finder

Thanks so much for your response! It's very important for this report that I can get very granular with the earliest and latest times. The reason is that I want to measure the impact of a code release, whose exact time I know, on business metrics.

I tried implementing this and I don't think it's quite what I'm looking for - http://puu.sh/dzWrg/0a31d4edf2.png. Ideally, I'd be able to set an earliest and/or latest time with the picker and have that populate the earliest and/or latest of "current", then have the previous four weeks adjust accordingly, regardless of how long ago the set date is.

Thanks for giving this your attention!


daviduslan
Path Finder

Additionally, your solution takes a 5-week data set and then pares it down, instead of running 5 separate, significantly smaller searches. You're right that it would take a long time to run, but I think my starting point, while far more verbose, is actually a bit more efficient for this task.


lguinn2
Legend

A join (or any sort of subsearch) in Splunk is not very efficient. Plus, there are significant limits to subsearches that may affect results over very large datasets.

Try it both ways and look at the Search Job Inspector. In a production environment, I would be extremely surprised if your search was more efficient.


daviduslan
Path Finder

You may be right. Unfortunately, I haven't found a way besides the join/subsearch method to get exactly what I'm looking for. I'm not trying to compare a full week to a full week every time, and join/subsearch lets me be more granular.


martin_mueller
SplunkTrust
SplunkTrust

This feels like a use case for timewrap. https://apps.splunk.com/app/1645
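
For anyone trying this route, here's a minimal sketch of the timewrap approach, assuming the app is installed; the wrapped field names (count_latest_week, count_1week_before, and so on) are assumptions and may differ by version:

index="foo" sourcetype="bar" earliest=-5w@w latest=@w
| timechart span=1h count
| timewrap 1week
| eval four_week_combined_avg = round((count_1week_before + count_2weeks_before + count_3weeks_before + count_4weeks_before)/4,2)
| eval change_pct = round(((count_latest_week - four_week_combined_avg)/four_week_combined_avg)*100,2)

Like lguinn's version, this is a single pass over the data with no joins; the trade-off is the same loss of arbitrary, release-time-aligned windows discussed above.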

lguinn2
Legend

Good thought Martin!
