Solved: Re: Building multisearch dynamically

w564432 · ‎09-28-2021

This is mostly just a curiosity, motivated by this post on how to compare a particular time interval across multiple larger time periods. Effectively the solution seems to be to generate a list of time intervals and run map subsearches on each entry.

When I have multiple time periods that I'd like to run stats on, I typically use a multisearch command followed by a chart, as follows:

| multisearch [ search index=potato earliest=<et1> latest=<lt1> | eval series=1 ]
[ search index=potato earliest=<et2> latest=<lt2> | eval series=2 ]
.
.
.
[ search index=potato earliest=<etn> latest=<ltn> | eval series=n ]

| timechart count by series

I suppose you could make it work by substituting the et's and lt's via subsearch, but it won't work if the number of time intervals, n, is also dynamically generated by some prior search.

I know you can use a number of different techniques, but they all have different drawbacks.

You could use map, which offers pretty much all the flexibility you need (I've abused it plenty of times doing things like map search=`searchstring($searchstring$)` ). However, it has performance issues: it doesn't offer the same optimization as multisearch when you just need to string multiple streams together, and the subsearches can time out.
You can just search the entire timerange and use some eval logic to filter out the time intervals you need, but isn't this suboptimal since you're searching more events than you need? Multisearch seems to be great at streaming multiple different time intervals together and I'd love to have that optimization without

At this point, would you just have to resort to REST to schedule searches? How would we tie the data together? I'm not very familiar with what is possible with REST as all of my experience is with just plain SPL.

In short, how do we stream events across multiple, dynamically generated time intervals without running into subsearch limitations?

PickleRick · ‎09-28-2021

You can generate multiple ranges with a subsearch. For example:

index=_internal [| makeresults count=10 
| streamstats count as cnt 
| eval earliest=now()-cnt*60 
| eval latest=now()-cnt*60+15
| table earliest latest ] 
| timechart span=10s count

The only problem here is that you can't easily order the series. You can probably get around it somehow, but it's not that straightforward.

The upside is that it uses just one base search and one subsearch.
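To see what the subsearch hands to the outer search, you can run it on its own with format appended. This is only a sketch for inspection: count=3 is an arbitrary choice to keep the output short. A subsearch is implicitly formatted the same way, so the search field this produces should be a single expression in which each row becomes one parenthesized earliest/latest pair and the rows are joined with OR.

```
| makeresults count=3
| streamstats count as cnt
| eval earliest=now()-cnt*60
| eval latest=now()-cnt*60+15
| table earliest latest
| format
```

That OR'd expression is what ends up next to index=_internal in the example above.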


w564432 · ‎10-01-2021

I meant to accept the post of yours that I replied to as the solution, but accidentally marked my own reply instead. Can I fix this?

PickleRick · ‎10-02-2021

You can click "not a solution" and then mark the other post as the solution. But no worries 🙂

w564432 · ‎06-13-2023

Just wanted to come back to this in case anyone else reads this.

Although this solution works, it doesn't appear to be much faster than searching the entire timeframe between the absolute earliest and absolute latest.

For instance, if you were searching a year but only needed to sample a few sparse holidays, I believe the performance is closer to searching the whole year than running separate searches for the holidays only.

PickleRick · ‎06-14-2023

That is possible. You would see the most impact if the generated time segments let Splunk skip whole buckets that it would otherwise have to open for the full time range. In that case there would be much less I/O overhead, memory use, opened files and so on. With a search like "(earliest=-10m latest=-9m) OR (earliest=-8m latest=-7m)" vs. "(earliest=-10m latest=-5m)"... I wouldn't expect much improvement. It also depends on the data and the search itself, so YMMV.

w564432 · ‎09-28-2021

That's wild. I always assumed earliest and latest were static parameters, and nothing in the documentation seems to suggest you can do something like (earliest=X AND latest=Y) OR (earliest=A AND latest=B).
Is there some general concept I am missing on how Splunk parses parameters which should make this obvious? Or is it just specific to this one thing?

Like you mentioned, you lose the series labeling you get for free with multisearch. You could patch this up with an eval, although I'd imagine the performance suffers. Still, this is a great, relatively simple solution!
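A rough sketch of that eval patch, reusing the makeresults example from the accepted answer. It relies on two assumptions that are not confirmed anywhere in this thread: that now() evaluates to the same instant in the subsearch and in the outer search, and that the windows are the regular 60-second-spaced ones from that example, so the window number can be recovered from the event's age. For arbitrary, irregular intervals you would need a different labeling rule (for instance a case() over the boundaries).

```
index=_internal [| makeresults count=10
    | streamstats count as cnt
    | eval earliest=now()-cnt*60
    | eval latest=now()-cnt*60+15
    | table earliest latest ]
| eval series=ceiling((now()-_time)/60)
| timechart span=10s count by series
```

Here series should come out as the cnt of the window an event fell into, since each window starts cnt*60 seconds before now() and lasts 15 seconds.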

PickleRick · ‎09-28-2021

To be honest, I was a bit surprised myself. Considering that _time is treated a bit differently during search I also assumed at first that you can have just one fixed range for your search. But then I thought, as my colleague used to say - try and see. So I tried and saw 🙂

The issue of number of searches might not seem important at first but if you have more of those ranges you might hit your server's limits.
