Splunk Search

Looking for a way to explain why subsearches are so slow

bcronrath
Path Finder

I've seen it suggested before, and have definitely witnessed myself, that for searches involving any significant amount of data it is far faster to grab all the data and correlate it afterwards via stats than to use a subsearch in your base query. To illustrate what I mean, say you have two sourcetypes, "left" and "right", each containing its own set of data, with a shared unique identifier we'll call "unique_id" that can be used to correlate them. So why does a search like this:

index=left sourcetype=left [search index=right sourcetype=right | stats count by unique_id | fields unique_id] | stats count

take massively longer (and in a lot of cases just time out because the subsearch's memory limits are exceeded) than something like this:

(index=left sourcetype=left) OR (index=right sourcetype=right) 
| eval left_count=if(sourcetype="left",1,0)
| stats values(sourcetype) as sourcetypes, sum(left_count) as left_count by unique_id
| search sourcetypes=*left* sourcetypes=*right*
| stats sum(left_count) as count

I'm wondering why subsearch is always so much slower for something like this?


koshyk
Super Champion

The subsearch in your first example is fast when it returns only a small number of rows (e.g. fewer than 100).
In our testing, parsing the subsearch also takes time, because Splunk has to "get all the values" before proceeding further.

The subsearch is the part in square brackets, and it runs first. Look at the "expanded search" in the "Job" tab to see how its results are then passed to the outer search as key-value terms (you can see how complex the expanded search becomes!). This gets too time-consuming when the subsearch returns more than about 1000 rows.
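To preview what the outer search actually receives, the subsearch can be run on its own with the format command appended. This is only a sketch that reuses the index, sourcetype and field names from the question:

```
index=right sourcetype=right
| stats count by unique_id
| fields unique_id
| format
```

The result is a single row with a field named search, whose value is roughly of the shape ( ( unique_id="A" ) OR ( unique_id="B" ) OR ... ), where A and B are placeholder values, with one clause per row returned. The outer search then has to evaluate that whole OR list, which is why the row count of the subsearch matters so much.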

Also have a look at "Subsearch limitations" (e.g. 10K results) and "Performance considerations" in the docs:
https://docs.splunk.com/Documentation/Splunk/8.0.2/Search/Aboutsubsearches

The second search is a normal Splunk search: it can query the data in parallel and is highly efficient.
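As a variation on that stats approach, the same correlation can be written without the wildcard match on a multivalue field. This is a sketch, not a tested drop-in replacement: it assumes the sourcetypes are literally "left" and "right" as in the question, and sourcetype_count is just an illustrative field name:

```
(index=left sourcetype=left) OR (index=right sourcetype=right)
| stats dc(sourcetype) as sourcetype_count, count(eval(sourcetype="left")) as left_count by unique_id
| where sourcetype_count=2
| stats sum(left_count) as count
```

Here dc(sourcetype) counts how many distinct sourcetypes each unique_id appears in, so keeping only the rows where it equals 2 leaves the IDs present on both sides, and the final stats adds up the "left" events for those IDs.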
