Why does the subsearch example in the Splunk Search Tutorial seem to repeat itself?

davidmichaelkar
New Member

I'm stepping through the main Splunk Search Tutorial. I'm at the "subsearch" section: https://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Useasubsearch

The cited example search is the following:

sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip] | stats count, dc(productId), values(productId) by clientip

What seems curious to me is that the subsearch begins with the entire content of the outer search, namely sourcetype=access_* status=200 action=purchase. It seems odd that the subsearch needs to repeat the entire outer search and then qualify it. Or is this simply a contrived use case for a subsearch?

somesoni2
SplunkTrust

The answer lies in the requirement. Here is the requirement the tutorial gives for that example:

You want to find the single most frequent shopper on the Buttercup Games online store and what that shopper has purchased. Use the top command to return the most frequent shopper.

Remember that the data for both the most frequent shopper and that shopper's purchases comes from the same set of events.

So step 1 is to find the single most frequent shopper. That is exactly what the subsearch does: it returns the clientip of the single most frequent buyer.

sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip
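To make step 1 concrete, here is a minimal Python sketch (not Splunk code) that models what "top limit=1 clientip | table clientip" returns. The events list and its values are hypothetical stand-ins for the results of sourcetype=access_* status=200 action=purchase, and tie-breaking in Counter.most_common may not match Splunk's top command.

```python
from collections import Counter

# Hypothetical purchase events standing in for
# sourcetype=access_* status=200 action=purchase (values are made up).
events = [
    {"clientip": "10.0.0.1", "productId": "prod-A"},
    {"clientip": "10.0.0.2", "productId": "prod-B"},
    {"clientip": "10.0.0.1", "productId": "prod-B"},
    {"clientip": "10.0.0.3", "productId": "prod-C"},
    {"clientip": "10.0.0.1", "productId": "prod-A"},
]

def top_one_clientip(events):
    """Model of `top limit=1 clientip | table clientip`:
    count events per clientip and keep only the most frequent value."""
    counts = Counter(e["clientip"] for e in events)
    # most_common(1) returns [(value, count)]; keep only the value, like `table clientip`
    clientip, _count = counts.most_common(1)[0]
    return clientip

print(top_one_clientip(events))  # 10.0.0.1
```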

Now, for this clientip, we need to get all of the purchases, which we find in the same data we used to calculate the most frequent buyer. Splunk runs the subsearch first and substitutes its result (a clientip="..." condition) into the outer search. So the outer search uses the same data (successful purchases) and filters it down to the single clientip returned by the subsearch.
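The hand-off between the two searches can be modeled in Python as below. This is a sketch under stated assumptions: subsearch_to_filter and outer_stats are hypothetical helpers (not Splunk APIs), the string layout only approximates the default format-style expression Splunk builds from subsearch rows, and the events are made-up sample data.

```python
# Hypothetical purchase events; values are made up for illustration.
events = [
    {"clientip": "10.0.0.1", "productId": "prod-A"},
    {"clientip": "10.0.0.2", "productId": "prod-B"},
    {"clientip": "10.0.0.1", "productId": "prod-B"},
    {"clientip": "10.0.0.3", "productId": "prod-C"},
    {"clientip": "10.0.0.1", "productId": "prod-A"},
]

def subsearch_to_filter(field, values):
    """Hypothetical helper approximating the expression built from subsearch
    rows: ( ( field="v1" ) OR ( field="v2" ) )."""
    clauses = " OR ".join(f'( {field}="{v}" )' for v in values)
    return f"( {clauses} )"

def outer_stats(events, clientip):
    """Model of: <base search> clientip=<ip>
    | stats count, dc(productId), values(productId) by clientip"""
    matching = [e for e in events if e["clientip"] == clientip]
    products = [e["productId"] for e in matching]
    return {
        "clientip": clientip,
        "count": len(matching),
        "dc(productId)": len(set(products)),
        # values() lists the distinct values in lexicographic order
        "values(productId)": sorted(set(products)),
    }

print(subsearch_to_filter("clientip", ["10.0.0.1"]))  # ( ( clientip="10.0.0.1" ) )
print(outer_stats(events, "10.0.0.1"))
```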

This is how it would look if you didn't use the subsearch approach:

sourcetype=access_* status=200 action=purchase  | stats count, dc(productId), values(productId) by clientip | sort 0 -count | head 1

Here, you compute the purchases of every buyer (clientip) and only at the end pick the most frequent one, by sorting and keeping the top record. The former method applies the filter in the base search itself; even though it has to run a subsearch, it can drastically reduce (depending on the data, of course) the number of records the search has to process.
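The two approaches can be compared with a toy Python model. It assumes in-memory, hypothetical sample events and only illustrates the logic; it does not measure real Splunk performance. Each function returns the winning clientip, its purchase count, and how many events reached the stats step.

```python
from collections import Counter, defaultdict

# Hypothetical purchase events (status=200 action=purchase); values are made up.
events = [
    {"clientip": "10.0.0.1", "productId": "prod-A"},
    {"clientip": "10.0.0.2", "productId": "prod-B"},
    {"clientip": "10.0.0.1", "productId": "prod-B"},
    {"clientip": "10.0.0.3", "productId": "prod-C"},
    {"clientip": "10.0.0.1", "productId": "prod-A"},
]

def with_subsearch(events):
    """Subsearch picks the top clientip; the outer stats only sees that client's events."""
    # Subsearch: top limit=1 clientip
    top_ip, _ = Counter(e["clientip"] for e in events).most_common(1)[0]
    # Outer search: the substituted clientip filter drops every other client's events
    filtered = [e for e in events if e["clientip"] == top_ip]
    # stats ... by clientip aggregates only the filtered events
    return top_ip, len(filtered), len(filtered)

def without_subsearch(events):
    """stats over ALL events by clientip, then sort 0 -count | head 1."""
    per_ip = defaultdict(list)
    for e in events:  # every purchase event reaches stats
        per_ip[e["clientip"]].append(e["productId"])
    # sort 0 -count: descending by purchase count; head 1: keep the first row
    rows = sorted(per_ip.items(), key=lambda kv: len(kv[1]), reverse=True)
    best_ip, products = rows[0]
    return best_ip, len(products), len(events)

print(with_subsearch(events))     # ('10.0.0.1', 3, 3)
print(without_subsearch(events))  # ('10.0.0.1', 3, 5)
```

In this toy model the subsearch still counts every purchase event once; what shrinks is the set of events the outer stats has to aggregate (3 instead of 5 here).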

lstewart_splunk
Splunk Employee

Thank you, somesoni2, a really clear explanation!
I will add this to the Search Tutorial and to the Search Reference so that others are not confused.
