Solved: Why does the subsearch example in the Splunk Search Tutorial seem to repeat itself?

davidmichaelkar · 09-30-2016

I'm stepping through the main Splunk Search Tutorial. I'm at the "subsearch" section: https://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Useasubsearch

The cited example search is the following:

sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip] | stats count, dc(productId), values(productId) by clientip

What seems curious to me is that the subsearch begins with the entire content of the outer search, sourcetype=access_* status=200 action=purchase. It seems odd that the subsearch needs to repeat the entire outer search and then qualify it. Or is this perhaps just a nonsensical use case for a subsearch?

somesoni2 · 09-30-2016

The answer lies in the requirement. This is what the tutorial asks that example search to do:

You want to find the single most frequent shopper on the Buttercup Games online store and what that shopper has purchased. Use the top command to return the most frequent shopper.

Remember that both the most frequent shopper and that shopper's purchases come from the same data: successful purchase events.

So step 1 is to find the single most frequent shopper. That is exactly what the subsearch does: it returns the clientip of the single most frequent buyer.

sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip

Now, for this clientip, we need to get all of its purchases, which we find in the same data we used to identify the most frequent buyer. So the outer search runs over the same data (successful purchases) and filters it down to the single clientip returned by the subsearch, which Splunk adds to the outer search as an extra search term.
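If you want to see the exact filter the outer search receives, you can run the subsearch on its own and append the format command, which Splunk applies implicitly to subsearch results. This is a sketch assuming the tutorial data is loaded; it should return a single row whose search field looks something like ( ( clientip="<ip>" ) ), where <ip> is whatever address top picked:

sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip | format

That string is what replaces the bracketed subsearch in the original search, which is why the outer search ends up restricted to one shopper even though it starts from the same base terms.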

This is how it would look if you didn't use this simpler method:

sourcetype=access_* status=200 action=purchase  | stats count, dc(productId), values(productId) by clientip | sort 0 -count | head 1

Here, you compute purchase statistics for every buyer (clientip) and only at the end keep the most frequent one, by sorting and taking the top record. The subsearch method instead applies the filter in the base search itself, so even though it has to run a subsearch first, it can drastically reduce (depending on the data, of course) the number of events the outer search has to process.
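To make the comparison concrete: once the subsearch has run, the tutorial search behaves roughly like the search below, where <most-frequent-ip> is a hypothetical placeholder for the address the subsearch returned (not a real value from the tutorial data):

sourcetype=access_* status=200 action=purchase clientip="<most-frequent-ip>" | stats count, dc(productId), values(productId) by clientip

Because the clientip term sits in the base search, stats only sees that one shopper's events, whereas the stats/sort/head version has to aggregate every shopper before discarding all but one row.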


lstewart_splunk · 10-03-2016

Thank you, somesoni2, really clear explanation!
I will add this to the Search Tutorial and the Search Reference so that others are not confused.
