Splunk Search

Why does the subsearch example in the Splunk Search Tutorial seem to repeat itself?

davidmichaelkar
New Member

I'm stepping through the main Splunk Search Tutorial. I'm at the "subsearch" section: https://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Useasubsearch

The cited example search is the following:

sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip] | stats count, dc(productId), values(productId) by clientip

What seems curious to me is that the subsearch begins with the entire content of the outer search, namely sourcetype=access_* status=200 action=purchase. It seems odd that the subsearch needs to repeat the entire outer search and then qualify it. Is this perhaps just a nonsensical subsearch use case?


somesoni2
Revered Legend

The answer lies in the requirement. Here is the stated requirement for that example search:

You want to find the single most frequent shopper on the Buttercup Games online store and what that shopper has purchased. Use the top command to return the most frequent shopper.

Now, remember that the data for both the most frequent shopper and the purchases comes from the same events.

So, step 1 is to find the single most frequent shopper. If you check the subsearch, that is exactly what it does: it returns the clientip of the single most frequent buyer.

sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip

Now, for this clientip, we need to get all the purchases, which are in the same data we used to find the most frequent buyer. So the outer search uses the same data (successful purchases) and filters it down to just the single clientip returned by the subsearch.

This is how it would look without the subsearch:

sourcetype=access_* status=200 action=purchase  | stats count, dc(productId), values(productId) by clientip | sort 0 -count | head 1

Here, you compute the purchase statistics for every buyer (clientip) and only at the end pick the most frequent one, by sorting and taking the top record. The subsearch method applies the filter in the base search itself, so even though it has to run a subsearch, it can drastically reduce (depending on the data, of course) the number of records the rest of the search has to process.
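To make the comparison concrete, here is a minimal Python sketch that imitates both searches on a handful of made-up events. It is not Splunk code and says nothing about how Splunk executes searches internally. The event list, the IP addresses, and the function names (summarize, with_subsearch, without_subsearch) are all assumptions for illustration. It only shows that the two approaches give the same answer while handing a different number of events to the stats step.

```python
from collections import Counter, defaultdict

# Made-up purchase events (hypothetical data, not from the tutorial's dataset).
# Think of them as the results of: sourcetype=access_* status=200 action=purchase
EVENTS = [
    {"clientip": "10.0.0.1", "productId": "A"},
    {"clientip": "10.0.0.2", "productId": "B"},
    {"clientip": "10.0.0.1", "productId": "C"},
    {"clientip": "10.0.0.3", "productId": "A"},
    {"clientip": "10.0.0.1", "productId": "A"},
]


def summarize(events):
    """Rough analogue of: stats count, dc(productId), values(productId) by clientip."""
    groups = defaultdict(list)
    for event in events:
        groups[event["clientip"]].append(event["productId"])
    return {
        ip: {"count": len(p), "dc": len(set(p)), "values": sorted(set(p))}
        for ip, p in groups.items()
    }


def with_subsearch(events):
    """Find the top clientip first, filter to it, then summarize."""
    # Analogue of the subsearch: top limit=1 clientip | table clientip
    top_ip = Counter(e["clientip"] for e in events).most_common(1)[0][0]
    # Analogue of the outer search: same base events, filtered before stats.
    filtered = [e for e in events if e["clientip"] == top_ip]
    return summarize(filtered), len(filtered)


def without_subsearch(events):
    """Summarize every clientip, then keep only the one with the highest count."""
    stats = summarize(events)
    # Analogue of: sort 0 -count | head 1
    top_ip = max(stats, key=lambda ip: stats[ip]["count"])
    return {top_ip: stats[top_ip]}, len(events)


if __name__ == "__main__":
    print(with_subsearch(EVENTS))
    print(without_subsearch(EVENTS))
```

On this sample, both functions return the same single row for 10.0.0.1, but the first hands 3 events to the stats step while the second hands it all 5.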


lstewart_splunk
Splunk Employee

Thank you, somesoni2, really clear explanation!
I will add this to the Search Tutorial and to the Search Reference so that others are not confused.
