Splunk Search

Why does the subsearch example in the Splunk Search Tutorial seem to repeat itself?

davidmichaelkar
New Member

I'm stepping through the main Splunk Search Tutorial. I'm at the "subsearch" section: https://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Useasubsearch

The cited example search is the following:

sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip] | stats count, dc(productId), values(productId) by clientip

What seems curious to me is that the subsearch begins with the entire content of the outer search, namely sourcetype=access_* status=200 action=purchase. It seems odd that the subsearch needs to repeat the entire outer search and then qualify it further. Is this perhaps just a nonsensical subsearch use case?

1 Solution

somesoni2
Revered Legend

The answer lies in the requirement. This is the stated requirement for that example search:

You want to find the single most frequent shopper on the Buttercup Games online store and what that shopper has purchased. Use the top command to return the most frequent shopper.

Now, remember that the data for both the most frequent shopper and the purchases comes from the same set of events.

So, step 1 is to find the single most frequent shopper. If you check the subsearch, that's exactly what it does: it returns the clientip of the single most frequent buyer.

sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip

Now, for this clientip, we need to get all the purchases, which we'll find in the same data we used to calculate the most frequent buyer. So the outer search uses the same data (successful purchases) and filters it down to just the single clientip returned by the subsearch.
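If it helps to think of it in code, the two-step logic can be modeled roughly in Python. This is only a sketch of the semantics, not how Splunk actually executes the search. The sample events, IP addresses, product IDs and the function names subsearch and outer_search are all made up for illustration.

```python
from collections import Counter

# Hypothetical events, assumed to already match:
#   sourcetype=access_* status=200 action=purchase
events = [
    {"clientip": "10.0.0.1", "productId": "A-1"},
    {"clientip": "10.0.0.2", "productId": "B-2"},
    {"clientip": "10.0.0.1", "productId": "C-3"},
    {"clientip": "10.0.0.1", "productId": "A-1"},
    {"clientip": "10.0.0.3", "productId": "B-2"},
]


def subsearch(events):
    """Rough model of: ... | top limit=1 clientip | table clientip"""
    counts = Counter(e["clientip"] for e in events)
    top_ip, _ = counts.most_common(1)[0]
    return top_ip


def outer_search(events, clientip):
    """Rough model of: ... clientip=<value> | stats count, dc(productId), values(productId) by clientip"""
    matching = [e for e in events if e["clientip"] == clientip]
    products = sorted({e["productId"] for e in matching})
    return {
        "clientip": clientip,
        "count": len(matching),      # stats count
        "dc": len(products),         # dc(productId)
        "values": products,          # values(productId)
    }


top_ip = subsearch(events)            # step 1: find the most frequent shopper
result = outer_search(events, top_ip) # step 2: filter the same data by that clientip
print(result)
```

Both functions read the same event list, which mirrors why the base search text appears twice in the tutorial example.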

This is how it would look without the subsearch:

sourcetype=access_* status=200 action=purchase | stats count, dc(productId), values(productId) by clientip | sort 0 -count | head 1

Here, you're computing the purchase statistics for every buyer (clientip) and only at the end picking the most frequent one, by sorting and keeping the top record. The subsearch method applies the filter in the base search itself. Even though it has to run a subsearch, this can drastically reduce (depending on the data, of course) the number of records the outer search has to process.
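As a rough illustration of that difference, here is a small Python model of the stats-then-head version. It is a sketch under assumptions, not Splunk's implementation: the events and the stats_then_head helper are hypothetical, and how ties are broken may differ from what Splunk does.

```python
from collections import defaultdict

# Hypothetical events, assumed to already match:
#   sourcetype=access_* status=200 action=purchase
events = [
    {"clientip": "10.0.0.1", "productId": "A-1"},
    {"clientip": "10.0.0.2", "productId": "B-2"},
    {"clientip": "10.0.0.1", "productId": "C-3"},
    {"clientip": "10.0.0.1", "productId": "A-1"},
    {"clientip": "10.0.0.3", "productId": "B-2"},
]


def stats_then_head(events):
    """Rough model of:
    ... | stats count, dc(productId), values(productId) by clientip
        | sort 0 -count | head 1
    """
    groups = defaultdict(list)
    for e in events:
        groups[e["clientip"]].append(e["productId"])

    # One aggregated row is built for EVERY clientip ...
    rows = [
        {
            "clientip": ip,
            "count": len(products),
            "dc": len(set(products)),
            "values": sorted(set(products)),
        }
        for ip, products in groups.items()
    ]
    # ... and all but the first are discarded after sorting.
    rows.sort(key=lambda r: r["count"], reverse=True)
    return rows[0], len(rows)


best, rows_built = stats_then_head(events)
print(best)        # the single most frequent shopper
print(rows_built)  # how many per-clientip rows were aggregated to get there
```

In this toy data set, three per-clientip rows are aggregated to keep one. With the subsearch version, the stats command only ever sees the events of the one clientip that the subsearch returned.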


lstewart_splunk
Splunk Employee

Thank you, Somesoni2, really clear explanation!
I will add this to the Search Tutorial and to the Search Reference so that others are not confused.
