Subsearch - Filter out outer search using inner search's results? Finding the bad seed.

thisissplunk · ‎04-12-2016

I need to filter an outer search based on the inner search's results. My inner search returns maybe ~50 results, but my outer search returns millions. I want to find the "bad seed" from the inner search somehow and exclude it from the outer search.

The following is an example of what I'm doing now. It looks for DNS requests that got a response and checks whether there was HTTP traffic to those domains:

index=web_logs sourcetype=http [search index=web_logs sourcetype=dns_requests response=true | dedup domain_name | fields domain_name]

In the example above, there are only a couple of unique domain name hits from the DNS logs. However, when we use those domains against the HTTP logs, there are TONS of events due to the unique URIs at the end of the domains. I want to ignore the loudest domains, but I only know whether they will be loud at the point of the outer search! (For a moment, imagine that DNS requests are not the only set of data here. There could be IPs in random protocols, which makes it even more complicated to figure out the culprit.)

I was hoping there was a way to make the outer search remember which inner search term had matched each event in the first place, like the following, but I don't believe this is a thing:

index=web_logs sourcetype=http matched!=google.com [search remember_match=yes index=web_logs sourcetype=dns_requests response=true | dedup domain_name | fields domain_name]

Is this possible in Splunk today? I'm open to other ways of doing this as well.

sideview · ‎04-12-2016

When you need to retain some kind of data or marker from the "inner" search, and have that thing survive into the outer search, this is where you take a left turn out of subsearches and into the broader world of eval+stats.
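
As a rough illustration of why the marker gets lost: a subsearch is expanded into a plain filter expression before the outer search runs, so the outer events carry no record of which term matched them. Appending the format command to the inner search shows that expansion. The search below is only a sketch that reuses the index, sourcetype, and field names from the question:

index=web_logs sourcetype=dns_requests response=true
| dedup domain_name
| fields domain_name
| format

The result should be a single field named search that holds an OR expression over the domain_name values, and that expression is all the outer search ever receives.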

Broadly speaking,

(<Search 1> ) OR (<Search 2>) 
| eval <some things to normalize and/or create markers>
| stats <some things> by <field(s) by which you need to group>

Specifically:

( index=web_logs sourcetype=http matched!=google.com) OR (index=web_logs sourcetype=dns_requests response=true) 
| eval type=case(sourcetype="http","http",sourcetype="dns_requests","dns")
| stats count by uri domain_name type

Now this won't filter out the "loud" ones yet, but we have the raw ingredients. The next step would look something like this:

( index=web_logs sourcetype=http matched!=google.com) OR (index=web_logs sourcetype=dns_requests response=true) 
| eval type=case(sourcetype="http","http",sourcetype="dns_requests","dns")
| stats count by uri domain_name type
| eventstats sum(count) as totalDomainCount by domain_name
| where totalDomainCount<10000

Notes:

- Somewhat obviously, the base search here can be refactored a bit as

index=web_logs ( sourcetype=http matched!=google.com) OR (sourcetype=dns_requests response=true)

- The eval statement that creates the "type" field is a little trivial, and in this particular case we could just use the sourcetype value itself. However, I've left the eval clause here as a stand-in for what is often a more complicated eval expression.
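
Building on the searches above, here is a sketch that keeps only domains seen in both sourcetypes and drops the loud ones in a single stats pass. It assumes both sourcetypes extract a domain_name field, reuses the 10000 threshold purely as a placeholder, and leaves out the matched filter because that field was hypothetical in the question. The names httpCount and dnsCount are made up for illustration. Grouping by domain_name alone also matters because stats drops events that have no value for a by field, and DNS events presumably have no uri:

index=web_logs (sourcetype=http) OR (sourcetype=dns_requests response=true)
| stats count(eval(sourcetype="http")) as httpCount count(eval(sourcetype="dns_requests")) as dnsCount by domain_name
| where dnsCount>0 AND httpCount>0 AND httpCount<10000

If the individual URIs are still needed, adding values(uri) as uris to the stats clause is one option, at the cost of larger result rows.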

thisissplunk · ‎04-12-2016

I'm not done fully comprehending this yet, but what do you do when the field in the second search maps to more than one field in the first search? For instance, the second search's field does not exist in the first search, but it can be mapped to many fields there. Or better yet: the second search has "indicator=ips/domain_names/etc" and the first search's relevant fields could be ip/domain_name/uri.

sideview · ‎04-12-2016

If I understand your question, the answer is some kind of conditional eval statement.

| eval normalizedId=case(sourcetype="A",someIdField,sourcetype="B",someOtherIdField)
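
For the one-to-many case in the question, one possible approach is to collect the candidate fields into a multivalue field and expand it, so that each event is counted once per candidate value. This is a sketch under assumptions: sourcetype=threat_indicators and the src_ip field are hypothetical names, while indicator, domain_name, and uri come from the question, and the 10000 threshold is again a placeholder:

index=web_logs (sourcetype=http) OR (sourcetype=threat_indicators)
| eval normalizedId=case(sourcetype="threat_indicators", indicator, sourcetype="http", mvappend(src_ip, domain_name, uri))
| mvexpand normalizedId
| stats count(eval(sourcetype="http")) as httpCount count(eval(sourcetype="threat_indicators")) as indicatorCount by normalizedId
| where indicatorCount>0 AND httpCount>0 AND httpCount<10000

This only finds exact string matches between an indicator and one of the HTTP fields. Also note that mvexpand multiplies events, so on millions of HTTP events it may run into memory limits. Narrowing the time range first is probably wise.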
