I need to filter an outer search based on the inner search's results. My inner search returns roughly 50 results, but my outer search returns millions. I want to find the "bad seed" from the inner search and remove it during the outer search.
The following is an example of what I'm doing now. Look for DNS requests that got a response and check whether there was HTTP traffic to them:
index=web_logs sourcetype=http [search index=web_logs sourcetype=dns_requests response=true | dedup domain_name | fields domain_name]
In the example above, there are only a couple of unique domain names from the DNS logs. However, when we use those domains against the HTTP logs, there are TONS of hits because of the unique URIs at the end of the domains. I want to ignore the loudest domains, but I only know whether they will be loud at the point of the outer search! (For a moment, imagine that DNS requests are not the only data set here. There could be IPs in random protocols, which makes it even more complicated to figure out the culprit.)
I was hoping that there was a way to make the outer search remember what inner search string had found the events in the first place like the following, but I don't believe this is a thing:
index=web_logs sourcetype=http matched!=google.com [search remember_match=yes index=web_logs sourcetype=dns_requests response=true | dedup domain_name | fields domain_name]
Is this possible in Splunk today? I'm open to other ways of doing this as well.
When you need to retain some kind of data or marker from the "inner" search and have it survive into the outer search, this is where you take a left turn out of subsearches and into the broader world of eval+stats. The general pattern is:
(<Search 1> ) OR (<Search 2>) | eval <some things to normalize and/or create markers> | stats <some things> by <field(s) by which you need to group>
( index=web_logs sourcetype=http matched!=google.com) OR (index=web_logs sourcetype=dns_requests response=true) | eval type=case(sourcetype="http","http",sourcetype="dns_requests","dns") | stats count by uri domain_name type
Now this won't filter out the "loud" ones yet, but we have the raw ingredients. The next step would look something like this:
( index=web_logs sourcetype=http matched!=google.com) OR (index=web_logs sourcetype=dns_requests response=true) | eval type=case(sourcetype="http","http",sourcetype="dns_requests","dns") | stats count by uri domain_name type | eventstats sum(count) as totalDomainCount by domain_name | where totalDomainCount<10000
Somewhat obviously, the base search here can be refactored a bit, since both clauses share the same index:
index=web_logs ( sourcetype=http matched!=google.com) OR (sourcetype=dns_requests response=true)
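One caveat worth checking with the searches above: `stats` leaves out events where any of the `by` fields is null, so if the DNS events carry no `uri` field they would drop out of the results. The following is a minimal sketch of one way around that, not a tested search. It assumes `domain_name` is extracted for both sourcetypes, leaves out the hypothetical `matched` filter from the question, and keeps the arbitrary 10000 threshold. The `dc(type)` check is an addition, intended to keep only domains that appear in both the DNS and the HTTP data:

```
index=web_logs (sourcetype=http) OR (sourcetype=dns_requests response=true)
| eval type=case(sourcetype="http","http",sourcetype="dns_requests","dns")
| fillnull value="-" uri
| stats count by uri domain_name type
| eventstats sum(count) as totalDomainCount dc(type) as typeCount by domain_name
| where typeCount=2 AND totalDomainCount<10000
```

To pick a sensible threshold, it may help to first run the `stats count by domain_name` part on its own, sorted by count, and see where the loud domains sit.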
I'm not done fully comprehending this yet, but what do you do when the field in the second search maps to more than one field in the first search? For instance, the second search's field does not exist in the first search, but it can be mapped to many fields there. Or better yet: the second search is "indicator=ips/domainnames/etc", and the first search's relevant fields could be ip/domainname/uri.
If I understand your question, the answer is some kind of conditional eval statement.
| eval normalizedId=case(sourcetype="A",someIdField,sourcetype="B",someOtherIdField)
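For the one-to-many case raised in the question, where a single `indicator` value might match `ip`, `domain_name`, or `uri`, one possible approach is to collect the candidate fields into a multivalue field and expand it, so that each event is grouped once per candidate value. This is only a sketch under assumptions: `index=threat_intel` and `sourcetype=indicators` are hypothetical names, the field names follow the question rather than any real data model, and the match is exact (a full URI will not match a bare domain indicator).

```
(index=web_logs sourcetype=http) OR (index=threat_intel sourcetype=indicators)
| eval normalizedId=if(sourcetype="indicators", indicator, mvappend(ip, domain_name, uri))
| mvexpand normalizedId
| stats dc(sourcetype) as sourceCount count by normalizedId
| where sourceCount=2
```

Keep in mind that `mvexpand` multiplies events, so on a very large base search it is worth narrowing the time range or the base search first.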