Splunk Search
Highlighted

How to search for events that are in one index, but not in another without using a subsearch?

Explorer

I have 3 indexes containing events with IP addresses, index1, index2, and index3. My goal is to return a list of all IP addresses that are present in index1, but are not present in index2 or index3.

My current solution finds the IPs that are only in either index1 or (index2 or index3), using set diff, then intersects that result with index1 to limit the IPs to ones in index1:

| set intersect [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ | set diff [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ search (index=index2 OR index=index3) AND IpAddr earliest=-5d | dedup 1 IpAddr | rename IpAddr AS ip | table ip ] ]

This solution works, except each subsearch returns more than the maximum subsearch limit of 10500 (around 20000 results).

Does anyone have any idea how this search could be rewritten without using subsearches, to avoid this limitation?

Thanks for any help or ideas.

0 Karma
Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

Community Manager
Community Manager

Hi @jsilverstein

If you haven't seen this yet, this previous Q&A by @mus might give you some ideas on how to approach this:
https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...

Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

Explorer

The final solution is:

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup ip keepempty=true | dedup IpAddr keepempty=true | eval myIP=coalesce(ip,IpAddr) | eval from_index1=if(index="index1",1,null()) | stats dc(index) AS num_occurences sum(from_index1) AS from_index1 by myIP | where num_occurences=1 AND from_index1=1 | table myIP

Thanks @mus for your help making this more efficient.

The first solution took up large amounts of memory (>500MB):

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | stats values(ip) AS ips values(IpAddr) AS ips2 | mvexpand ips | eval result=if(match(ips2, ips),"YES","NO") | where result="NO" | table ips

Thanks @ppablo for the helpful link and the quick response.

View solution in original post

Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

Community Manager
Community Manager

Awesome, I'm glad the link helped you come up with an efficient search 🙂 Thanks for sharing your final solution with the community, and welcome to Splunk Answers!

Cheers
Patrick

0 Karma
Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

SplunkTrust
SplunkTrust

Always good to see that my Q&A is helpful 🙂

0 Karma
Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

Explorer

This search uses a significant amount of memory (>500MB). It appears that the output from values(IpAddr) AS ips2 is duplicated in memory for each values(ip) AS ips when mvexpand ips is executed. Is there a way to force Splunk to store only 1 copy of values(IpAddr) in memory?

0 Karma
Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

SplunkTrust
SplunkTrust

Yes, mvexpand is the down side in this example ... Maybe a different approach could help here; How about counting the number of times an IP appears in the indexes and compare it that way?

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d 
| dedup IpAddr keepempty=true
| eval myIP = coalesce(ip,IpAddr) 
| stats dc(index) AS c_idx by myIP 
| where c_idx = 1

This should show only IP's that are in one index.

cheers, MuS

Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

Explorer

I really like that idea; it seems really clean and simple. But I need the result to be IPs that are only in index1, with no IPs that are only in index2 or index3. Any thoughts how I could achieve this? Maybe a function similar to coalesce that could additionally assign a label, or maybe through an eval of some sort?

0 Karma
Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

SplunkTrust
SplunkTrust

eval is your friend 😉

 | eval c_idx1=if(index="index1", 1, null())

and change the stats and the where to be

 | stats dc(idx) AS c_idx sum(c_idx1) AS c_idx1 by myIP
 | where c_idx=1 AND c_idx1=1

cheers, MuS

Highlighted

Re: How to search for events that are in one index, but not in another without using a subsearch?

Explorer

Thanks a lot for your help. I learned a bunch. I'll update the answer when I get the final solution finished.

0 Karma