Splunk Search

How to search for events that are in one index, but not in another without using a subsearch?

jsilverstein
Explorer

I have 3 indexes containing events with IP addresses, index1, index2, and index3. My goal is to return a list of all IP addresses that are present in index1, but are not present in index2 or index3.

My current solution finds the IPs that are only in either index1 or (index2 or index3), using set diff, then intersects that result with index1 to limit the IPs to ones in index1:

| set intersect [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ | set diff [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ search (index=index2 OR index=index3) AND IpAddr earliest=-5d | dedup 1 IpAddr | rename IpAddr AS ip | table ip ] ]

This solution works, except each subsearch returns more than the maximum subsearch limit of 10500 (around 20000 results).

Does anyone have any idea how this search could be rewritten without using subsearches, to avoid this limitation?

Thanks for any help or ideas.

0 Karma
1 Solution

jsilverstein
Explorer

The final solution is:

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup ip keepempty=true | dedup IpAddr keepempty=true | eval myIP=coalesce(ip,IpAddr) | eval from_index1=if(index="index1",1,null()) | stats dc(index) AS num_occurences sum(from_index1) AS from_index1 by myIP | where num_occurences=1 AND from_index1=1 | table myIP

Thanks @mus for your help making this more efficient.

The first solution took up large amounts of memory (>500MB):

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | stats values(ip) AS ips values(IpAddr) AS ips2 | mvexpand ips | eval result=if(match(ips2, ips),"YES","NO") | where result="NO" | table ips

Thanks @ppablo for the helpful link and the quick response.

View solution in original post

jsilverstein
Explorer

The final solution is:

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup ip keepempty=true | dedup IpAddr keepempty=true | eval myIP=coalesce(ip,IpAddr) | eval from_index1=if(index="index1",1,null()) | stats dc(index) AS num_occurences sum(from_index1) AS from_index1 by myIP | where num_occurences=1 AND from_index1=1 | table myIP

Thanks @mus for your help making this more efficient.

The first solution took up large amounts of memory (>500MB):

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | stats values(ip) AS ips values(IpAddr) AS ips2 | mvexpand ips | eval result=if(match(ips2, ips),"YES","NO") | where result="NO" | table ips

Thanks @ppablo for the helpful link and the quick response.

jsilverstein
Explorer

This search uses a significant amount of memory (>500MB). It appears that the output from values(IpAddr) AS ips2 is duplicated in memory for each values(ip) AS ips when mvexpand ips is executed. Is there a way to force Splunk to store only 1 copy of values(IpAddr) in memory?

0 Karma

MuS
Legend

Yes, mvexpand is the down side in this example ... Maybe a different approach could help here; How about counting the number of times an IP appears in the indexes and compare it that way?

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d 
| dedup IpAddr keepempty=true
| eval myIP = coalesce(ip,IpAddr) 
| stats dc(index) AS c_idx by myIP 
| where c_idx = 1

This should show only IP's that are in one index.

cheers, MuS

jsilverstein
Explorer

I really like that idea; it seems really clean and simple. But I need the result to be IPs that are only in index1, with no IPs that are only in index2 or index3. Any thoughts how I could achieve this? Maybe a function similar to coalesce that could additionally assign a label, or maybe through an eval of some sort?

0 Karma

MuS
Legend

eval is your friend 😉

 | eval c_idx1=if(index="index1", 1, null())

and change the stats and the where to be

 | stats dc(idx) AS c_idx sum(c_idx1) AS c_idx1 by myIP
 | where c_idx=1 AND c_idx1=1

cheers, MuS

jsilverstein
Explorer

Thanks a lot for your help. I learned a bunch. I'll update the answer when I get the final solution finished.

0 Karma

ppablo
Retired

Awesome, I'm glad the link helped you come up with an efficient search 🙂 Thanks for sharing your final solution with the community, and welcome to Splunk Answers!

Cheers
Patrick

0 Karma

MuS
Legend

Always good to see that my Q&A is helpful 🙂

0 Karma

ppablo
Retired

Hi @jsilverstein

If you haven't seen this yet, this previous Q&A by @mus might give you some ideas on how to approach this:
https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...

Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...