Solved: How to search for events that are in one index, bu...

jsilverstein · ‎08-17-2016

I have 3 indexes containing events with IP addresses, index1, index2, and index3. My goal is to return a list of all IP addresses that are present in index1, but are not present in index2 or index3.

My current solution finds the IPs that are only in either index1 or (index2 or index3), using set diff, then intersects that result with index1 to limit the IPs to ones in index1:

| set intersect [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ | set diff [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ search (index=index2 OR index=index3) AND IpAddr earliest=-5d | dedup 1 IpAddr | rename IpAddr AS ip | table ip ] ]

This solution works, except each subsearch returns more than the maximum subsearch limit of 10500 (around 20000 results).

Does anyone have any idea how this search could be rewritten without using subsearches, to avoid this limitation?

Thanks for any help or ideas.

jsilverstein · ‎08-17-2016

The final solution is:

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup ip keepempty=true | dedup IpAddr keepempty=true | eval myIP=coalesce(ip,IpAddr) | eval from_index1=if(index="index1",1,null()) | stats dc(index) AS num_occurences sum(from_index1) AS from_index1 by myIP | where num_occurences=1 AND from_index1=1 | table myIP

Thanks @mus for your help making this more efficient.

The first solution took up large amounts of memory (>500MB):

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | stats values(ip) AS ips values(IpAddr) AS ips2 | mvexpand ips | eval result=if(match(ips2, ips),"YES","NO") | where result="NO" | table ips

Thanks @ppablo for the helpful link and the quick response.

View solution in original post

jsilverstein · ‎08-17-2016

The final solution is:

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup ip keepempty=true | dedup IpAddr keepempty=true | eval myIP=coalesce(ip,IpAddr) | eval from_index1=if(index="index1",1,null()) | stats dc(index) AS num_occurences sum(from_index1) AS from_index1 by myIP | where num_occurences=1 AND from_index1=1 | table myIP

Thanks @mus for your help making this more efficient.

The first solution took up large amounts of memory (>500MB):

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | stats values(ip) AS ips values(IpAddr) AS ips2 | mvexpand ips | eval result=if(match(ips2, ips),"YES","NO") | where result="NO" | table ips

Thanks @ppablo for the helpful link and the quick response.

jsilverstein · ‎08-17-2016

This search uses a significant amount of memory (>500MB). It appears that the output from values(IpAddr) AS ips2 is duplicated in memory for each values(ip) AS ips when mvexpand ips is executed. Is there a way to force Splunk to store only 1 copy of values(IpAddr) in memory?

MuS · ‎08-17-2016

Yes, mvexpand is the down side in this example ... Maybe a different approach could help here; How about counting the number of times an IP appears in the indexes and compare it that way?

((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d 
| dedup IpAddr keepempty=true
| eval myIP = coalesce(ip,IpAddr) 
| stats dc(index) AS c_idx by myIP 
| where c_idx = 1

This should show only IP's that are in one index.

cheers, MuS

jsilverstein · ‎08-17-2016

I really like that idea; it seems really clean and simple. But I need the result to be IPs that are only in index1, with no IPs that are only in index2 or index3. Any thoughts how I could achieve this? Maybe a function similar to coalesce that could additionally assign a label, or maybe through an eval of some sort?

MuS · ‎08-17-2016

eval is your friend 😉

 | eval c_idx1=if(index="index1", 1, null())

and change the stats and the where to be

 | stats dc(idx) AS c_idx sum(c_idx1) AS c_idx1 by myIP
 | where c_idx=1 AND c_idx1=1

cheers, MuS

jsilverstein · ‎08-17-2016

Thanks a lot for your help. I learned a bunch. I'll update the answer when I get the final solution finished.

ppablo · ‎08-17-2016

Awesome, I'm glad the link helped you come up with an efficient search 🙂 Thanks for sharing your final solution with the community, and welcome to Splunk Answers!

Cheers
Patrick

MuS · ‎08-17-2016

Always good to see that my Q&A is helpful 🙂

ppablo · ‎08-17-2016

Hi @jsilverstein

If you haven't seen this yet, this previous Q&A by @mus might give you some ideas on how to approach this:
https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...

How to search for events that are in one index, but not in another without using a subsearch?

Splunk Mobile: Your Brand-New Home Screen

Introducing Value Insights (Beta): Understand the Business Impact your organization ...

Enterprise Security (ES) Essentials 8.3 is Now GA — Smarter Detections, Faster ...

Are you a member of the Splunk Community?

How to search for events that are in one index, but not in another without using a subsearch?

Splunk Mobile: Your Brand-New Home Screen

Introducing Value Insights (Beta): Understand the Business Impact your organization ...

Enterprise Security (ES) Essentials 8.3 is Now GA — Smarter Detections, Faster ...