I have 3 indexes containing events with IP addresses, index1, index2, and index3. My goal is to return a list of all IP addresses that are present in index1, but are not present in index2 or index3.
My current solution finds the IPs that are only in either index1 or (index2 or index3), using set diff, then intersects that result with index1 to limit the IPs to ones in index1:
| set intersect [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ | set diff [ search index=index1 AND ip earliest=-3d | dedup 1 ip | table ip ] [ search (index=index2 OR index=index3) AND IpAddr earliest=-5d | dedup 1 IpAddr | rename IpAddr AS ip | table ip ] ]
This solution works, except each subsearch returns more than the maximum subsearch limit of 10500 (around 20000 results).
Does anyone have any idea how this search could be rewritten without using subsearches, to avoid this limitation?
Thanks for any help or ideas.
If you haven't seen this yet, this previous Q&A by @mus might give you some ideas on how to approach this:
The final solution is:
((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup ip keepempty=true | dedup IpAddr keepempty=true | eval myIP=coalesce(ip,IpAddr) | eval from_index1=if(index="index1",1,null()) | stats dc(index) AS num_occurences sum(from_index1) AS from_index1 by myIP | where num_occurences=1 AND from_index1=1 | table myIP
Thanks @mus for your help making this more efficient.
The first solution took up large amounts of memory (>500MB):
((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | stats values(ip) AS ips values(IpAddr) AS ips2 | mvexpand ips | eval result=if(match(ips2, ips),"YES","NO") | where result="NO" | table ips
Thanks @ppablo for the helpful link and the quick response.
Awesome, I'm glad the link helped you come up with an efficient search 🙂 Thanks for sharing your final solution with the community, and welcome to Splunk Answers!
This search uses a significant amount of memory (>500MB). It appears that the output from
values(IpAddr) AS ips2 is duplicated in memory for each
values(ip) AS ips when
mvexpand ips is executed. Is there a way to force Splunk to store only 1 copy of
values(IpAddr) in memory?
mvexpand is the down side in this example ... Maybe a different approach could help here; How about counting the number of times an IP appears in the indexes and compare it that way?
((index=index1 AND ip) OR ((index=index2 OR index=index3) AND IpAddr)) earliest=-3d | dedup IpAddr keepempty=true | eval myIP = coalesce(ip,IpAddr) | stats dc(index) AS c_idx by myIP | where c_idx = 1
This should show only IP's that are in one index.
I really like that idea; it seems really clean and simple. But I need the result to be IPs that are only in index1, with no IPs that are only in index2 or index3. Any thoughts how I could achieve this? Maybe a function similar to coalesce that could additionally assign a label, or maybe through an eval of some sort?
eval is your friend 😉
| eval c_idx1=if(index="index1", 1, null())
and change the
stats and the
where to be
| stats dc(idx) AS c_idx sum(c_idx1) AS c_idx1 by myIP | where c_idx=1 AND c_idx1=1
Thanks a lot for your help. I learned a bunch. I'll update the answer when I get the final solution finished.