Why do joined search returns a different results e...

Woodpecker · ‎05-03-2022

Hello,

I am trying to join two searches for see, same hash exists on the other index as well. Below is my search, the issue is every time I run a search for the same timelimit, I see different results. WHY?
Basesearch: I've tried to combine results of three different hash fields into one

(index=a sourcetype="a" (hash1=* OR hash2=* OR hash3=*)) 
| fields hash1, hash2, hash3 
| table hash1, hash2, hash3 
| eval hash=mvzip(mvzip('hash1','hash2',"|"),'hash3',"|") 
| fields hash  
| makemv hash delim="|" 
| mvexpand hash

From here, I've joined two indexes and both indexes have same field for hash files, so I'm attempting to join hash as the focus. Search seems to work fine

join type=left hash 
    [| search (index=b sourcetype=b hashfile=*) OR (index=c sourcetype=c hashfile=*) 
    | fields hashfile, filename,index
    | eval hash=hashfile]

Both the search on running individually returns 2k+ results, whereas on combining it, I could see only 1 result in the stats table and on hitting run for the same time limit every time I see different file name WHYYY?

Any help would be appreciated, thanks!

gcusello · ‎05-03-2022

Hi @Woodpecker,

two questions:

did you used a closed time range (e.g. last hour or yesterday) or did you used a time period containing "latest=now"?

Because in the second case, you cannot compare results, you have to choose a closed period (e.g. last hour or yesterday).

if you run the joined search, in the same time period of the main, have you more or less than 50,000 results?

remember that subsearches have the limit of 50,000 results and this influeces the final result.

this is one of the reasons (the second are the prestactions) to use join only as last option when you haven't any other choice!

Then I don't understand why you created the first search in this way, if you could have three fields as hash and you want only one, you could use coalesce, something like this:

(index=a sourcetype="a" (hash1=* OR hash2=* OR hash3=*))  OR index=b sourcetype=b hashfile=*) OR (index=c sourcetype=c hashfile=*)
| eval hash=if(index=a,coalesce("hash1","hash2","hash3",hash) 
| stats values(filename) As filename values(index) AS index dc(index) AS dc_index BY hash
| where dc_index=2
| table hash filename

I don't know if this search is running because I cannot test it, but I'd like to show you the approach to replace join with stats.

Ciao.

Giuseppe

isoutamo · ‎05-04-2022

Hi

just check those limits and it's even lower. Only 10k events and/or 60s and TTL 300s.

https://docs.splunk.com/Documentation/Splunk/8.2.6/Search/Aboutsubsearches#Subsearch_performance_con...

If needed and you cannot user @gcusello 's suggestion to replace sub search e.g. with stats you can increase those values in limits.conf.

r. Ismo

bowesmana · ‎05-03-2022

If you are getting varying results for the same time interval and are using join or some of the other commands with limits, then it could be you are hitting limits.

An alternative search could be this - remove join - and use stats instead

(index=a sourcetype="a" (hash1=* OR hash2=* OR hash3=*)) OR (index=b sourcetype=b hashfile=*) OR (index=c sourcetype=c hashfile=*) 
| eval hash=if(index=a,mvzip(mvzip('hash1','hash2',"|"),'hash3',"|"), hashfile)
| fields hash filename index
| mvexpand hash
| stats values(*) as * by hash

That makes hash either the combined hash from your first search, or hashfile from your second. It then expands hash as you do, but includes the fields from the second search.

It then uses stats to aggregate the fields again based on hash value.

Why do joined search returns a different results everytime I hit a search?

join

subsearch

Tech Talk Recap | Mastering Threat Hunting

Observability for AI Applications: Troubleshooting Latency

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?

Are you a member of the Splunk Community?