Splunk Search

Why do joined search returns a different results everytime I hit a search?

Woodpecker
Path Finder

Hello,

I am trying to join two searches for see, same hash exists on the other index as well. Below is my search, the issue is every time I run a search for the same timelimit, I see different results. WHY?
Basesearch: I've tried to combine results of three different hash fields into one

 

(index=a sourcetype="a" (hash1=* OR hash2=* OR hash3=*)) 
| fields hash1, hash2, hash3 
| table hash1, hash2, hash3 
| eval hash=mvzip(mvzip('hash1','hash2',"|"),'hash3',"|") 
| fields hash  
| makemv hash delim="|" 
| mvexpand hash 

 

From here, I've joined two indexes and both indexes have same field for hash files, so I'm attempting to join hash as the focus. Search seems to work fine

 

join type=left hash 
    [| search (index=b sourcetype=b hashfile=*) OR (index=c sourcetype=c hashfile=*) 
    | fields hashfile, filename,index
    | eval hash=hashfile]

 

Both the search on running individually returns 2k+ results, whereas on combining it, I could see only 1 result in the stats table and on hitting run for the same time limit every time I see different file name WHYYY?

Any help would be appreciated, thanks!

Labels (2)
Tags (1)
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Woodpecker,

two questions:

did you used a closed time range (e.g. last hour or yesterday) or did you used a time period containing "latest=now"?

Because in the second case, you cannot compare results, you have to choose a closed period (e.g. last hour or yesterday).

if you run the joined search, in the same time period of the main, have you more or less than 50,000 results?

remember that subsearches have the limit of 50,000 results and this influeces the final result.

this is one of the reasons (the second are the prestactions) to use join only as last option when you haven't any other choice!

Then I don't understand why you created the first search in this way, if you could have three fields as hash and you want only one, you could use coalesce, something like this:

(index=a sourcetype="a" (hash1=* OR hash2=* OR hash3=*))  OR index=b sourcetype=b hashfile=*) OR (index=c sourcetype=c hashfile=*)
| eval hash=if(index=a,coalesce("hash1","hash2","hash3",hash) 
| stats values(filename) As filename values(index) AS index dc(index) AS dc_index BY hash
| where dc_index=2
| table hash filename

I don't know if this search is running because I cannot test it, but I'd like to show you the approach to replace join with stats.

Ciao.

Giuseppe

0 Karma

isoutamo
SplunkTrust
SplunkTrust

Hi

just check those limits and it's even lower. Only 10k events and/or 60s and TTL 300s.

https://docs.splunk.com/Documentation/Splunk/8.2.6/Search/Aboutsubsearches#Subsearch_performance_con...

If needed and you cannot user @gcusello 's suggestion to replace sub search e.g. with stats you can increase those values in limits.conf.

r. Ismo

0 Karma

bowesmana
SplunkTrust
SplunkTrust

If you are getting varying results for the same time interval and are using join or some of the other commands with limits, then it could be you are hitting limits.

An alternative search could be this - remove join - and use stats instead

(index=a sourcetype="a" (hash1=* OR hash2=* OR hash3=*)) OR (index=b sourcetype=b hashfile=*) OR (index=c sourcetype=c hashfile=*) 
| eval hash=if(index=a,mvzip(mvzip('hash1','hash2',"|"),'hash3',"|"), hashfile)
| fields hash filename index
| mvexpand hash
| stats values(*) as * by hash

That makes hash either the combined hash from your first search, or hashfile from your second. It then expands hash as you do, but includes the fields from the second search.

It then uses stats to aggregate the fields again based on hash value.

 

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Tech Talk Recap | Mastering Threat Hunting

Mastering Threat HuntingDive into the world of threat hunting, exploring the key differences between ...

Observability for AI Applications: Troubleshooting Latency

If you’re working with proprietary company data, you’re probably going to have a locally hosted LLM or many ...

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?

In the age of AI, every tool promises to make our lives easier. From summarizing content to writing code, ...