Splunk Search
Highlighted

Why does this join not correctly match the data?

Path Finder

Hi,

I'm experiencing some strangeness with the following query:

index=main_index | dedup _raw | sort _raw | rename index as index_from | join type=left _raw [search index=dedupped_index | dedup _raw | sort _raw | rename index as index_to ]  | where isNull(index_to) | collect dedupped_index

Essentially, I want to take the contents of main_index, find out which events are not in dedupped_index, and move them into dedupped_index. However, if I run this query multiple times, I get the same results constantly dumped into the dedupped_index, even though they are already there.

Does the order of the events matter for the join command? Is there a different/better way of doing this that will work?

Thanks!

Tags (3)
0 Karma
Highlighted

Re: Why does this join not correctly match the data?

SplunkTrust
SplunkTrust

Ingenious, I like this. It seems like it shouldn't be necessary in my environment very often, but having the option could be handy.

Anyway.

If you leave off the " ... | collect dedupped_index" at the end, what does the search/join/subsearch show you? If that's not the right list of events, then I'd pare down the search bit by bit confirming each part really does what you expect.

0 Karma
Highlighted

Re: Why does this join not correctly match the data?

Path Finder

It's strange. If I add fields to my query, those fields will always be (correctly) absent on the second run. It's only when I query the entire index that I see data when I shouldn't.

In any case, my current work around is as follows (it seems faster, but also feels hackier):

(index=mainindex OR index=deduppedindex) | dedup raw, index | stats count as count values(index) as index by _raw | table count, _raw, index | where count=1 AND index="mainindex" | collect index=dedupped_index

0 Karma
Highlighted

Re: Why does this join not correctly match the data?

SplunkTrust
SplunkTrust

Joins are expensive and the your workaround is indeed proper way to do this 🙂

Highlighted

Re: Why does this join not correctly match the data?

SplunkTrust
SplunkTrust

I agree with somesoni2 that you original way was not very efficient.

If the search you posted as the comment works correctly, I would recommend write that up as the answer to your own question and mark it as answered. That way people stumbling across this question in the future will know an "answer" was found.

0 Karma
Highlighted

Re: Why does this join not correctly match the data?

SplunkTrust
SplunkTrust
0 Karma
Highlighted

Re: Why does this join not correctly match the data?

Path Finder

(index=mainindex OR index=deduppedindex) | dedup raw, index | stats count as count values(index) as index by _raw | table count, _raw, index | where count=1 AND index="mainindex" | collect index=dedupped_index

View solution in original post

0 Karma