Splunk Search

How to join two searches by closest time fields in my two indexes, not using the _time field?

Engager

Hi all,

I am going to simplify my problem. I have two indexes with the following variables:

index 1: timein userid

10:55 user1
11:20 user1
12:03 user2

index 2: timereg userid colour

10:50 user1 green
10:56 user1 yellow
10:57 user1 red
10:58 user2 green

Neither timein nor timereg are the same as time, and their values are never equal. I want to join the event of index1 with events of index2. Each event must just have a match with the criteria of being the closest in time, but timereg from index2 can never be after time_in of index1. The result of the join in the example would be:

JOIN: timein timereg user_id colour

10:55 10:50 user1 green
11:20 10:57 user1 red
12:03 10:58 user2 green

I have seen similar questions like this that they use 'transaction' or 'localize'. However, I have not found the way to use these commands without relying on the variable time. Instead I want to not use _time at all and use timein and time_reg.

Thank you very much in advance!

Tags (2)
0 Karma
1 Solution

SplunkTrust
SplunkTrust

I don't think there is any efficient way to do this, but give this approach a try.

index=IndexA OR index=indexB  | table time_in time_reg user_id colour index| eval commonfield=coalesce(time_in,time_reg) | sort 0 user_id -commonfield | streamstats current=f window=1 values(user_id) as prevUser values(index) as prevIndex prev(commonfield) as prevTime | where user_id=prevUser AND prevIndex=index AND index="indexB" | rename commonfield as time_reg prevTime as time_in   | table time_in time_reg user_id colour

View solution in original post

0 Karma

Engager

Thanks for both answers.

At the end I decided to implement a really inefficient solution (I execute this process really seldom so it doesn't matter) that would return more or less what I wanted.
What I do is a join between the two tables on user_id. Then, after the join I do:

eval difftimes=timein-timereg | search difftimes>=0 AND diff_times<600000

So at the end I filter the results where the two times are within a range of 10 minutes. I know that this is a really poor solution, but I find joins and time related operations quite difficult in splunk.

0 Karma

Esteemed Legend

You should "Accept" your answer and also any other answers that you think are correct to close out the question.

0 Karma

SplunkTrust
SplunkTrust

I don't think there is any efficient way to do this, but give this approach a try.

index=IndexA OR index=indexB  | table time_in time_reg user_id colour index| eval commonfield=coalesce(time_in,time_reg) | sort 0 user_id -commonfield | streamstats current=f window=1 values(user_id) as prevUser values(index) as prevIndex prev(commonfield) as prevTime | where user_id=prevUser AND prevIndex=index AND index="indexB" | rename commonfield as time_reg prevTime as time_in   | table time_in time_reg user_id colour

View solution in original post

0 Karma

Esteemed Legend

You should be able to override the original _time variable and then use the solutions that you have already seen like this:

... | eval _time = coalesce(time_in, time_reg, _time) | ...
0 Karma