Splunk Search

How to join two searches by closest time fields in my two indexes, not using the _time field?

vbarna
Engager

Hi all,

I am going to simplify my problem. I have two indexes with the following variables:

index 1: time_in user_id

10:55 user1
11:20 user1
12:03 user2

index 2: time_reg user_id colour

10:50 user1 green
10:56 user1 yellow
10:57 user1 red
10:58 user2 green

Neither time_in nor time_reg are the same as _time, and their values are never equal. I want to join the event of index1 with events of index2. Each event must just have a match with the criteria of being the closest in time, but time_reg from index2 can never be after time_in of index1. The result of the join in the example would be:

JOIN: time_in time_reg user_id colour

10:55 10:50 user1 green
11:20 10:57 user1 red
12:03 10:58 user2 green

I have seen similar questions like this that they use 'transaction' or 'localize'. However, I have not found the way to use these commands without relying on the variable _time. Instead I want to not use _time at all and use time_in and time_reg.

Thank you very much in advance!

Tags (2)
0 Karma
1 Solution

somesoni2
Revered Legend

I don't think there is any efficient way to do this, but give this approach a try.

index=IndexA OR index=indexB  | table time_in time_reg user_id colour index| eval commonfield=coalesce(time_in,time_reg) | sort 0 user_id -commonfield | streamstats current=f window=1 values(user_id) as prevUser values(index) as prevIndex prev(commonfield) as prevTime | where user_id=prevUser AND prevIndex=index AND index="indexB" | rename commonfield as time_reg prevTime as time_in   | table time_in time_reg user_id colour

View solution in original post

0 Karma

vbarna
Engager

Thanks for both answers.

At the end I decided to implement a really inefficient solution (I execute this process really seldom so it doesn't matter) that would return more or less what I wanted.
What I do is a join between the two tables on user_id. Then, after the join I do:

eval diff_times=time_in-time_reg | search diff_times>=0 AND diff_times<600000

So at the end I filter the results where the two times are within a range of 10 minutes. I know that this is a really poor solution, but I find joins and time related operations quite difficult in splunk.

0 Karma

woodcock
Esteemed Legend

You should "Accept" your answer and also any other answers that you think are correct to close out the question.

0 Karma

somesoni2
Revered Legend

I don't think there is any efficient way to do this, but give this approach a try.

index=IndexA OR index=indexB  | table time_in time_reg user_id colour index| eval commonfield=coalesce(time_in,time_reg) | sort 0 user_id -commonfield | streamstats current=f window=1 values(user_id) as prevUser values(index) as prevIndex prev(commonfield) as prevTime | where user_id=prevUser AND prevIndex=index AND index="indexB" | rename commonfield as time_reg prevTime as time_in   | table time_in time_reg user_id colour
0 Karma

woodcock
Esteemed Legend

You should be able to override the original _time variable and then use the solutions that you have already seen like this:

... | eval _time = coalesce(time_in, time_reg, _time) | ...
0 Karma
Get Updates on the Splunk Community!

Your Guide to Splunk Digital Experience Monitoring

A flawless digital experience isn't just an advantage, it's key to customer loyalty and business success. But ...

Data Management Digest – November 2025

  Welcome to the inaugural edition of Data Management Digest! As your trusted partner in data innovation, the ...

Upcoming Webinar: Unmasking Insider Threats with Slunk Enterprise Security’s UEBA

Join us on Wed, Dec 10. at 10AM PST / 1PM EST for a live webinar and demo with Splunk experts! Discover how ...