I have the following queries
query 1 : index1 .... | table _time uniqueID
query 2 : index2 .... | table _time uniqueID
And I am trying to find events where the uniqueID is found in both AND the difference between the times is greater than N milliseconds
Ideally, the output should be something like :
uniqueID | System 1 time | System 2 time | Difference in Millis |
123 | Time 1 | Time 2 | 500 |
1234 | Time 11 | Time 22 | 60 |
Could you please help?
Basic approach could be
(search1) OR (search2)
| sort _time
| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by uniqueID
| where range > 50
| eval t1=strftime(min, "%F %T"), t2=strftime(max, "%F %T")
| table uniqueID t1 t2 range
but what you get will depend on how many events there are for each uniqueID per search/index and whether there are duplicates, so you may need to tweak this to get the results you want
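As an aside, since your question asked about milliseconds: range(_time) is in seconds (with subsecond precision if your timestamps carry it), so convert before comparing. A minimal sketch of that tweak, assuming a threshold of 500 milliseconds:

(search1) OR (search2)
| sort _time
| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by uniqueID
| eval diff_ms=round(range*1000)
| where diff_ms > 500
| eval t1=strftime(min, "%F %T.%3N"), t2=strftime(max, "%F %T.%3N")
| table uniqueID t1 t2 diff_ms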
Could you please explain the query?
Query does
(search1) OR (search2)
| sort _time
searches both data sets and sorts by time - it may not be necessary to sort, as events should come out in time order
| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by uniqueID
Looks over a window of 2 events with the same uniqueID and calculates the 'range' of time between the two, storing the result in a new field called 'range'. The value will be the range of the two _time values, so it is in seconds.
| where range > 50
keeps only the events where the range exceeds the threshold you are looking for, here 50 seconds.
then the table formatting. Note that this will just give you the gap between every 'pair' of events, so if you have 3 events in index1 and 2 in index2 for the same ID, you will get the difference between each consecutive pair, hence my comment about duplicates, i.e. you need to know what you expect to get given your data.
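If every uniqueID appears exactly once in each index, a simpler alternative to streamstats is to split the times with stats. This is a sketch assuming "index1" and "index2" are the literal values of the index field in your two searches:

(search1) OR (search2)
| stats min(eval(if(index=="index1",_time,null()))) as time1 min(eval(if(index=="index2",_time,null()))) as time2 by uniqueID
| where isnotnull(time1) AND isnotnull(time2)
| eval diff_ms=round((time2-time1)*1000)
| where diff_ms > 500
| table uniqueID time1 time2 diff_ms

The where after the stats discards IDs that only appeared in one index, which matches your "found in both" requirement.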
What do you recommend to handle duplicates?
We expect to have duplicates, but each duplicate will be present in both indexes
Can you give an example of how you would get multiple events for the same uniqueID across each data set, the order in which they might appear, and their significance?
i.e. if your data set had this sequence, what 'range' are you looking for?
10:00:01 index1 event_1 ID1
10:00:05 index1 event_2 ID1
10:00:09 index2 event_3 ID1
10:00:14 index2 event_4 ID1
that very much dictates the solution
Usually, the expected path is index1->index2
But occasionally it is possible to see index1 -> index1 -> index2 -> index1
For example, for the base duplicated case, I am expecting to see something like
10:00:01 index1 event_1 ID1
10:00:05 index2 event_1 ID1
10:00:09 index1 event_1 ID1
10:00:09 index1 event_1 ID2
10:00:14 index2 event_1 ID2
10:00:14 index2 event_1 ID1
All events are expected to land in both indexes in order: first in index1 and then in index2
For this scenario, I am expecting to see both results in the table in the order that they appeared
At the same time, occasionally and when the events are lost, it is possible to see this
10:00:01 index1 event_1 ID1
10:00:09 index1 event_1 ID1 (a retry of a request that took too long or was lost in transit)
10:00:14 index2 event_1 ID1
10:00:15 index1 event_1 ID1 (a delayed response)
This situation is possible but not as common
The expected output here is trickier. When this happens, we can try using the second request and the latest response, even if that is not as accurate.
Using the example data you've shown, here is some SPL that will hopefully point you in the right direction about how to handle these cases and what to do with the possible values of range that get calculated.
It's non-trivial to write SPL that will handle an unpredictable dataset, so you'll have to see what values you can get out of the data and what will make sense in each condition.
But the basis is that streamstats will do the job of calculating deltas between field values; here it uses a window of 2 for each ID, and from there you can probably work out which values are important.
| makeresults
| eval _raw="10:00:01 index1 event_1 ID1##10:00:05 index2 event_1 ID1##10:00:09 index1 event_1 ID1##10:00:09 index1 event_1 ID2##10:00:14 index2 event_1 ID2##10:00:14 index2 event_1 ID1"
| eval x=split(_raw, "##")
| mvexpand x
| rex field=x "(?<T>[^ ]*) (?<ind>[^ ]*) (?<EV>[^ ]*) (?<ID>.*)"
| eval _time=strptime(T, "%T")
| table _time ind EV ID x
| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by ID
| stats values(x) as dataset min(min) as min max(max) as max list(ind) as ind list(range) as range values(EV) as EV by ID
| eval total_duration=max-min
| append [
| makeresults
| eval _raw="10:00:01 index1 event_1 ID3##10:00:09 index1 event_1 ID3##10:00:14 index2 event_1 ID3##10:00:15 index1 event_1 ID3"
| eval x=split(_raw, "##")
| mvexpand x
| rex field=x "(?<T>[^ ]*) (?<ind>[^ ]*) (?<EV>[^ ]*) (?<ID>.*)"
| eval _time=strptime(T, "%T")
| table _time ind EV ID x
| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by ID
| stats values(x) as dataset min(min) as min max(max) as max list(ind) as ind list(range) as range values(EV) as EV by ID
| eval total_duration=max-min
]
Hope this helps
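For the base duplicated case, where each request/response pair lands in index1 first and index2 second, another approach is to number the occurrences per uniqueID and index with streamstats, then pair the Nth index1 event with the Nth index2 event. This is a sketch only, not tested against your real data, and again assumes "index1"/"index2" are the actual index field values:

(search1) OR (search2)
| sort 0 _time
| streamstats count as seq by uniqueID, index
| stats min(eval(if(index=="index1",_time,null()))) as t1 min(eval(if(index=="index2",_time,null()))) as t2 by uniqueID, seq
| eval diff_ms=round((t2-t1)*1000)
| where isnotnull(diff_ms)
| table uniqueID seq t1 t2 diff_ms

Note this drops unpaired events, so the retry scenario (index1 -> index1 -> index2 -> index1) would still need the kind of special handling shown in the streamstats example above.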