Splunk Search

Join and calculate difference in time using a unique key?

yk010123
Path Finder

I have the following queries 

 

 

query 1 :  index1 .... | table _time uniqueID

query 2 :  index2 .... | table _time uniqueID

 

And I am trying to find events where the uniqueID is found in both AND the difference between the two _time values is greater than N milliseconds.

 

Ideally, the output should be something like : 

 

uniqueID | System 1 time | System 2 time | Difference in Millis
123      | Time 1        | Time 2        | 500
1234     | Time 11       | Time 22       | 60

 

Could you please help?

1 Solution

bowesmana
SplunkTrust

Basic approach could be

(search1) OR (search2)
| sort _time
| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by uniqueID
| where range > 50
| eval t1=strftime(min, "%F %T"), t2=strftime(max, "%F %T")
| table uniqueID t1 t2 range

but what you get will depend on how many events there are for each uniqueID per search/index and whether there are duplicates, so you may need to tweak this to get the results you want
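If each uniqueID reliably appears once per index, a stats-based variant sidesteps the windowing entirely - a sketch, assuming _time carries millisecond precision and using 500 as a stand-in for your N:

(search1) OR (search2)
| stats earliest(_time) as t1 latest(_time) as t2 by uniqueID
| eval diff_ms=round((t2-t1)*1000)
| where diff_ms > 500
| table uniqueID t1 t2 diff_ms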

 


 

yk010123
Path Finder

Could you please explain the query?


bowesmana
SplunkTrust

The query does the following:

(search1) OR (search2)
| sort _time

searches both data sets and sorts by time - the sort may not be necessary, as events should come out in time order already
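One practical caveat not in the original reply: sort truncates at 10,000 results by default, so on larger datasets lift the limit with a count of 0:

(search1) OR (search2)
| sort 0 _time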

| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by uniqueID

Looks over a window of 2 events with the same uniqueID and calculates the 'range' of time between the two, storing the result in a new field called 'range'. The value will be the range of the two _time values, so it is in seconds.
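Since the original question asked for a threshold in milliseconds and _time carries sub-second precision, you can convert the range before filtering - a small sketch, with 500 as a placeholder for N:

| eval range_ms=round(range*1000)
| where range_ms > 500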

| where range > 50

keeps only the pairs whose range exceeds your threshold, here 50 seconds.

Then the final line formats the table. Note that this will just give you the gap between every consecutive 'pair' of events, so if you have 3 events from index1 and 2 from index2 for the same ID, you will get the difference between each consecutive pair - hence my comment about duplicates: you need to know what you expect to get given your data.
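If only the index1 -> index2 direction matters, one option (a sketch, not part of the original answer, assuming the built-in index field distinguishes the two sources) is to carry the previous event's index forward and keep just the crossings:

(search1) OR (search2)
| sort 0 _time
| streamstats current=f window=1 last(_time) as prev_time last(index) as prev_index by uniqueID
| where index=="index2" AND prev_index=="index1"
| eval diff_ms=round((_time-prev_time)*1000)
| table uniqueID prev_time _time diff_ms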

 

yk010123
Path Finder

What do you recommend to handle duplicates?

We expect to have duplicates, but the duplicates will be present in both indexes.


bowesmana
SplunkTrust

Can you give an example of how you would get multiple events for the same uniqueID across each data set, the ordering in which they might appear, and their significance?

i.e. if your data set had this sequence, what 'range' are you looking for?

 

10:00:01 index1 event_1 ID1 
10:00:05 index1 event_2 ID1 
10:00:09 index2 event_3 ID1 
10:00:14 index2 event_4 ID1 

 

  • event_2 to event_3 (4 seconds)
  • event_1 to event_3 (8 seconds)
  • event_1 to event_4 (13 seconds)
  • event_2 to event_4 (9 seconds)

That very much dictates the solution.

yk010123
Path Finder

Usually, the expected path is index1->index2

But occasionally it is possible to see index1 -> index1 -> index2 -> index1

For example, for the base duplicated case, I am expecting to see something like

10:00:01 index1 event_1 ID1 

10:00:05 index2 event_1 ID1 

10:00:09 index1 event_1 ID1 

10:00:09 index1 event_1 ID2

10:00:14 index2 event_1 ID2

10:00:14 index2 event_1 ID1 

All events are expected to land in both indexes in order: first in index1 and then in index2.

For this scenario, I am expecting to see both results in the table in the order that they appeared

At the same time, occasionally, when events are lost, it is possible to see this:

10:00:01 index1 event_1 ID1

10:00:09 index1 event_1 ID1 (a retry of a request that took too long or was lost in transit)

10:00:14 index2 event_1 ID1 

10:00:15 index1 event_1 ID1 (a delayed response)

This situation is possible but not as common

The expected output here is trickier. When this happens, we can try using the second request and the latest response, even if that is not as accurate.
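One way to encode that rule (a sketch, not from the thread; it pairs the latest index1 time with the latest index2 time per ID, which matches 'latest response' but only approximates 'second request') uses conditional aggregates:

(search1) OR (search2)
| stats max(eval(if(index=="index1", _time, null()))) as last_req max(eval(if(index=="index2", _time, null()))) as last_resp by uniqueID
| eval diff_ms=round((last_resp-last_req)*1000)

In the delayed-response scenario above, last_req would land after last_resp and diff_ms would go negative, which at least flags those IDs for special handling.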


bowesmana
SplunkTrust

Using that example data you've shown, here is some SPL that will hopefully point you in the right direction on how to handle these cases and what to do with the possible values of range that get calculated.

It's non-trivial to get SPL that will handle a non-predictable dataset, so you'll have to see what values you can get out of the data and what will make sense in each condition.

But the basis is that streamstats will do the job of calculating deltas between field values; here it uses a window of 2 for each ID, and from there you can probably work out which values are important.

| makeresults
| eval _raw="10:00:01 index1 event_1 ID1##10:00:05 index2 event_1 ID1##10:00:09 index1 event_1 ID1##10:00:09 index1 event_1 ID2##10:00:14 index2 event_1 ID2##10:00:14 index2 event_1 ID1##"
| eval x=split(_raw, "##")
| mvexpand x
| rex field=x "(?<T>[^ ]*) (?<ind>[^ ]*) (?<EV>[^ ]*) (?<ID>.*)"
| eval _time=strptime(T, "%T")
| table _time ind EV ID x
| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by ID
| stats values(x) as dataset min(min) as min max(max) as max list(ind) as ind list(range) as range values(EV) as EV by ID
| eval total_duration=max-min
| append [
  | makeresults
  | eval _raw="10:00:01 index1 event_1 ID3##10:00:09 index1 event_1 ID3##10:00:14 index2 event_1 ID3##10:00:15 index1 event_1 ID3"
  | eval x=split(_raw, "##")
  | mvexpand x
  | rex field=x "(?<T>[^ ]*) (?<ind>[^ ]*) (?<EV>[^ ]*) (?<ID>.*)"
  | eval _time=strptime(T, "%T")
  | table _time ind EV ID x
  | streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by ID
  | stats values(x) as dataset min(min) as min max(max) as max list(ind) as ind list(range) as range values(EV) as EV by ID
  | eval total_duration=max-min
]
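A note beyond the original post: the makeresults/append scaffolding above only fabricates the sample events, so against real data the same pipeline reduces to something like:

(index=index1 ...) OR (index=index2 ...)
| streamstats global=f window=2 range(_time) as range min(_time) as min max(_time) as max by uniqueID
| stats min(min) as min max(max) as max list(index) as ind list(range) as range by uniqueID
| eval total_duration=max-min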

Hope this helps

 
