Dashboards & Visualizations

Join event to the closest in time matching event

jbdumoulin
Engager

Hello splunkers.

I have the following case, I need to retrieve an event that happens periodically. Then after getting that event (let's call it X) I need to check if an another event (Y) happened in a 20 minutes timeframe relatively to X happening.

I tried doing it with a subsearch like this

< find event X > | join type=inner hostname [ search < find event Y >] | eval earliest=_time+1200 | eval latest=_time] | where NOT (_time < earliest AND _time > latest) | table

However, if I do this, my join will only return the last < event Y > found, and so if there are multiple Y events, that could match with some X events, this won't work. For instance if I have the following events :

Event Y _time
1 11 : 10 : 05
2 11 : 35 : 30
3 11 : 47 : 22

Event X _time
4 11 : 12 : 25
5 11 : 32 : 56
6 11 : 52 : 02

I'd like to exclude event 4 (happened 2 minutes after event 1), event 6 (happened 5 minutes after event 3) but not event 5 (happened 22 minutes after event 1 and before event 2).

Yet, with my current search, only event 6 will be excluded, because event 4 and 5 are compared to the time for event 3. Only the last event in the subsearch is compared to the main search. In an ideal world I'd like to have each event be compared to each time and if one match then exclude it, or at least each event being compared to the closest event and not to the last one !

I also tried to use append and selfjoin to create a "one to many" relationship with the subsearch instead of the "one to one" from the join, but the results are pure non sense when I use this.

< find event X > | append [ search < find event Y >] | eval earliest=_time+1200 | eval latest=_time] |selfjoin hostname | table

But by doing so, without any filter on the _time, I end up with event 1, 2, 3 and 4, but not 5 and 6 with no real reason, they are joined on the same field. They all have the same hostname in this specific case.

I thought about transactions to do the trick, but I need to specify precise boundaries. If event Y happens after X, then it should be kept, while if it happens before AND during a 20 minutes tilmespan it can be excluded safely. I think transactions can't do that, at least I didn't find parameters in the documentation to give a start time and end time for the transaction.

Thanks for your help.

Tags (2)
0 Karma

BonMot
Explorer

Part of the problem is that sub-searches, joins, appends, etc. all run once per search. What you really want to do is runs a search per EVENT

Try this:

< find event X > 
| eval start=_time, end=relative_time(_time,"+20m") 
| table start end host <any other fields you want to reference in map search>
|  map 
    search="search host=$host$ earliest=$start$ latest=$end$ <search for Y>  " 
| stats count  AS "Count Event Y" by host

In this search your X search is just collecting variables for the map search. The map search then replaces the source event with all of the matches of the map search, unique to each source event. This means each source event gets its own timerange for the map search and can return more than one row. If you want run stats against each individual map search instead of at the end, you can add pipes in the the search term and then you only get one row back per X event.

As you can imagine, map searches are expensive so you want your source search to be as small as possible and your map search to be as efficient as possible

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...