I have two searches with the result as displayed below. Here I want to find the service related to each activity based on the event time i.e.
For a given activity, pick the Service event which occurred closest to the Activity's event time and join the event's Service name.
The common field between these searches is the SessionId. Also the "EventTime" is an actual field in the search result and not the standard SPLUNK time (_time).
Search 1 result
Id SessionId Activity Search1Time 1 123 a 2016-02-06 14:26:00Z 2 123 a 2016-02-06 14:28:00Z 3 123 b 2016-02-06 15:06:00Z 4 123 a 2016-02-06 15:36:00Z 5 234 c 2016-02-06 16:36:00Z
Search 2 result
Id SessionId Service Search2Time 1 123 alpha 2016-02-06 12:36:00Z 2 123 gamma 2016-02-06 13:00:00Z 3 123 gamma 2016-02-06 14:00:00Z 4 123 theta 2016-02-06 15:00:00Z
Expected Intermediate Result
Id SessionId Activity Service Search1Time 1 123 a gamma 2016-02-06 14:26:00Z 2 123 a gamma 2016-02-06 14:28:00Z 3 123 b theta 2016-02-06 15:06:00Z 4 123 a theta 2016-02-06 15:36:00Z 5 234 c Missing 2016-02-06 16:36:00Z
Expected Final Result
gamma - 2 theta - 2 Missing - 1
The solution that I am trying to use is,
Perform a left join then calculate the delta between EventTimes of Search1 and Search2 and sort it in the ascending order.
Use the "Id" column to get the unique results, which should be the one with the lowest delta.
Search2|eval Search2Time=strptime(EventTime,"%Y-%m-%d %H:%M:%S.%NZ")| stats count by SessionId Service Search2Time| join type=left max=0 overwrite=false SessionId [search Search1| eval Search1Time=strptime(EventTime,"%Y-%m-%d %H:%M:%S.%NZ")| fields Id SessionId Search1Time Activity] | eval Delta=abs(Search1Time-Search2Time)|sort by Delta| fields Id SessionId Activity Service Search1Time | where Id!=" "| stats dc(Id) by Service
The Actual Result for smaller dataset using the above approach is
gamma - 2 theta - 2
1. The search performance is really very slow since there are millions of events and I am using a Join
2. The subsearch limit of 50000 causes inconsistencies in the result
3. Not sure if all the edge cases are covered
4. My query doesn't cover the "Missing" case because the subsearch limit forced me to swap the searches in the join. Also, I cannot use the .conf file due to privelege restrictions.
5. Is there a way I could get the desired result and also retain all the columns from Search1 i.e. Activity details.
Can you suggest an alternative to achieve the same for large datasets addressing the "Missing" case ? Appreciate your help !
I would suggest not using a join, maybe a transaction can assist you better. You basicaly can search matching the events of source 1 and source2 with something like
(search1) OR (search2)
and then simply transaction by SessionId. The final search would be smth like this:
(search1) OR (Search2) | eval Search1Time=strptime(EventTime,"%Y-%m-%d|eval Search2Time=strptime(EventTime,"%Y-%m-%d %H:%M:%S.%NZ")| transaction SessionId | where Id!=" "| stats dc(Id) by Service
Hope this helps!