I have an "interesting event." How can I find an event meeting specific criteria that happened before my interesting event, and an event meeting different specific criteria that happened after it?
Here is an example:
I have a log that tells me a user visited a specific website. I consider this my interesting event. I also have logs that tell me when the user logged in and logged out; these are the other two events I am looking for. I want to detect any time a user visits this website, and then put it all together to know when they logged in, visited the site, and logged out.
All events have a field that identifies their Type (Login, Interesting Event, or Logout) and obviously include _time. All three types share the username, and the Login and Logout events also include a "SessionID" field that matches the opposing event, meaning SessionID can be used to find the specific Logout event that corresponds with each Login, and vice versa. The critical gap is that the Interesting Events do NOT include a SessionID.
Ultimately the goal is to add the SessionID to the interesting event because the visualization I am trying to use (timeline) requires them to share a unique value.
I don't think this by itself would be particularly difficult, as it seems a well-designed streamstats command would get the job done. The problem I run into is that users can have multiple sessions that overlap. This means that my logs will not always go in the perfect sequential order of Login --> Interesting Event --> Logout.
Then there is the issue of having two active sessions when the interesting event occurs, and not really being able to identify in which session it occurred. For my purpose, this does not bother me as long as the interesting event is "assigned" to one of the active sessions.
To continue piling on, it is obviously possible that there is no logout event yet. And unfortunately (and despite logic...) it is also possible to not find a login event. So the solution needs to accommodate the situation where we don't find one of the "bookend" events.
Here are some example events:
03/12/2019:18:58:45 TYPE:login USER:DoeJ SESSIONID: 121292
03/12/2019:19:02:25 TYPE:interesting_event USER:DoeJ
03/12/2019:19:28:27 TYPE:login USER:DoeJ SESSIONID: 121484
03/12/2019:19:31:05 TYPE:interesting_event USER:DoeJ
03/12/2019:19:44:12 TYPE:logout USER:DoeJ SESSIONID: 121292
03/12/2019:18:46:41 TYPE:login USER:DoeJ SESSIONID: 122677
03/12/2019:18:47:33 TYPE:logout USER:DoeJ SESSIONID: 122677
03/12/2019:19:48:09 TYPE:interesting_event USER:DoeJ
03/12/2019:20:04:33 TYPE:interesting_event USER:DoeJ
03/12/2019:20:11:54 TYPE:logout USER:DoeJ SESSIONID: 121484
And the goal is to end up with the following information, ultimately to be piped into a table of some sort:
03/12/2019:18:58:45 TYPE:login USER:DoeJ SESSIONID: 121292
03/12/2019:19:02:25 TYPE:interesting_event USER:DoeJ ***SESSIONID: 121292***
03/12/2019:19:28:27 TYPE:login USER:DoeJ SESSIONID: 121484
03/12/2019:19:31:05 TYPE:interesting_event USER:DoeJ ***SESSIONID: 121484***
03/12/2019:19:44:12 TYPE:logout USER:DoeJ SESSIONID: 121292
03/12/2019:18:46:41 TYPE:login USER:DoeJ SESSIONID: 122677
03/12/2019:18:47:33 TYPE:logout USER:DoeJ SESSIONID: 122677
03/12/2019:19:48:09 TYPE:interesting_event USER:DoeJ ***SESSIONID: 121484***
03/12/2019:20:04:33 TYPE:interesting_event USER:DoeJ ***SESSIONID: 121484***
03/12/2019:20:11:54 TYPE:logout USER:DoeJ SESSIONID: 121484
Another way to ask my question is: any time Event_B happens, how can I find the previous Event_A that does not have a matching Event_C? (Any time I see my Interesting Event, how can I find the previous Login that doesn't already have a Logout?)
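To make the logic I am after concrete, here is a rough sketch in Python rather than SPL. It is only an illustration of the desired behavior, not a working search: the names (`EVENTS`, `assign_sessions`) are made up for this example, the timestamp format is assumed to be month/day/year, and the rule "pick the most recently opened session that is still active" is just one acceptable way to choose when sessions overlap. An interesting event with no open login simply keeps an empty SessionID.

```python
from datetime import datetime

# Sample events copied from the post: (timestamp, type, user, session_id or None).
EVENTS = [
    ("03/12/2019:18:58:45", "login", "DoeJ", "121292"),
    ("03/12/2019:19:02:25", "interesting_event", "DoeJ", None),
    ("03/12/2019:19:28:27", "login", "DoeJ", "121484"),
    ("03/12/2019:19:31:05", "interesting_event", "DoeJ", None),
    ("03/12/2019:19:44:12", "logout", "DoeJ", "121292"),
    ("03/12/2019:18:46:41", "login", "DoeJ", "122677"),
    ("03/12/2019:18:47:33", "logout", "DoeJ", "122677"),
    ("03/12/2019:19:48:09", "interesting_event", "DoeJ", None),
    ("03/12/2019:20:04:33", "interesting_event", "DoeJ", None),
    ("03/12/2019:20:11:54", "logout", "DoeJ", "121484"),
]


def parse_time(text):
    # Assumption: timestamps are month/day/year:hour:minute:second.
    return datetime.strptime(text, "%m/%d/%Y:%H:%M:%S")


def assign_sessions(events):
    """Return events in time order, filling in a session ID for interesting events."""
    open_sessions = {}  # user -> session IDs currently logged in, oldest first
    result = []
    for ts, etype, user, sid in sorted(events, key=lambda e: parse_time(e[0])):
        active = open_sessions.setdefault(user, [])
        if etype == "login":
            active.append(sid)
        elif etype == "logout":
            # A logout with no matching login is tolerated and left as-is.
            if sid in active:
                active.remove(sid)
        elif etype == "interesting_event":
            # Most recently opened session still active, or None if no login was found.
            sid = active[-1] if active else None
        result.append((ts, etype, user, sid))
    return result


if __name__ == "__main__":
    for ts, etype, user, sid in assign_sessions(EVENTS):
        print(ts, etype, user, sid)
```

Note that the sketch sorts by time first, so session 122677 is opened and closed before any of the other events are considered.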
I have struggled with this task in one form or another multiple times. I know it is not going to be simple, but even steps in the right direction would be greatly appreciated. Thanks in advance.
When you search, do you get events for multiple users as well? How much data are we dealing with here?
Yes, the base data set has results from multiple users, but I am not sure it matters.
To give more detail on what I am actually trying to do: I am planning to build a visualization (likely a timeline) that splits by user across the entire data set, condensing all of the activity from one user onto a single line, with a new line for each additional user. This part is done; I don't need the SessionID assigned to the "interesting events" when I mash it all onto one line. The issue is when I want to drill down into a single user and "expand" their data set. I would then likely pass the user token to a new panel, another timeline visualization split by SessionID. This is where SessionID becomes important, as I need the interesting event to get "placed" on the appropriate row.
As for the amount of data, I don't think we are dealing with so much that it becomes a problem. I checked for today, and I have only ~1000 total events across all users and all three event types so far, and I only foresee this search running latest=@d, not going back weeks or months. I guess this is all relative to the power of my search tier, which is not underpowered. Also, in the end I plan to run this against an accelerated data model; while that doesn't always forgive terribly inefficient searches, it can help.