I am trying to find the best way to identify the events immediately before and after a matched event for each SessionID.
Example data:
time | SessionID | UserID | Match | Data
12/08/2018 11:12:27 | 1 | 123 | Y | a
12/08/2018 11:12:28 | 1 | 123 | N | b
12/08/2018 11:12:29 | 2 | 789 | Y | c
12/08/2018 11:12:30 | 1 | 321 | N | d
12/08/2018 11:12:31 | 1 | 321 | Y | e
12/08/2018 11:12:32 | 2 | 987 | N | f
12/08/2018 11:12:33 | 1 | 123 | N | g
12/08/2018 11:12:34 | 1 | 321 | N | h
12/08/2018 11:12:35 | 2 | 987 | N | i
12/08/2018 11:12:36 | 1 | 321 | N | j
12/08/2018 11:12:37 | 1 | 321 | N | k
12/08/2018 11:12:38 | 2 | 987 | Y | l
12/08/2018 11:12:39 | 2 | 789 | N | m
12/08/2018 11:12:40 | 1 | 123 | N | n
12/08/2018 11:12:41 | 1 | 123 | N | o
12/08/2018 11:12:42 | 2 | 789 | N | p
12/08/2018 11:12:43 | 1 | 321 | N | q
12/08/2018 11:12:44 | 1 | 123 | Y | r
And the data I am trying to identify should look like this:
time | SessionID | UserID | Match | Data
12/08/2018 11:12:27 | 1 | 123 | Y | a
12/08/2018 11:12:28 | 1 | 123 | N | b
-------------------------------------------------------
12/08/2018 11:12:29 | 2 | 789 | Y | c
12/08/2018 11:12:32 | 2 | 987 | N | f
-------------------------------------------------------
12/08/2018 11:12:30 | 1 | 321 | N | d
12/08/2018 11:12:31 | 1 | 321 | Y | e
12/08/2018 11:12:33 | 1 | 123 | N | g
-------------------------------------------------------
12/08/2018 11:12:35 | 2 | 987 | N | i
12/08/2018 11:12:38 | 2 | 987 | Y | l
12/08/2018 11:12:39 | 2 | 789 | N | m
-------------------------------------------------------
12/08/2018 11:12:43 | 1 | 321 | N | q
12/08/2018 11:12:44 | 1 | 123 | Y | r
This is a perfect use case for streamstats, which passes through the records in order and performs aggregate commands, including last(). When combined with current=f, it can be used to copy information from the prior record, in whatever order the records are coming in. Streamstats supports the by clause, and in this case you want by SessionID.
|makeresults|eval mydata="12/08/2018 11:12:27,1,123,Y,a!!!!12/08/2018 11:12:28,1,123,N,b!!!!12/08/2018 11:12:29,2,789,Y,c!!!!12/08/2018 11:12:30,1,321,N,d!!!!12/08/2018 11:12:31,1,321,Y,e!!!!12/08/2018 11:12:32,2,987,N,f!!!!12/08/2018 11:12:33,1,123,N,g!!!!12/08/2018 11:12:34,1,321,N,h!!!!12/08/2018 11:12:35,2,987,N,i!!!!12/08/2018 11:12:36,1,321,N,j!!!!12/08/2018 11:12:37,1,321,N,k!!!!12/08/2018 11:12:38,2,987,Y,l!!!!12/08/2018 11:12:39,2,789,N,m!!!!12/08/2018 11:12:40,1,123,N,n!!!!12/08/2018 11:12:41,1,123,N,o!!!!12/08/2018 11:12:42,2,789,N,p!!!!12/08/2018 11:12:43,1,321,N,q!!!!12/08/2018 11:12:44,1,123,Y,r"|makemv delim="!!!!" mydata |mvexpand mydata | table mydata |rex field=mydata "^(?<time>[^,]*),(?<SessionID>[^,]*),(?<UserID>[^,]*),(?<Match>[^,]*),(?<Data>[^,]*)$" |fields - mydata | eval _time = strptime(time,"%m/%d/%Y %H:%M:%S")| table _time SessionID UserID Match Data
| sort 0 _time
| rename COMMENT as "The above just inputs your data"
| rename COMMENT as "Now we copy info forward, then reverse the order and copy it backward"
| streamstats current=f last(Match) as lastMatch by SessionID
| reverse
| streamstats current=f last(Match) as nextMatch by SessionID
| reverse
| where Match="Y" OR lastMatch="Y" OR nextMatch="Y"
If you wanted a slightly more complicated version that told the next and prior records which matched record they were being kept because of, that could be done, too. Just add more last(fieldname) as lastfieldname and last(fieldname) as nextfieldname clauses to the two streamstats commands.
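As a rough sketch of that variation, the lines from the first streamstats onward could be replaced with something like the following. It is untested against real data, and the field names lastData, nextData, keptAfter, and keptBefore are only illustrative choices, not anything Splunk defines.

```spl
| rename COMMENT as "Copy Match and Data forward from the prior record in each session"
| streamstats current=f last(Match) as lastMatch last(Data) as lastData by SessionID
| reverse
| rename COMMENT as "Reversed order, so the prior record here is the next record in time"
| streamstats current=f last(Match) as nextMatch last(Data) as nextData by SessionID
| reverse
| where Match="Y" OR lastMatch="Y" OR nextMatch="Y"
| rename COMMENT as "Record which matched event caused each neighbouring record to be kept"
| eval keptAfter=if(lastMatch="Y", lastData, null()), keptBefore=if(nextMatch="Y", nextData, null())
| table _time SessionID UserID Match Data keptAfter keptBefore
```

With the sample data, the record with Data=b should end up with keptAfter=a, because it directly follows the matched record a in SessionID 1. Any other field, such as UserID or _time, can be carried the same way by adding another last() clause to each streamstats.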
It would be good if the transaction command could anchor on a middle event, in the same way it uses startswith or endswith. The format could be something like this:
Feature request EXAMPLE, not a real use of transaction!
| transaction pivot=Match=Y maxbefore=1 maxafter=1 by SessionID
@karlbosanquet, just to understand your requirement: for the third section, on what basis are you selecting UserID 123 for SessionID 1?
It is the next event after a match for SessionID 1.