I have a log table, and I need to match each user_id with its potential pre-login user_ids:
user_id | page_referer | event_time
1199 | blah.com/sign_in | 01/02/2015 1:01:33 am
2244 | blah.com/sign_in | 01/22/2015 7:55:33 am
user_id | page_referer | event_time
2200 | blah.com/before_sign_in | 01/02/2015 1:01:25 am
4488 | blah.com/before_sign_in | 01/22/2015 7:55:12 am
4499 | blah.com/before_sign_in | 01/22/2015 7:55:15 am
So 1199 should match up with 2200, and 2244 with 4488 & 4499, in a result like this:
user_id | other user_id (MV field ok)
1199 | 2200
2244 | 4488, 4499
Part of the problem I'm having is how to construct a subsearch, join (or appendcols, etc.) where I use event_time as a search range: event_time - 90 as the lower bound and event_time as the upper bound.
I'd be happy with the final result table above, or any other join/transaction output that groups these pre-/post-login user_ids.
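To make the desired pairing concrete, here is a small Python sketch of the logic: each sign-in should pick up any before_sign_in user_ids from the preceding 90 seconds. The data is just the sample rows above; this is an illustration of the matching rule, not a Splunk search.

```python
from datetime import datetime, timedelta

# Sample events from the question: (user_id, page_referer, event_time)
events = [
    (1199, "blah.com/sign_in",        "01/02/2015 1:01:33 am"),
    (2244, "blah.com/sign_in",        "01/22/2015 7:55:33 am"),
    (2200, "blah.com/before_sign_in", "01/02/2015 1:01:25 am"),
    (4488, "blah.com/before_sign_in", "01/22/2015 7:55:12 am"),
    (4499, "blah.com/before_sign_in", "01/22/2015 7:55:15 am"),
]

def parse(ts):
    return datetime.strptime(ts, "%m/%d/%Y %I:%M:%S %p")

sign_ins = [(u, parse(t)) for u, p, t in events if p == "blah.com/sign_in"]
pre_logins = [(u, parse(t)) for u, p, t in events if p == "blah.com/before_sign_in"]

# For each sign-in, collect pre-login user_ids seen in the 90s window before it.
window = timedelta(seconds=90)
matches = {
    user: [pu for pu, pt in pre_logins if ts - window <= pt <= ts]
    for user, ts in sign_ins
}
print(matches)  # {1199: [2200], 2244: [4488, 4499]}
```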
Here's an alternative approach:
yoursearch (page_referer="blah.com/sign_in" OR page_referer="blah.com/before_sign_in") | transaction maxspan=90s startswith="blah.com/before_sign_in" endswith="blah.com/sign_in"
This may need a "dummy" transaction field; if so, you could for example use sourcetype, provided all events share the same sourcetype. It also will not work if you have overlapping transactions; that problem is intrinsic to your log files, and you'd need some differentiator such as a session ID to tell overlapping transactions apart. It also may not succeed in attaching both pre-events to your login event.
Eww. True time windowing is hard. Bucketing is easier, but it loses potential matches across a minute boundary. And if your logs are dense and full of overlapping events, this is hopeless.
Try something like this:
yoursearch | bin _time span=1m
| eval post_login=if(match(page_referer,"blah.com/sign_in"),user_id,null())
| eval pre_login=if(match(page_referer,"blah.com/before_sign_in"),user_id,null())
| stats values(pre_login) AS pre_login, values(post_login) AS post_login by _time
That makes a ton of assumptions about your data, but if your events are sparse it might be good enough.
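To illustrate the minute-boundary caveat: bucketing truncates each timestamp to the minute, so a pre-login just before the boundary and a sign-in just after it land in different buckets and never pair up. A hypothetical Python sketch (the 7:00:58 / 7:01:02 pair is invented to show the failure mode):

```python
from collections import defaultdict
from datetime import datetime

# Invented timestamps: the second pair straddles a minute boundary on purpose.
events = [
    ("pre",  2200, datetime(2015, 1, 2, 1, 1, 25)),
    ("post", 1199, datetime(2015, 1, 2, 1, 1, 33)),    # same minute: matched
    ("pre",  4488, datetime(2015, 1, 22, 7, 0, 58)),
    ("post", 2244, datetime(2015, 1, 22, 7, 1, 2)),    # crosses a boundary: lost
]

# Truncate each timestamp to the minute, like "bin _time span=1m".
buckets = defaultdict(lambda: {"pre": [], "post": []})
for kind, user, ts in events:
    buckets[ts.replace(second=0)][kind].append(user)

pairs = {b["post"][0]: b["pre"] for b in buckets.values() if b["post"]}
print(pairs)  # {1199: [2200], 2244: []} -- the cross-boundary match is dropped
```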
Another approach, closer to what you are thinking, is to use localize and map. If you have lots of data, this won't scale; if you have lots of overlapping logins, this won't work.
This is a rough example of what to try; I don't use this command often, so my syntax may be sketchy.
page_referer="blah.com/sign_in" | localize timebefore=90s timeafter=0
| map search="search page_referer=* starttimeu=$starttime$ endtimeu=$endtime$ | eval post_login=if(match(page_referer,\"blah.com/sign_in\"),user_id,null()) | eval pre_login=if(match(page_referer,\"blah.com/before_sign_in\"),user_id,null()) | stats values(pre_login) AS pre_login, values(post_login) AS post_login"
That is a pretty rough example. Expect to do some debugging; I just hope it gives you some ideas.
I used the map. The keys for me were endtimeu and the fact that the map search command fails quietly if you try something like eval="$myvar$" without escaping the quotes: it must be eval=\"$myvar$\" within that map search.