Hello everyone,
I need help with a search.
I have a table with the following fields:
VISITDATE USERNUMBER WEBSITE
4/19/2020 7:15:11 001 www.google.com
4/26/2020 10:24:03 001 www.google.com
4/26/2020 10:33:03 001 www.google.com
4/26/2020 11:15:12 001 www.google.com
4/26/2020 11:30:12 001 www.google.com
4/28/2020 14:30:12 001 www.google.com
I want to count the number of visits (by usernumber and website) that occurred 60 minutes apart, and return the earliest
time in that 60-minute bucket.
VISITDATE USERNUMBER WEBSITE COUNT
4/19/2020 7:15:11 001 www.google.com 1
4/26/2020 10:24:03 001 www.google.com 3
4/26/2020 11:30:12 001 www.google.com 1
4/28/2020 14:30:12 001 www.google.com 1
For example, using bins or buckets with a 60-min/1-hour time span that snap to the hour 10:00 -11:00 is one hour and that would give me this result, which I don't want:
VISITDATE USERNUMBER WEBSITE COUNT
4/19/2020 7:15:11 001 www.google.com 1
4/26/2020 10:24:03 001 www.google.com 2
4/26/2020 11:15:12 001 www.google.com 2
4/28/2020 14:30:12 001 www.google.com 1
Thank you in advance for your help.
| makeresults
| eval _raw="VISITDATE,USERNUMBER,WEBSITE
4/19/2020 7:15:11,001,www.google.com
4/26/2020 10:24:03,001,www.google.com
4/26/2020 10:33:03,001,www.google.com
4/26/2020 11:15:12,001,www.google.com
4/26/2020 11:30:12,001,www.google.com
4/28/2020 14:30:12,001,www.google.com"
| multikv forceheader=1
| eval _time=strptime(VISITDATE,"%m/%d/%Y %T")
| autoregress _time as prev_time
| streamstats count(eval(_time - prev_time > 3600)) as session1
| streamstats min(_time) as session_min by session1
| eval session2=if(_time - session_min <= 3600, "T","N")
| eval tmp=session1.session2
| streamstats dc(tmp) as session3
| stats count as COUNT min(_time) as _time by USERNUMBER WEBSITE session3
| eval VISITDATE=strftime(_time,"%m/%d/%Y %T")
| table VISITDATE,USERNUMBER,WEBSITE,COUNT
| makeresults
| eval _raw="VISITDATE,USERNUMBER,WEBSITE
4/19/2020 7:15:11,001,www.google.com
4/26/2020 10:24:03,001,www.google.com
4/26/2020 10:33:03,001,www.google.com
4/26/2020 11:15:12,001,www.google.com
4/26/2020 11:30:12,001,www.google.com
4/28/2020 14:30:12,001,www.google.com"
| multikv forceheader=1
| eval _time=strptime(VISITDATE,"%m/%d/%Y %T")
| autoregress _time as prev_time
| streamstats count(eval(_time - prev_time > 3600)) as session1
| streamstats min(_time) as session_min by session1
| eval session2=if(_time - session_min <= 3600, "T","N")
| eval tmp=session1.session2
| streamstats dc(tmp) as session3
| stats count as COUNT min(_time) as _time by USERNUMBER WEBSITE session3
| eval VISITDATE=strftime(_time,"%m/%d/%Y %T")
| table VISITDATE,USERNUMBER,WEBSITE,COUNT
Hi,
Thank you so much for your answer. I implemented it on my data and I noticed that there is one thing I should add. There can be multiple users/USERNUMBER so using
|autoregress _time as prev_time
on my data has the effect of using another users previous time instead of the prev_time specific to that user. Is there a way I can make the |autoregress _time as prev_time conditional to the USERNUMBERs so that for example each unique USERNUMBER will have null for the first time they visit a site on the prev_time column. Thank you again for your help.
use streamstats current=f last(_tme) as prev_time by USERNUMBER
That worked perfectly, thank you!