I have a query like this:
The query displays the result like this:
I want it to have a result like this:
Example:
Jakarta - Bogor
(LOG_DATE_TIME - string type data) 2022-10-13 09:03:33.539 - 2022-10-13 09:00:55.885
(already converted in timestamps version) 1665626613.539000 - 1665626455.885000
= 158 (in seconds)
then 158/60 = 2.633 minutes
Bogor - Jakarta = 9.22 minutes
Jakarta - Bogor = 360 minutes
Bogor - Jakarta = 240 minutes
How should my query be, in order to achieve that result?
The requirements is very unclear because there is no explanation of the data. If I have to speculate, fields CIF and USER_CD uniquely identifies a certain user activity (let's call it a transaction); each event has a SRC_IP field, and a LOG_DATE_TIME field that is a timestamp in "%Y-%m-%d %H:%M:%S.%3N" format. You want to reveal some characteristics of this transaction using geolocation and lapsed time between mapped geolocation.
If this is your requirement, the mockup display is perhaps not the best format. But set that aside, the stats you illustrated will only give you the total lapsed time during this "transaction", not the lapse between cities you wanted. To do that, you must calculate the characteristics between each event pair.
| dbxquery connection=xxxxx query="select xxx FROM xxx WHERE xxx and to_char(LOG_DATE_TIME,'YYYY-MM-DD')='2022-10-13'"
| iplocation SRC_IP
| stats count values(LOG_DATE_TIME) as TIME dc(City) as countCity list(City) as city values(SRC_IP) as sourceIp by CIF USER_CD
| eval time = strptime(TIME,"%Y-%m-%d %H:%M:%S.%3N")
| eval differenceHours = (max(time) - min(time)) / 3600
| search countCity>1 AND differenceHours>1
| eval iter = mvrange(1, count)
| eval differenceMinutes = mvmap(iter, (mvindex(time, iter) - mvindex(time, iter - 1))/60)
| fields - time
Note you didn't specify the definition of differenceHours, so I have to invent one, and that leads me to move the search command up. Most importantly, the above is not exactly what you expect, to calculate lapsed time only when city changes. I can think of a relatively simple way to achieve that if each city only encompass a maximum of two events. But if there can be more, aggregation will need more refinements. The above should point you to the right direction, however.
CIF and USER_CD uniquely identifies a certain user activity v TRUE
value of LOG_DATE_TIME correlated with City
are successive real-time events.
the differenceHours field will be deprecated because it only takes the earliest and last times.
my goal is to calculate the time difference obtained by differentiating each location change, with units of minutes, which will be entered into the differenceMinutes field.
One more ambiguity: differenceMinutes can be defined as the time lapse between the first event in two consecutive geolocations, or it can be the time lapse between the last event in the last geolocation and the first event of the subsequent geolocation. I will take the latter assumption, the calculation of which is considerably more complex. (If the former, some form of dedup will suffice.)
| dbxquery connection=xxxxx query="select xxx FROM xxx WHERE xxx and to_char(LOG_DATE_TIME,'YYYY-MM-DD')='2022-10-13'"
| iplocation SRC_IP
| stats count values(LOG_DATE_TIME) as TIME dc(City) as countCity list(City) as city values(SRC_IP) as sourceIp by CIF USER_CD ``` calculate sequence ```
| eval time = strptime(TIME,"%Y-%m-%d %H:%M:%S.%3N")
| eval differenceHours = (max(time) - min(time)) / 3600
| search countCity>1 AND differenceHours>1
| eval iter = mvrange(1, count) ``` iter is the iterator of each event ```
| eval seq = 1 ``` seq is the sequence number of each new location ```
| eval seq = mvmap(iter, if(mvindex(city, iter) == mvindex(city, iter - 1), seq, seq + 1))
| stats min(time) as time max(time) as time_exit values(sourceIp) as sourceIp by CIF USER_CD seq city ``` extract entrance and exit times ```
| eval sourceIp = mvjoin(sourceIp, ", ") ``` to help with display ```
| stats values(time) as time list(time_exit) as time_exit list(seq) as seq list(city) as city list(sourceIp) as sourceIp by CIF USER_CD
``` use seq as the new iterator ```
| eval differenceMinutes = mvmap(seq, (mvindex(time, seq) - mvindex(time, seq - 1) + mvindex(time_exit, seq - 1))/60)
``` difference is caculated between exit time and next entrance time ```
| fields - time* iter seq
Hope the comments will give you some idea.
| eval iter = mvrange(1, count) ``` iter is the iterator of each event ```
bro, what is the count function inside mvrange(1, count)?
i need to understand this query
After the previous stats, count in this context is just the result from that stats function named count. This is a roundabout way to say in mvrange(1, count), count is merely a field name.