Reporting

Dates and Datetime: How to correlate or reference events based on date time

cindygibbs_08
Communicator

 

Hello guys I hope everyone is doing well

I would like to know is there is a way to code in Splunk how to show the earliest and latest event based on a datetime reference. In the Hotel chain where I work we deal with a lot of people who fraudulently create new accounts and use other people's credit cards or names in order to make a reservation and in our system we have a way to set up a "fraud alarm" what I'd like to do is find a way to create a table where I can see the date of the fraud alarm and the very call after and before that fraud alarm, please allow me to illustrate I can have a table like this one:

FRAUD_DATEIDRISK
2021-01-14 15:30:121AGT
2021-02-03 18:46:59 1JUI
2021-02-05 17:16:23 1JUI
2021-11-26  21:15:143AGT

 

and also a table like this: 

IDRESERVATIONDATE
1UJS012021-01-13 15:30:12
1UJS022021-01-14 16:30:12
2UJS032021-02-03 18:46:59 
1UJS042021-02-06 14:00:23 
2UJS052021-02-03 18:46:59 

 

This will be the table I will be looking for:
all IDs that have made a reservation  and also have a fraud alarm and I have the date of the fraud alarm (related to that ID)  with the date of the reservation right before the fraud alarm and the date of the reservation right after the fraud alarm 

IDRESERVATION_CODE_BEFOREDATE_REV_BEFOREFRAUD_ALARMDATE_REV_AFTERRESERVATION_CODE_AFTER
1UJS012021-01-13 15:30:122021-01-14 15:30:122021-01-14 16:30:12UJS02
1UJS022021-01-14 16:30:122021-02-03 18:46:59 2021-02-06 14:00:23 UJS04
1UJS022021-01-14 16:30:122021-02-05 17:16:23 2021-02-06 14:00:23 UJS04

 

If there is no prior or post date them N.A will be the best chose to tabulate, thank you so much to anyone that can help me out.

These are the codes that I can use to extract the information:

Fraud Alarms:
index="fraud"
| search "fraud_alarms"=*
| fields ID, DATE_FRAUD, RISK

Reservations
index="cx_reserv"
| search "pay_ok"=*
| fields ID, date_reser, code_rev

 

Kindly,

Cindy

PS: I was wondering if perhaps one could group all the reservation dates by ID and fraud date and then subtract all reservation  dates to the fraud date  and create a field with the result named "value" and after that select as the "very before reservation date" the one with the greatest negative value and as the very next reservation date the one with the lowest positive value..  but I dont know is splunk has its own way to do this is a faster way perhaps thank you so much everyone

 

Labels (1)
0 Karma
1 Solution

ITWhisperer
Legend
| makeresults
| eval _raw="FRAUD_DATE	ID	RISK
2021-01-14 15:30:12	1	AGT
2021-02-03 18:46:59 	1	JUI
2021-02-05 17:16:23 	1	JUI
2021-11-26  21:15:14	3	AGT"
| multikv forceheader=1
| fields - _* linecount
| append 
    [| makeresults 
    | eval _raw="ID	RESERVATION	DATE
1	UJS01	2021-01-13 15:30:12
1	UJS02	2021-01-14 16:30:12
2	UJS03	2021-02-03 18:46:59 
1	UJS04	2021-02-06 14:00:23 
2	UJS05	2021-02-03 18:46:59"
    | multikv forceheader=1 
    | fields - _* linecount ]
| eval _time=coalesce(FRAUD_DATE,DATE)
| sort 0 - _time
| eval DATE_REV_AFTER=DATE
| eval RESERVATION_CODE_AFTER=RESERVATION
| streamstats last(DATE_REV_AFTER) as DATE_REV_AFTER last(RESERVATION_CODE_AFTER) as RESERVATION_CODE_AFTER by ID
| sort 0 _time
| eval DATE_REV_BEFORE=DATE
| eval RESERVATION_CODE_BEFORE=RESERVATION
| streamstats last(DATE_REV_BEFORE) as DATE_REV_BEFORE last(RESERVATION_CODE_BEFORE) as RESERVATION_CODE_BEFORE by ID
| where isnotnull(FRAUD_DATE)
| fillnull value="N/A" DATE_REV_BEFORE DATE_REV_AFTER
| table ID RESERVATION_CODE_BEFORE DATE_REV_BEFORE FRAUD_DATE DATE_REV_AFTER RESERVATION_CODE_AFTER

View solution in original post

ITWhisperer
Legend
| makeresults
| eval _raw="FRAUD_DATE	ID	RISK
2021-01-14 15:30:12	1	AGT
2021-02-03 18:46:59 	1	JUI
2021-02-05 17:16:23 	1	JUI
2021-11-26  21:15:14	3	AGT"
| multikv forceheader=1
| fields - _* linecount
| append 
    [| makeresults 
    | eval _raw="ID	RESERVATION	DATE
1	UJS01	2021-01-13 15:30:12
1	UJS02	2021-01-14 16:30:12
2	UJS03	2021-02-03 18:46:59 
1	UJS04	2021-02-06 14:00:23 
2	UJS05	2021-02-03 18:46:59"
    | multikv forceheader=1 
    | fields - _* linecount ]
| eval _time=coalesce(FRAUD_DATE,DATE)
| sort 0 - _time
| eval DATE_REV_AFTER=DATE
| eval RESERVATION_CODE_AFTER=RESERVATION
| streamstats last(DATE_REV_AFTER) as DATE_REV_AFTER last(RESERVATION_CODE_AFTER) as RESERVATION_CODE_AFTER by ID
| sort 0 _time
| eval DATE_REV_BEFORE=DATE
| eval RESERVATION_CODE_BEFORE=RESERVATION
| streamstats last(DATE_REV_BEFORE) as DATE_REV_BEFORE last(RESERVATION_CODE_BEFORE) as RESERVATION_CODE_BEFORE by ID
| where isnotnull(FRAUD_DATE)
| fillnull value="N/A" DATE_REV_BEFORE DATE_REV_AFTER
| table ID RESERVATION_CODE_BEFORE DATE_REV_BEFORE FRAUD_DATE DATE_REV_AFTER RESERVATION_CODE_AFTER

View solution in original post

cindygibbs_08
Communicator

Amazing thank you so much my beloved gentleman @ITWhisperer  I have a question and I am sorry if I'm taking up your time, is it possible to add a | stats command if I want to later on add some other variable like a comment from the fraud review team? Or what I should do is also append the search for this comments of the review team and then uses the streamstats command? Thank you so much my dear @ITWhisperer  from the bottom of my heart I love you

0 Karma

ITWhisperer
Legend

Hi @cindygibbs_08 

Essentially what is happening here is we are copying the previous date and code and next date and code from the reservation events to the fraud event and then just keeping the fraud events, so any additional fields in the fraud events would be preserved (you would just need to include the fields you wanted). If the comments are from different events in a different index, you would need to be able to correlate the comments with the fraud event and either use append and stats in a similar way or use a join. I hope that makes sense.

cindygibbs_08
Communicator

@ITWhisperer Got you! Thank you so much

0 Karma

cindygibbs_08
Communicator

Hello @ITWhisperer  I hope you are doing well, I decided to make some modifications and check how to apply it to my data but I realized that when I modified to only show me the very next reservation made after the each fraud date is actually not working for me (I  must be doing something wrong because your code works like a charm) I intentionally create a reservation one hour after each fraud date and only use one ID which is (1) but the results are not correct, I use this code:

 

 

| makeresults
| eval _raw="FRAUD_DATE	ID	RISK
2021-01-10 15:30:12	    1	AGT
2021-02-03 18:46:59 	1	JUI
2021-02-05 17:16:23 	1	JUI
2021-11-26  21:15:14	1	AGT"
| multikv forceheader=1
| fields - _* linecount
| append 
    [| makeresults 
    | eval _raw="ID	RESERVATION	DATE
1	UJSXX   2021-01-10 16:30:12	
1	UJS02	2021-02-03 19:46:59
1	UJS03	2021-02-05 18:16:23
1	UJS04	2021-11-26 22:15:14
1	UJS05	2021-02-03 18:46:59"
    | multikv forceheader=1 
    | fields - _* linecount ]
    | eval time=coalesce(FRAUD_DATE,DATE)
| eval time=strptime(time,"%Y-%m-%d %H:%M:%S")
| eval time=strftime(time,"%Y-%m-%d %H:%M:%S")
| eval _time=time
| sort 0 - _time
| eval DATE_REV_AFTER=DATE
| eval RESERVATION_CODE_AFTER=RESERVATION
| streamstats last(DATE_REV_AFTER) as DATE_REV_AFTER last(RESERVATION_CODE_AFTER) as RESERVATION_CODE_AFTER by ID
| where isnotnull(FRAUD_DATE) AND isnotnull(DATE_REV_AFTER)
| table ID FRAUD_DATE RISK DATE_REV_AFTER RESERVATION_CODE_AFTER
| sort FRAUD_DATE

 

 

 

yields the following:

 

 

ID	FRAUD_DATE	RISK	 DATE_REV_AFTER	    RESERVATION_CODE_AFTER
1	2021-02-03 18:46:59 	JUI	     2021-02-03 19:46:59	UJS02
1	2021-02-05 17:16:23 	JUI	     2021-02-05 18:16:23	UJS03
1	2021-11-26  21:15:14	AGT	     2021-11-26 22:15:14	UJS04

 

 

 

The very first fraud date is 

2021-01-10 15:30:12	

and this ID made a reservation one hour later with rev code UJSXX, I will be so thankful if you could let me know what would be the proper way to adapt the code to properly show only the very next call after every single one fraud alarm.. if you have time of course, I apologize if I am being too annoying by now..
kindly,

Cindy G.

0 Karma

ITWhisperer
Legend

The "problem" is with the way the data is set up. Sometimes multikv doesn't interpret the data correctly, especially when space/tab delimiters are inferred. Changing the delimiters to commas gives the missing result

| makeresults
| eval _raw="FRAUD_DATE,ID,RISK
2021-01-10 15:30:12,1,AGT
2021-02-03 18:46:59,1,JUI
2021-02-05 17:16:23,1,JUI
2021-11-26 21:15:14,1,AGT"
| multikv forceheader=1
| fields - _* linecount
| append 
    [| makeresults 
    | eval _raw="ID,RESERVATION,DATE
1,UJSXX,2021-01-10 16:30:12	
1,UJS02,2021-02-03 19:46:59
1,UJS03,2021-02-05 18:16:23
1,UJS04,2021-11-26 22:15:14
1,UJS05,2021-02-03 18:46:59"
    | multikv forceheader=1 
    | fields - _* linecount ]
| eval time=coalesce(FRAUD_DATE,DATE)
| eval time=strptime(time,"%Y-%m-%d %H:%M:%S")
| eval _time=time
| eval time=strftime(time,"%Y-%m-%d %H:%M:%S")
| sort 0 - _time
| eval DATE_REV_AFTER=DATE
| eval RESERVATION_CODE_AFTER=RESERVATION
| streamstats last(DATE_REV_AFTER) as DATE_REV_AFTER last(RESERVATION_CODE_AFTER) as RESERVATION_CODE_AFTER by ID
| where isnotnull(FRAUD_DATE) AND isnotnull(DATE_REV_AFTER)
| table ID FRAUD_DATE RISK DATE_REV_AFTER RESERVATION_CODE_AFTER
| sort FRAUD_DATE

Also, I moved the assignment to _time after the strptime and before the strftime to make it an epoch time value instead of a string. Given the format used, this doesn't actually matter since values in the string represent decreasing magnitudes of times e.g. years before months before days etc., but from a purist point of view, it would be better to sort using the epoch time value rather than the string.

cindygibbs_08
Communicator

Amazing thank you so much @ITWhisperer from the bottom of my heart

0 Karma
Register for .conf21 Now! Go Vegas or Go Virtual!

How will you .conf21? You decide! Go in-person in Las Vegas, 10/18-10/21, or go online with .conf21 Virtual, 10/19-10/20.