Hello guys I hope everyone is doing well
I would like to know is there is a way to code in Splunk how to show the earliest and latest event based on a datetime reference. In the Hotel chain where I work we deal with a lot of people who fraudulently create new accounts and use other people's credit cards or names in order to make a reservation and in our system we have a way to set up a "fraud alarm" what I'd like to do is find a way to create a table where I can see the date of the fraud alarm and the very call after and before that fraud alarm, please allow me to illustrate I can have a table like this one:
FRAUD_DATE | ID | RISK |
2021-01-14 15:30:12 | 1 | AGT |
2021-02-03 18:46:59 | 1 | JUI |
2021-02-05 17:16:23 | 1 | JUI |
2021-11-26 21:15:14 | 3 | AGT |
and also a table like this:
ID | RESERVATION | DATE |
1 | UJS01 | 2021-01-13 15:30:12 |
1 | UJS02 | 2021-01-14 16:30:12 |
2 | UJS03 | 2021-02-03 18:46:59 |
1 | UJS04 | 2021-02-06 14:00:23 |
2 | UJS05 | 2021-02-03 18:46:59 |
This will be the table I will be looking for:
all IDs that have made a reservation and also have a fraud alarm and I have the date of the fraud alarm (related to that ID) with the date of the reservation right before the fraud alarm and the date of the reservation right after the fraud alarm
ID | RESERVATION_CODE_BEFORE | DATE_REV_BEFORE | FRAUD_ALARM | DATE_REV_AFTER | RESERVATION_CODE_AFTER |
1 | UJS01 | 2021-01-13 15:30:12 | 2021-01-14 15:30:12 | 2021-01-14 16:30:12 | UJS02 |
1 | UJS02 | 2021-01-14 16:30:12 | 2021-02-03 18:46:59 | 2021-02-06 14:00:23 | UJS04 |
1 | UJS02 | 2021-01-14 16:30:12 | 2021-02-05 17:16:23 | 2021-02-06 14:00:23 | UJS04 |
If there is no prior or post date them N.A will be the best chose to tabulate, thank you so much to anyone that can help me out.
These are the codes that I can use to extract the information:
Fraud Alarms:
index="fraud"
| search "fraud_alarms"=*
| fields ID, DATE_FRAUD, RISK
Reservations
index="cx_reserv"
| search "pay_ok"=*
| fields ID, date_reser, code_rev
Kindly,
Cindy
PS: I was wondering if perhaps one could group all the reservation dates by ID and fraud date and then subtract all reservation dates to the fraud date and create a field with the result named "value" and after that select as the "very before reservation date" the one with the greatest negative value and as the very next reservation date the one with the lowest positive value.. but I dont know is splunk has its own way to do this is a faster way perhaps thank you so much everyone
| makeresults
| eval _raw="FRAUD_DATE ID RISK
2021-01-14 15:30:12 1 AGT
2021-02-03 18:46:59 1 JUI
2021-02-05 17:16:23 1 JUI
2021-11-26 21:15:14 3 AGT"
| multikv forceheader=1
| fields - _* linecount
| append
[| makeresults
| eval _raw="ID RESERVATION DATE
1 UJS01 2021-01-13 15:30:12
1 UJS02 2021-01-14 16:30:12
2 UJS03 2021-02-03 18:46:59
1 UJS04 2021-02-06 14:00:23
2 UJS05 2021-02-03 18:46:59"
| multikv forceheader=1
| fields - _* linecount ]
| eval _time=coalesce(FRAUD_DATE,DATE)
| sort 0 - _time
| eval DATE_REV_AFTER=DATE
| eval RESERVATION_CODE_AFTER=RESERVATION
| streamstats last(DATE_REV_AFTER) as DATE_REV_AFTER last(RESERVATION_CODE_AFTER) as RESERVATION_CODE_AFTER by ID
| sort 0 _time
| eval DATE_REV_BEFORE=DATE
| eval RESERVATION_CODE_BEFORE=RESERVATION
| streamstats last(DATE_REV_BEFORE) as DATE_REV_BEFORE last(RESERVATION_CODE_BEFORE) as RESERVATION_CODE_BEFORE by ID
| where isnotnull(FRAUD_DATE)
| fillnull value="N/A" DATE_REV_BEFORE DATE_REV_AFTER
| table ID RESERVATION_CODE_BEFORE DATE_REV_BEFORE FRAUD_DATE DATE_REV_AFTER RESERVATION_CODE_AFTER
| makeresults
| eval _raw="FRAUD_DATE ID RISK
2021-01-14 15:30:12 1 AGT
2021-02-03 18:46:59 1 JUI
2021-02-05 17:16:23 1 JUI
2021-11-26 21:15:14 3 AGT"
| multikv forceheader=1
| fields - _* linecount
| append
[| makeresults
| eval _raw="ID RESERVATION DATE
1 UJS01 2021-01-13 15:30:12
1 UJS02 2021-01-14 16:30:12
2 UJS03 2021-02-03 18:46:59
1 UJS04 2021-02-06 14:00:23
2 UJS05 2021-02-03 18:46:59"
| multikv forceheader=1
| fields - _* linecount ]
| eval _time=coalesce(FRAUD_DATE,DATE)
| sort 0 - _time
| eval DATE_REV_AFTER=DATE
| eval RESERVATION_CODE_AFTER=RESERVATION
| streamstats last(DATE_REV_AFTER) as DATE_REV_AFTER last(RESERVATION_CODE_AFTER) as RESERVATION_CODE_AFTER by ID
| sort 0 _time
| eval DATE_REV_BEFORE=DATE
| eval RESERVATION_CODE_BEFORE=RESERVATION
| streamstats last(DATE_REV_BEFORE) as DATE_REV_BEFORE last(RESERVATION_CODE_BEFORE) as RESERVATION_CODE_BEFORE by ID
| where isnotnull(FRAUD_DATE)
| fillnull value="N/A" DATE_REV_BEFORE DATE_REV_AFTER
| table ID RESERVATION_CODE_BEFORE DATE_REV_BEFORE FRAUD_DATE DATE_REV_AFTER RESERVATION_CODE_AFTER
Amazing thank you so much my beloved gentleman @ITWhisperer I have a question and I am sorry if I'm taking up your time, is it possible to add a | stats command if I want to later on add some other variable like a comment from the fraud review team? Or what I should do is also append the search for this comments of the review team and then uses the streamstats command? Thank you so much my dear @ITWhisperer from the bottom of my heart I love you
Essentially what is happening here is we are copying the previous date and code and next date and code from the reservation events to the fraud event and then just keeping the fraud events, so any additional fields in the fraud events would be preserved (you would just need to include the fields you wanted). If the comments are from different events in a different index, you would need to be able to correlate the comments with the fraud event and either use append and stats in a similar way or use a join. I hope that makes sense.
@ITWhisperer Got you! Thank you so much
Hello @ITWhisperer I hope you are doing well, I decided to make some modifications and check how to apply it to my data but I realized that when I modified to only show me the very next reservation made after the each fraud date is actually not working for me (I must be doing something wrong because your code works like a charm) I intentionally create a reservation one hour after each fraud date and only use one ID which is (1) but the results are not correct, I use this code:
| makeresults
| eval _raw="FRAUD_DATE ID RISK
2021-01-10 15:30:12 1 AGT
2021-02-03 18:46:59 1 JUI
2021-02-05 17:16:23 1 JUI
2021-11-26 21:15:14 1 AGT"
| multikv forceheader=1
| fields - _* linecount
| append
[| makeresults
| eval _raw="ID RESERVATION DATE
1 UJSXX 2021-01-10 16:30:12
1 UJS02 2021-02-03 19:46:59
1 UJS03 2021-02-05 18:16:23
1 UJS04 2021-11-26 22:15:14
1 UJS05 2021-02-03 18:46:59"
| multikv forceheader=1
| fields - _* linecount ]
| eval time=coalesce(FRAUD_DATE,DATE)
| eval time=strptime(time,"%Y-%m-%d %H:%M:%S")
| eval time=strftime(time,"%Y-%m-%d %H:%M:%S")
| eval _time=time
| sort 0 - _time
| eval DATE_REV_AFTER=DATE
| eval RESERVATION_CODE_AFTER=RESERVATION
| streamstats last(DATE_REV_AFTER) as DATE_REV_AFTER last(RESERVATION_CODE_AFTER) as RESERVATION_CODE_AFTER by ID
| where isnotnull(FRAUD_DATE) AND isnotnull(DATE_REV_AFTER)
| table ID FRAUD_DATE RISK DATE_REV_AFTER RESERVATION_CODE_AFTER
| sort FRAUD_DATE
yields the following:
ID FRAUD_DATE RISK DATE_REV_AFTER RESERVATION_CODE_AFTER
1 2021-02-03 18:46:59 JUI 2021-02-03 19:46:59 UJS02
1 2021-02-05 17:16:23 JUI 2021-02-05 18:16:23 UJS03
1 2021-11-26 21:15:14 AGT 2021-11-26 22:15:14 UJS04
The very first fraud date is
2021-01-10 15:30:12
and this ID made a reservation one hour later with rev code UJSXX, I will be so thankful if you could let me know what would be the proper way to adapt the code to properly show only the very next call after every single one fraud alarm.. if you have time of course, I apologize if I am being too annoying by now..
kindly,
Cindy G.
The "problem" is with the way the data is set up. Sometimes multikv doesn't interpret the data correctly, especially when space/tab delimiters are inferred. Changing the delimiters to commas gives the missing result
| makeresults
| eval _raw="FRAUD_DATE,ID,RISK
2021-01-10 15:30:12,1,AGT
2021-02-03 18:46:59,1,JUI
2021-02-05 17:16:23,1,JUI
2021-11-26 21:15:14,1,AGT"
| multikv forceheader=1
| fields - _* linecount
| append
[| makeresults
| eval _raw="ID,RESERVATION,DATE
1,UJSXX,2021-01-10 16:30:12
1,UJS02,2021-02-03 19:46:59
1,UJS03,2021-02-05 18:16:23
1,UJS04,2021-11-26 22:15:14
1,UJS05,2021-02-03 18:46:59"
| multikv forceheader=1
| fields - _* linecount ]
| eval time=coalesce(FRAUD_DATE,DATE)
| eval time=strptime(time,"%Y-%m-%d %H:%M:%S")
| eval _time=time
| eval time=strftime(time,"%Y-%m-%d %H:%M:%S")
| sort 0 - _time
| eval DATE_REV_AFTER=DATE
| eval RESERVATION_CODE_AFTER=RESERVATION
| streamstats last(DATE_REV_AFTER) as DATE_REV_AFTER last(RESERVATION_CODE_AFTER) as RESERVATION_CODE_AFTER by ID
| where isnotnull(FRAUD_DATE) AND isnotnull(DATE_REV_AFTER)
| table ID FRAUD_DATE RISK DATE_REV_AFTER RESERVATION_CODE_AFTER
| sort FRAUD_DATE
Also, I moved the assignment to _time after the strptime and before the strftime to make it an epoch time value instead of a string. Given the format used, this doesn't actually matter since values in the string represent decreasing magnitudes of times e.g. years before months before days etc., but from a purist point of view, it would be better to sort using the epoch time value rather than the string.
Amazing thank you so much @ITWhisperer from the bottom of my heart