Hi,
I have two source types CardMember_cycle_data (with card member cycle date info) and CardMember_Demographic_data (with card member demographic info).
Each file has 3-4 million records or more.
(all dates are in MM/DD/YYYY format)
CardMember_cycle_data
CM_id Cycle_Date
CM1 05/01/2023
CM1 06/01/2023
CM2 04/03/2023
CM2 05/03/2023
CM2 06/03/2023
--------------------------
CardMember_Demographic_data
CM_id Transaction_Dt Prod_Code
CM1 01/02/2020 CR
CM1 05/28/2023 XX
CM1 06/07/2023 AB
CM2 04/14/2023 YY
CM2 06/01/2023 CD
My need is -
For each Card Member present in CardMember_cycle_data, I need to get the latest Prod_Code as of that member's latest Cycle_Date.
Hence the output will be:
CardMember Latest_Cycle_Date Prod_Code
CM1 06/01/2023 XX
CM2 06/03/2023 CD
Judging by your expected output, you want each member's last product code as of the latest date in the cycle data, i.e. a product code whose transaction date is after the last cycle date should be ignored (which is why CM1 gets XX rather than AB).
In order to do date comparisons, you will need to parse the date strings into epoch time with strptime. If you search both data sources at the same time (or append one search after the other), you can do something like this:
| eval _time=coalesce(strptime(Cycle_Date,"%m/%d/%Y"), strptime(Transaction_Dt,"%m/%d/%Y"))
| sort 0 _time
| streamstats latest(Prod_Code) as Prod_Code by CM_id
| where isnotnull(Cycle_Date)
| stats latest(Cycle_Date) as Cycle_Date latest(Prod_Code) as Prod_Code by CM_id
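Note that this may not quite work if a member has a cycle date and a transaction date that are the same, because the order of those two events after the sort is then not guaranteed.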
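One way to make such ties deterministic is to add a secondary sort key, so that for the same date the transaction rows are processed before the cycle rows. This is only a sketch. It assumes the two names in the question are the actual sourcetype values, that CM_id, Cycle_Date, Transaction_Dt and Prod_Code are already extracted as fields, and that a transaction on the cycle date itself should count for that cycle. is_cycle is a helper field made up for this example. It uses sort 0 because a plain sort returns only the first 10,000 results by default, which matters with millions of records. It uses last() rather than latest() so that the result follows the sorted order when two events share the same _time.

```
sourcetype=CardMember_cycle_data OR sourcetype=CardMember_Demographic_data
| eval _time=coalesce(strptime(Cycle_Date,"%m/%d/%Y"), strptime(Transaction_Dt,"%m/%d/%Y"))
| eval is_cycle=if(isnotnull(Cycle_Date),1,0)
| sort 0 CM_id _time is_cycle
| streamstats last(Prod_Code) as Prod_Code by CM_id
| where is_cycle=1
| stats last(Cycle_Date) as Latest_Cycle_Date last(Prod_Code) as Prod_Code by CM_id
```

If a same-day transaction should not count for that cycle, sort the helper field in descending order instead (sort 0 CM_id _time -is_cycle), so that the cycle row is processed first.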