Hi All, looking for some advice on how to take the latest values from two datasets. We have a base search that pulls user details like name, start_date, end_date, title, location etc. from an index=okta.
name | start_date | end_date | title | user_id |
John Smith | 2021-06-28T23:59:59.00+05:30 | 2025-06-28T23:59:59.00+05:30 | Consultant | 001 |
The above index has the most current data of a user.
Next we have a master lookup file (identities.csv) where we maintain all user details from the past few years. This master lookup contains the same fields as the above index. For example:
name | start_date | end_date | title | user_id |
John Smith | 2021-06-28T23:59:59.00+05:30 | 2022-06-28T23:59:59.00+05:30 | Administrator | 001 |
Notice the end_date and title are different in the lookup.
Below is our current search that compares the two datasets. We want it to pick whichever value is the latest, whether that is a date field or any other field, but at the moment it does NOT update the fields even when a field like end_date or title has been modified in the index.
index=okta
| stats latest(_time) as _time, values(profile.title) as title, values(profile.email) as email, values(profile.startDate) as start_Date, values(profile.endDate) as end_Date, values(profile.Name) as Name by user_id
| append [| inputlookup identities.csv]
| stats latest(_time) as _time, latest(profile.title) as title, latest(profile.email) as email, latest(profile.startDate) as start_Date, latest(profile.endDate) as end_Date, latest(profile.Name) as Name by user_id
| table Name title start_date end_date user_id
Running the above query still shows the old info (the old end_date and title) even though I am using | stats latest(). Please advise how to retrieve the latest value, whether it is a date field or a string field like title.
name | start_date | end_date | title |
John Smith | 2021-06-28T23:59:59.00+05:30 | 2022-06-28T23:59:59.00+05:30 | Administrator |
I tested this using makeresults with dummy data. Copy and paste this query into your Splunk search bar to run it. It doesn’t depend on your actual index or CSV, so it's safe for testing.
Dummy data:-
| makeresults
| eval _time=strptime("2025-03-01T12:00:00.00+05:30", "%Y-%m-%dT%H:%M:%S.%2Q%z"), user_id="001", Name="John Smith", title="Consultant", email="john.smith@example.com", start_Date="2021-06-28T23:59:59.00+05:30", end_Date="2025-06-28T23:59:59.00+05:30", source="okta", mod_time=_time
| fields user_id, Name, title, email, start_Date, end_Date, mod_time, source
| append
    [| makeresults
    | eval _time=strptime("2022-06-01T12:00:00.00+05:30", "%Y-%m-%dT%H:%M:%S.%2Q%z"), user_id="001", Name="John Smith", title="Administrator", email="john.smith@example.com", start_Date="2021-06-28T23:59:59.00+05:30", end_Date="2022-06-28T23:59:59.00+05:30", source="csv", mod_time=if(isnull(_time), strptime(end_Date, "%Y-%m-%dT%H:%M:%S.%2Q%z"), _time)
    | fields user_id, Name, title, email, start_Date, end_Date, mod_time, source]
| sort 0 -mod_time
| dedup user_id
| table Name, title, start_Date, end_Date, user_id
You can try this:-
index=okta
| eval source="okta", mod_time=_time
| fields user_id, Name, title, email, start_Date, end_Date, mod_time, source
| append
    [| inputlookup identities.csv
    | eval source="csv", mod_time=if(isnull(_time), strptime(end_Date, "%Y-%m-%dT%H:%M:%S.%2Q%z"), _time)
    | fields user_id, Name, title, email, start_Date, end_Date, mod_time, source]
| sort 0 -mod_time   /* Sort by mod_time descending to prioritize latest */
| dedup user_id      /* Keep only the first (latest) record per user_id */
| table Name, title, start_Date, end_Date, user_id
Can you please confirm why you are doing mod_time=if(isnull(_time), strptime(end_Date, "%Y-%m-%dT%H:%M:%S.%2Q%z"), _time) in the append portion? What is the need for strptime here?
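For context, the likely reasoning (inferred from the query above, not an official statement): rows coming from | inputlookup identities.csv are lookup rows, not indexed events, so they usually have no _time field. strptime() parses the end_Date string into epoch seconds so those rows still get a numeric mod_time that | sort 0 -mod_time can compare against the index events' _time. A minimal sketch of the conversion on a dummy row:

```spl
| makeresults
| eval end_Date="2022-06-28T23:59:59.00+05:30"
/* Lookup rows have no _time, so parse the date string into epoch seconds */
| eval mod_time=strptime(end_Date, "%Y-%m-%dT%H:%M:%S.%2Q%z")
| table end_Date, mod_time
```

Without this, the CSV rows would have a null mod_time and the sort/dedup step could not rank them against the index events.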
Thank you so much. Let me test it out in detail but it looks promising.
@hummingbird81 Sounds good! Please test it and let me know. If everything works fine, kindly accept the solution.