Hi All, looking for some advice on how to take the latest values from two datasets. We have a base search that pulls user details like name, start_date, end_date, title, location etc. from an index=okta.
name | start_date | end_date | title | user_id |
John Smith | 2021-06-28T23:59:59.00+05:30 | 2025-06-28T23:59:59.00+05:30 | Consultant | 001 |
The above index has the most current data of a user.
Next we have a master lookup file (identities.csv) where we maintain all user details from the past few years. This master lookup contains the same fields as the above index. For example:
name | start_date | end_date | title | user_id |
John Smith | 2021-06-28T23:59:59.00+05:30 | 2022-06-28T23:59:59.00+05:30 | Administrator | 001 |
Notice the end_date and title are different in the lookup.
Below is our current search that compares the two datasets. We want it to pick whichever value is the latest, whether that is a date field or any other field, but at the moment it does NOT update the fields even when a field like end_date or title has been modified in the index.
index=okta
| stats latest(_time) as _time, values(profile.title) as title, values(profile.email) as email, values(profile.startDate) as start_Date, values(profile.endDate) as end_Date, values(profile.Name) as Name by user_id
| append [| inputlookup identities.csv]
| stats latest(_time) as _time, latest(profile.title) as title, latest(profile.email) as email, latest(profile.startDate) as start_Date, latest(profile.endDate) as end_Date, latest(profile.Name) as Name by user_id
| table Name title start_date end_date user_id
Running the above query still shows the old info (the old end_date and title) even though I am using | stats latest(). Please advise how to retrieve the latest value, whether it is a date field or a string field like title.
name | start_date | end_date | title |
John Smith | 2021-06-28T23:59:59.00+05:30 | 2022-06-28T23:59:59.00+05:30 | Administrator |
I tested this using makeresults with dummy data. Copy and paste this query into your Splunk search bar to run it. It doesn’t depend on your actual index or CSV, so it's safe for testing.
Dummy data:-
| makeresults
| eval _time=strptime("2025-03-01T12:00:00.00+05:30", "%Y-%m-%dT%H:%M:%S.%2Q%z"), user_id="001", Name="John Smith", title="Consultant", email="john.smith@example.com", start_Date="2021-06-28T23:59:59.00+05:30", end_Date="2025-06-28T23:59:59.00+05:30", source="okta", mod_time=_time
| fields user_id, Name, title, email, start_Date, end_Date, mod_time, source
| append
    [| makeresults
    | eval _time=strptime("2022-06-01T12:00:00.00+05:30", "%Y-%m-%dT%H:%M:%S.%2Q%z"), user_id="001", Name="John Smith", title="Administrator", email="john.smith@example.com", start_Date="2021-06-28T23:59:59.00+05:30", end_Date="2022-06-28T23:59:59.00+05:30", source="csv", mod_time=if(isnull(_time), strptime(end_Date, "%Y-%m-%dT%H:%M:%S.%2Q%z"), _time)
    | fields user_id, Name, title, email, start_Date, end_Date, mod_time, source]
| sort 0 -mod_time
| dedup user_id
| table Name, title, start_Date, end_Date, user_id
You can try this:-
index=okta
| eval source="okta", mod_time=_time
| fields user_id, Name, title, email, start_Date, end_Date, mod_time, source
| append
    [| inputlookup identities.csv
    | eval source="csv", mod_time=if(isnull(_time), strptime(end_Date, "%Y-%m-%dT%H:%M:%S.%2Q%z"), _time)
    | fields user_id, Name, title, email, start_Date, end_Date, mod_time, source]
| sort 0 -mod_time   /* Sort by mod_time descending to prioritize latest */
| dedup user_id      /* Keep only the first (latest) record per user_id */
| table Name, title, start_Date, end_Date, user_id
Can you please confirm why you are doing mod_time=if(isnull(_time), strptime(end_Date, "%Y-%m-%dT%H:%M:%S.%2Q%z"), _time) in the append portion? What is the need for strptime here?
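For context, the likely reasoning (inferred from the query above, not an official statement): rows coming from | inputlookup identities.csv are lookup rows, not indexed events, so they usually have no _time field. strptime() parses the end_Date string into epoch seconds so those rows still get a numeric mod_time that | sort 0 -mod_time can compare against the index events' _time. A minimal sketch of the conversion on a dummy row:

```spl
| makeresults
| eval end_Date="2022-06-28T23:59:59.00+05:30"
/* Lookup rows have no _time, so parse the date string into epoch seconds */
| eval mod_time=strptime(end_Date, "%Y-%m-%dT%H:%M:%S.%2Q%z")
| table end_Date, mod_time
```

Without this, the CSV rows would have a null mod_time and the sort/dedup step could not rank them against the index events.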
Thank you so much. Let me test it out in detail but it looks promising.
@hummingbird81 Sounds good! Please test it and let me know. If everything works fine, kindly accept the solution.