Splunk Search

Can you help me with dedup and date field. How to get the latest record?

joesrepsolc
Communicator

I figured out how to use the dedup command by the user (see example below) but I still want to get the latest record based on date per user. I've been fumbling around and am obviously missing something with the dedup command or additional commands to achieve this.

STARTING WITH:
USER STATUS DATE
A Active 1/20/2019
A Disabled 10/15/2017
A Active 1/22/2015
B Leave 7/01/2016
B Active 4/01/2013
C Active 6/26/2018
D Active 7/04/2017
E Terminated 6/30/2018
E Active 5/01/2015

WANT THIS BACK (1 row, then latest "date" per user):
USER STATUS DATE
A Active 1/20/2019
B Leave 7/01/2016
C Active 6/26/2018
D Active 7/04/2017
E Terminated 6/30/2018

Tags (2)
1 Solution

woodcock
Esteemed Legend

You just need to sort by DATE first, like this:

Your Base Search Here 
| eval _time = strptime(DATE, "%m/%d/%Y")
| sort 0 - _time
| dedup USER

View solution in original post

richgalloway
SplunkTrust
SplunkTrust

@joesrepsolc Is your problem resolved? If so, please accept an answer to help future readers.

---
If this reply helps you, Karma would be appreciated.
0 Karma

woodcock
Esteemed Legend

You just need to sort by DATE first, like this:

Your Base Search Here 
| eval _time = strptime(DATE, "%m/%d/%Y")
| sort 0 - _time
| dedup USER

joesrepsolc
Communicator

Thank you woodcock... as usual, you are correct. dedup simply returns the top row based on the field you specify. I mistakenly took that as sorting by most recent (_time)... but that is not the case. Good to know.

0 Karma

joesrepsolc
Communicator

Thank you,

Is your suggestion really any different than the previous comment? Just trying to understand the difference (if any). So DEDUP inherently sorts by the event time of the record, so if I "| DEDUP USER" that really should be all I need right to show the most recent log event, per user, and only get (1) record per user right?

Thanks. Joe

0 Karma

woodcock
Esteemed Legend

You are mistaken: dedup does NOT sort under any circumstances. It just walks from the top to the bottom keeping the first one that it finds. Yes, my answer is different and will work. Just try it.

0 Karma

ellothere
Explorer

HI joesrepsolc,

Given your list of names with associated dates, you want a list of users with the most recent date returned.

Based on that, I would suggest something like this:
index=main | dedup USER | stats latest(DATE) by USER

Hope this helps!

0 Karma

joesrepsolc
Communicator

Thank You.

0 Karma

adonio
Ultra Champion

hello there,

here is solution without | dedup. imho | dedup is not ideal for your challenge, more explanation at docs here:
https://docs.splunk.com/Documentation/Splunk/7.2.3/SearchReference/Dedup
and especially this: "... Events returned by dedup are based on search order. For historical searches, the most recent events are searched first ... "
which in your case, you would like to dedup based on value of a field, not order returned

run this search anywhere:

| makeresults count=1
| eval data = "A Active 1/20/2019;A Disabled 10/15/2017;A Active 1/22/2015;B Leave 7/01/2016;B Active 4/01/2013;C Active 6/26/2018;D Active 7/04/2017;E Terminated 6/30/2018;E Active 5/01/2015"
| makemv delim=";" data
| mvexpand data
| rex field=data "(?<USER>[^\s]+)\s(?<STATUS>[^\s]+)\s(?<DATE>[^\s]+)"
| table USER STATUS DATE 
| rename COMMENT as "above generates fake results, belew is your solution"
| rename COMMENT as "here i will use epoch time and the stats max() function to capture last event"
| eval DATE_EPOCH = strptime(DATE, "%m/%d/%Y")
| stats max(DATE_EPOCH) as "Last Status Time" by USER
| eval "Last Status Time" = strftime('Last Status Time', "%m/%d/%Y")

see screenshot:

alt text

hope it helps

joesrepsolc
Communicator

you are correct about the dedup function not really fitting what I'm trying to do. This was surely a setback in me figuring this out.

You example is TREMENDOUS. Fake sample data... i didn't even know that was possible. Awesome. Studying your example code (and REGEX) now. Very helpful. Thank you for your response.

Wow'd

Joe

0 Karma

adonio
Ultra Champion

@joesrepsolc youre very welcome
if it answers te question, kindly accept the answer so others will know it worked for you
you are welcome to upvote any comments that you ind helpful as well

0 Karma

Vijeta
Influencer

Can you please paste your current search query.

Instead of dedup you could use

<yourquery>|stats latest(Date) as Date , latest(Status) as Status by User
0 Karma

joesrepsolc
Communicator

trying this now... wasn't sure I could solve my problem without the use of "dedup". Interesting... running tests now.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

I must be missing something in your question because dedup takes the most recent record for each unique user (in your case). What are you seeing that is different from that?

---
If this reply helps you, Karma would be appreciated.
0 Karma

joesrepsolc
Communicator

index=indexname
|dedup user date

0 Karma

joesrepsolc
Communicator

I run that as my search, and I get unique user and unique dates (but still multiple records). If there are multiple edits made on the same day I'm not sure which one is winning. So I want to specify the most recent (by date). Just confusing myself on the search part

0 Karma

Vijeta
Influencer

@joesrepsolc When you use latest it will give you the latest of that field( here date and status) based on the latest timestamp of that event and that would ideally be the most recent one for that user.

0 Karma

joesrepsolc
Communicator

So it uses the timestamp of the EVENT... or the data contents of that FIELD (im my case it is a data/time field). That is an important distinction for me to understand.

Thank You!

0 Karma

Vijeta
Influencer

latest uses timestamp , it will give you most recent date and status based on timestamp for a user.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...