I figured out how to use the dedup command by the user (see example below) but I still want to get the latest record based on date per user. I've been fumbling around and am obviously missing something with the dedup command or additional commands to achieve this.
STARTING WITH:
USER STATUS DATE
A Active 1/20/2019
A Disabled 10/15/2017
A Active 1/22/2015
B Leave 7/01/2016
B Active 4/01/2013
C Active 6/26/2018
D Active 7/04/2017
E Terminated 6/30/2018
E Active 5/01/2015
WANT THIS BACK (1 row, then latest "date" per user):
USER STATUS DATE
A Active 1/20/2019
B Leave 7/01/2016
C Active 6/26/2018
D Active 7/04/2017
E Terminated 6/30/2018
You just need to sort by DATE first, like this:
Your Base Search Here
| eval _time = strptime(DATE, "%m/%d/%Y")
| sort 0 - _time
| dedup USER
@joesrepsolc Is your problem resolved? If so, please accept an answer to help future readers.
You just need to sort by DATE first, like this:
Your Base Search Here
| eval _time = strptime(DATE, "%m/%d/%Y")
| sort 0 - _time
| dedup USER
Thank you woodcock... as usual, you are correct. dedup simply returns the top row based on the field you specify. I mistakenly took that as sorting by most recent (_time)... but that is not the case. Good to know.
Thank you,
Is your suggestion really any different than the previous comment? Just trying to understand the difference (if any). So DEDUP inherently sorts by the event time of the record, so if I "| DEDUP USER" that really should be all I need right to show the most recent log event, per user, and only get (1) record per user right?
Thanks. Joe
You are mistaken: dedup
does NOT sort under any circumstances. It just walks from the top to the bottom keeping the first one that it finds. Yes, my answer is different and will work. Just try it.
HI joesrepsolc,
Given your list of names with associated dates, you want a list of users with the most recent date returned.
Based on that, I would suggest something like this:
index=main | dedup USER | stats latest(DATE) by USER
Hope this helps!
Thank You.
hello there,
here is solution without | dedup
. imho | dedup
is not ideal for your challenge, more explanation at docs here:
https://docs.splunk.com/Documentation/Splunk/7.2.3/SearchReference/Dedup
and especially this: "... Events returned by dedup are based on search order. For historical searches, the most recent events are searched first ... "
which in your case, you would like to dedup based on value of a field, not order returned
run this search anywhere:
| makeresults count=1
| eval data = "A Active 1/20/2019;A Disabled 10/15/2017;A Active 1/22/2015;B Leave 7/01/2016;B Active 4/01/2013;C Active 6/26/2018;D Active 7/04/2017;E Terminated 6/30/2018;E Active 5/01/2015"
| makemv delim=";" data
| mvexpand data
| rex field=data "(?<USER>[^\s]+)\s(?<STATUS>[^\s]+)\s(?<DATE>[^\s]+)"
| table USER STATUS DATE
| rename COMMENT as "above generates fake results, belew is your solution"
| rename COMMENT as "here i will use epoch time and the stats max() function to capture last event"
| eval DATE_EPOCH = strptime(DATE, "%m/%d/%Y")
| stats max(DATE_EPOCH) as "Last Status Time" by USER
| eval "Last Status Time" = strftime('Last Status Time', "%m/%d/%Y")
see screenshot:
hope it helps
you are correct about the dedup function not really fitting what I'm trying to do. This was surely a setback in me figuring this out.
You example is TREMENDOUS. Fake sample data... i didn't even know that was possible. Awesome. Studying your example code (and REGEX) now. Very helpful. Thank you for your response.
Wow'd
Joe
@joesrepsolc youre very welcome
if it answers te question, kindly accept the answer so others will know it worked for you
you are welcome to upvote any comments that you ind helpful as well
Can you please paste your current search query.
Instead of dedup you could use
<yourquery>|stats latest(Date) as Date , latest(Status) as Status by User
trying this now... wasn't sure I could solve my problem without the use of "dedup". Interesting... running tests now.
I must be missing something in your question because dedup
takes the most recent record for each unique user (in your case). What are you seeing that is different from that?
index=indexname
|dedup user date
I run that as my search, and I get unique user and unique dates (but still multiple records). If there are multiple edits made on the same day I'm not sure which one is winning. So I want to specify the most recent (by date). Just confusing myself on the search part
@joesrepsolc When you use latest it will give you the latest of that field( here date and status) based on the latest timestamp of that event and that would ideally be the most recent one for that user.
So it uses the timestamp of the EVENT... or the data contents of that FIELD (im my case it is a data/time field). That is an important distinction for me to understand.
Thank You!
latest uses timestamp , it will give you most recent date and status based on timestamp for a user.