Solved: How to keep all most recent events for a specific ...

mew1033 · ‎11-05-2015

My question is similar to this: https://answers.splunk.com/answers/35759/keping-only-most-recent-events-for-a-fixed-field.html
Basically, I have scan data that looks something like this:

scanIDa,machine1,fail1
scanIDa,machine1,fail2

scanIDb,machine1,fail1
scanIDb,machine1,fail2
scanIDb,machine1,fail3


scanIDc,machine2,fail1
scanIDc,machine2,fail2
scanIDc,machine2,fail3

scanIDd,machine2,fail1
scanIDd,machine2,fail3


scanIDe,machine3,fail1

scanIDf,machine3,fail1
scanIDf,machine3,fail2

I want to keep all the data for only the most recent scan on each machine. So the end result of my search should be something like this:

scanIDa,machine1,fail1
scanIDa,machine1,fail2

scanIDc,machine2,fail1
scanIDc,machine2,fail2
scanIDc,machine2,fail3

scanIDe,machine3,fail1

I don't want to know about fail3 on machine1 anymore because it was fixed in a more recent scan.
scanID is a random value. Looks like an md5 hash or something. Whatever it is, it's not usable as a sort field.

Is this possible? Am I dreaming?

mew1033 · ‎12-01-2015

@lguinn's answer helped me out a little, but it still didn't get me exactly what I wanted.

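What I ended up doing was creating a scheduled search that rebuilds a lookup once an hour. It does

searchhere
| stats latest(scanID) as scanID by machine | eval mostRecent="yes"

Then I can just search for mostRecent="yes" and get the results I want. (Note the double quotes: in eval, single quotes refer to a field name, not a string literal.)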
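
For anyone who wants to see the logic outside of Splunk, here is a rough Python sketch of the same two steps: build a machine-to-latest-scanID mapping, then keep only the events whose scanID matches it. The function names and the tuple layout are made up for illustration, and the sketch assumes events arrive newest first (the way Splunk returns them) rather than looking at _time the way latest() does.

```python
# Sample events from the question as (scanID, machine, fail), newest first.
EVENTS = [
    ("scanIDa", "machine1", "fail1"), ("scanIDa", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail1"), ("scanIDb", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail3"),
    ("scanIDc", "machine2", "fail1"), ("scanIDc", "machine2", "fail2"),
    ("scanIDc", "machine2", "fail3"),
    ("scanIDd", "machine2", "fail1"), ("scanIDd", "machine2", "fail3"),
    ("scanIDe", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail1"), ("scanIDf", "machine3", "fail2"),
]


def latest_scan_by_machine(events):
    """Hypothetical stand-in for: stats latest(scanID) as scanID by machine."""
    latest = {}
    for scan_id, machine, _fail in events:
        # Newest-first input, so the first scanID seen per machine is the latest.
        latest.setdefault(machine, scan_id)
    return latest


def keep_most_recent(events):
    """Keep every event that belongs to its machine's latest scan."""
    lookup = latest_scan_by_machine(events)
    return [e for e in events if lookup[e[1]] == e[0]]


if __name__ == "__main__":
    for event in keep_most_recent(EVENTS):
        print(",".join(event))
```

On the sample data this keeps all events for scanIDa, scanIDc and scanIDe and drops the rest.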


willi358 · ‎08-10-2017

I had the same issue with my Nessus scan data. I solved it using the streamstats command. The following search works for your example:

yoursearch | streamstats first(scanID) as scanID_first by machine | eval recent=if(scanID=scanID_first,"yes","no")

This will make your scan data look as follows:

scanID   machine   fail   scanID_first  recent
scanIDa  machine1  fail1  scanIDa       yes
scanIDa  machine1  fail2  scanIDa       yes
scanIDb  machine1  fail1  scanIDa       no
scanIDb  machine1  fail2  scanIDa       no
scanIDb  machine1  fail3  scanIDa       no
scanIDc  machine2  fail1  scanIDc       yes
scanIDc  machine2  fail2  scanIDc       yes
scanIDc  machine2  fail3  scanIDc       yes
scanIDd  machine2  fail1  scanIDc       no
scanIDd  machine2  fail3  scanIDc       no
scanIDe  machine3  fail1  scanIDe       yes
scanIDf  machine3  fail1  scanIDe       no
scanIDf  machine3  fail2  scanIDe       no

Now you can search for recent="yes".
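
If it helps to see why this works, here is a small Python model of what streamstats first(...) by machine is doing in the search above. It is only a sketch: the helper name streamstats_first is made up, and it assumes the rows come in newest first, which is Splunk's default result order.

```python
# Sample rows from the question, newest first.
SAMPLE = [
    ("scanIDa", "machine1", "fail1"), ("scanIDa", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail1"), ("scanIDb", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail3"),
    ("scanIDc", "machine2", "fail1"), ("scanIDc", "machine2", "fail2"),
    ("scanIDc", "machine2", "fail3"),
    ("scanIDd", "machine2", "fail1"), ("scanIDd", "machine2", "fail3"),
    ("scanIDe", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail1"), ("scanIDf", "machine3", "fail2"),
]
ROWS = [dict(zip(("scanID", "machine", "fail"), t)) for t in SAMPLE]


def streamstats_first(rows, value_field, by_field):
    """Hypothetical model of: streamstats first(value) as value_first by by_field,
    followed by the eval that sets recent to "yes" or "no"."""
    first_seen = {}
    out = []
    for row in rows:
        # Remember the first value seen for each group while streaming through.
        first = first_seen.setdefault(row[by_field], row[value_field])
        tagged = dict(row)
        tagged[value_field + "_first"] = first
        tagged["recent"] = "yes" if row[value_field] == first else "no"
        out.append(tagged)
    return out


if __name__ == "__main__":
    for r in streamstats_first(ROWS, "scanID", "machine"):
        print(r["scanID"], r["machine"], r["fail"], r["scanID_first"], r["recent"])
```

Because the newest scan for a machine is seen first, its scanID becomes scanID_first for every later (older) row of that machine, so only rows from the newest scan compare equal.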

My case was a little different. The name of the scan (the "name" field) did not change. However, the "scan_start" field (when the scan was started) was different for each scan run. I wanted to keep the scans with the latest value of scan_start. So I used this search:

mysearch | streamstats first(scan_start) as scan_start_first by name | eval recent=if(scan_start=scan_start_first,"yes","no")

lguinn2 · ‎11-05-2015

The Splunk dedup command should do what you want.

yoursearchhere
| dedup scanID machineID

dedup preserves the first event it sees for each unique combination of the scanID and machineID fields. Since Splunk returns events in reverse time order (newest first), the search results will contain only the most recent event for each combination.
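
To make that first-seen-wins behavior concrete, here is a rough Python model of dedup run against the sample data from the question. The dedup function below is a hypothetical stand-in written for illustration, not Splunk code, and it uses the field name machine from the sample instead of machineID.

```python
# Sample rows from the question, newest first.
SAMPLE = [
    ("scanIDa", "machine1", "fail1"), ("scanIDa", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail1"), ("scanIDb", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail3"),
    ("scanIDc", "machine2", "fail1"), ("scanIDc", "machine2", "fail2"),
    ("scanIDc", "machine2", "fail3"),
    ("scanIDd", "machine2", "fail1"), ("scanIDd", "machine2", "fail3"),
    ("scanIDe", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail1"), ("scanIDf", "machine3", "fail2"),
]
ROWS = [dict(zip(("scanID", "machine", "fail"), t)) for t in SAMPLE]


def dedup(rows, *fields):
    """Keep only the first row seen for each unique combination of fields."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


if __name__ == "__main__":
    for r in dedup(ROWS, "scanID", "machine"):
        print(r["scanID"], r["machine"], r["fail"])
```

On this data the model keeps one row per scan/machine pair, six rows in total, including one row from each of the older scans.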

There are other ways to approach this as well. The following search may give you more ideas...

yoursearchhere
| stats latest(status) as status list(scanID) as scanIDs dc(scanID) as NumberofScans by machineID

mew1033 · ‎11-05-2015

I don't think that dedup command will work... It'll only keep the first event for every unique combination of scanID and machineID. I want to keep ALL events for the most recent scanID on every machine. In my example result set, wouldn't that throw these out?

 scanIDa,machine1,fail2

 scanIDc,machine2,fail3

I'll look into the stats line though, thanks!
