How to keep all events from the most recent scan for each value of a field and remove all others?

mew1033
Explorer

My question is similar to this: https://answers.splunk.com/answers/35759/keping-only-most-recent-events-for-a-fixed-field.html
Basically, I have scan data that looks something like this (for each machine, the most recent scan is listed first):

scanIDa,machine1,fail1
scanIDa,machine1,fail2

scanIDb,machine1,fail1
scanIDb,machine1,fail2
scanIDb,machine1,fail3


scanIDc,machine2,fail1
scanIDc,machine2,fail2
scanIDc,machine2,fail3

scanIDd,machine2,fail1
scanIDd,machine2,fail3


scanIDe,machine3,fail1

scanIDf,machine3,fail1
scanIDf,machine3,fail2

I want to keep all the data for only the most recent scan on each machine. So the end result of my search should be something like this:

scanIDa,machine1,fail1
scanIDa,machine1,fail2

scanIDc,machine2,fail1
scanIDc,machine2,fail2
scanIDc,machine2,fail3

scanIDe,machine3,fail1

I don't want to know about fail3 on machine1 anymore because it was fixed in a more recent scan.
scanID is a random value. Looks like an md5 hash or something. Whatever it is, it's not usable as a sort field.

Is this possible? Am I dreaming?

0 Karma
1 Solution

mew1033
Explorer

@lguinn's answer helped me out a little, but it still didn't get me exactly what I wanted.

What I ended up doing was creating a scheduled search that runs once an hour and populates a lookup. It does

searchhere
| stats latest(scanID) as scanID by machine
| eval mostRecent="yes"

Then I can just search for mostRecent="yes" and get the results I want.
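
For anyone trying this, here is a rough sketch of how the two pieces might fit together. The lookup file name latest_scans.csv is made up for illustration, and searchhere stands for whatever base search returns the scan events. The first search would be the one saved and scheduled hourly:

```spl
searchhere
| stats latest(scanID) as scanID by machine
| eval mostRecent="yes"
| outputlookup latest_scans.csv
```

The search that consumes it could then match on both machine and scanID, so events from older scans get no mostRecent value and drop out:

```spl
searchhere
| lookup latest_scans.csv machine scanID OUTPUT mostRecent
| search mostRecent="yes"
```

Note the double quotes in the eval: single quotes there would be read as a field name rather than a string. Results can lag by up to an hour, since the lookup is only as fresh as the last scheduled run.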

0 Karma

willi358
Engager

I had the same issue with my Nessus scan data. I solved it using the streamstats command. The following search works for your example:

yoursearch | streamstats first(scanID) as scanID_first by machine | eval recent=if(scanID=scanID_first,"yes","no")

This will make your scan data look as follows:

scanID   machine   fail   scanID_first  recent
scanIDa  machine1  fail1  scanIDa       yes
scanIDa  machine1  fail2  scanIDa       yes
scanIDb  machine1  fail1  scanIDa       no
scanIDb  machine1  fail2  scanIDa       no
scanIDb  machine1  fail3  scanIDa       no
scanIDc  machine2  fail1  scanIDc       yes
scanIDc  machine2  fail2  scanIDc       yes
scanIDc  machine2  fail3  scanIDc       yes
scanIDd  machine2  fail1  scanIDc       no
scanIDd  machine2  fail3  scanIDc       no
scanIDe  machine3  fail1  scanIDe       yes
scanIDf  machine3  fail1  scanIDe       no
scanIDf  machine3  fail2  scanIDe       no 

Now you can search for recent="yes".
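
One caveat: the streamstats approach relies on events arriving newest first, which is the default order for a plain event search. If something earlier in the pipeline reorders the results, a variation with eventstats might be safer. This is only a sketch; it assumes the events still carry _time (latest() depends on it), and latest_scanID is an arbitrary field name:

```spl
yoursearch
| eventstats latest(scanID) as latest_scanID by machine
| where scanID=latest_scanID
| fields - latest_scanID
```

This keeps every event whose scanID matches the newest one seen for its machine, regardless of the order in which the events are processed.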

My case was a little different. The name of the scan (the "name" field) did not change. However, the "scan_start" field (when the scan was started) was different for each scan run. I wanted to keep the scans with the latest value of scan_start. So I used this search:

mysearch | streamstats first(scan_start) as scan_start_first by name | eval recent=if(scan_start=scan_start_first,"yes","no")

0 Karma

lguinn2
Legend

The Splunk dedup command should do what you want.

yoursearchhere
| dedup scanID machineID

dedup preserves the first event it sees for each unique combination of the scanID and machineID fields. Since Splunk returns events in reverse time order (newest first), the search results will contain only the most recent event for each combination.

There are other ways to approach this as well. The following search may give you more ideas...

yoursearchhere
| stats latest(status) as status list(scanID) as scanIDs dc(scanID) as NumberofScans by machineID

0 Karma

mew1033
Explorer

I don't think that dedup command will work... It'll only keep the first event for every unique combination of scanID and machineID. I want to keep ALL events for the most recent scanID on every machine. In my example result set, wouldn't that throw these out?

 scanIDa,machine1,fail2

 scanIDc,machine2,fail3

I'll look into the stats line though, thanks!
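
One way dedup could still be put to use here, as an untested sketch: run it on the machine field alone inside a subsearch to pick out the newest scanID per machine, then let the outer search keep every event that matches one of those machine/scanID pairs. This assumes the fields are named machine and scanID as in the sample data (the answer above uses machineID for the machine field), that events come back newest first, and that the number of machines stays within the subsearch result limits:

```spl
yoursearchhere
    [ search yoursearchhere
      | dedup machine
      | table machine scanID ]
```

The subsearch result is turned into a filter of the form (machine=... AND scanID=...) OR (...), so all failures from the latest scan survive rather than just the first one.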

0 Karma