My question is similar to this: https://answers.splunk.com/answers/35759/keping-only-most-recent-events-for-a-fixed-field.html
Basically, I have scan data that looks something like this:
scanIDa,machine1,fail1
scanIDa,machine1,fail2
scanIDb,machine1,fail1
scanIDb,machine1,fail2
scanIDb,machine1,fail3
scanIDc,machine2,fail1
scanIDc,machine2,fail2
scanIDc,machine2,fail3
scanIDd,machine2,fail1
scanIDd,machine2,fail3
scanIDe,machine3,fail1
scanIDf,machine3,fail1
scanIDf,machine3,fail2
I want to keep all the data for only the most recent scan on each machine. So the end result of my search should be something like this:
scanIDa,machine1,fail1
scanIDa,machine1,fail2
scanIDc,machine2,fail1
scanIDc,machine2,fail2
scanIDc,machine2,fail3
scanIDe,machine3,fail1
I don't want to know about fail3 on machine1 anymore because it was fixed in a more recent scan.
scanID is a random value; it looks like an MD5 hash or something similar. Whatever it is, it can't be used as a sort field.
Is this possible? Am I dreaming?
@lguinn's answer helped me out a little, but it still didn't get me exactly what I wanted.
What I ended up doing was creating a lookup that runs once an hour. It runs this search:
searchhere | stats latest(scanID) as scanID by machine | eval mostRecent="yes"
Then I can just search for mostRecent="yes" and get the results I want.
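To see what the lookup approach computes, here is a minimal Python model of it, run against the sample data from the question. It assumes the events are listed newest first (Splunk's default order), so "latest" is simply the first scanID seen for each machine. `build_latest_lookup` and `most_recent_events` are hypothetical helper names, not Splunk APIs.

```python
# Minimal model of the hourly "latest scan per machine" lookup, in plain Python.
# Assumption: EVENTS is in newest-first order, as Splunk returns events by default,
# so the first scanID seen for a machine stands in for latest(scanID).

EVENTS = [
    ("scanIDa", "machine1", "fail1"),
    ("scanIDa", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail1"),
    ("scanIDb", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail3"),
    ("scanIDc", "machine2", "fail1"),
    ("scanIDc", "machine2", "fail2"),
    ("scanIDc", "machine2", "fail3"),
    ("scanIDd", "machine2", "fail1"),
    ("scanIDd", "machine2", "fail3"),
    ("scanIDe", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail2"),
]


def build_latest_lookup(events):
    """Rough equivalent of: stats latest(scanID) as scanID by machine."""
    lookup = {}
    for scan_id, machine, _fail in events:
        # setdefault only stores the first (newest) scanID for each machine
        lookup.setdefault(machine, scan_id)
    return lookup


def most_recent_events(events, lookup):
    """Keep every event whose scanID matches the lookup entry for its machine,
    i.e. the events that would be tagged mostRecent="yes"."""
    return [e for e in events if lookup.get(e[1]) == e[0]]


if __name__ == "__main__":
    latest = build_latest_lookup(EVENTS)
    for event in most_recent_events(EVENTS, latest):
        print(",".join(event))
```

Because the filter compares against the lookup rather than deduplicating, every event from the latest scan survives, not just one per scan.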
I had the same issue with my Nessus scan data. I solved it using the streamstats command. The following search works for your example:
yoursearch | streamstats first(scanID) as scanID_first by machine | eval recent=if(scanID=scanID_first,"yes","no")
This will make your scan data look as follows:
scanID machine fail scanID_first recent
scanIDa machine1 fail1 scanIDa yes
scanIDa machine1 fail2 scanIDa yes
scanIDb machine1 fail1 scanIDa no
scanIDb machine1 fail2 scanIDa no
scanIDb machine1 fail3 scanIDa no
scanIDc machine2 fail1 scanIDc yes
scanIDc machine2 fail2 scanIDc yes
scanIDc machine2 fail3 scanIDc yes
scanIDd machine2 fail1 scanIDc no
scanIDd machine2 fail3 scanIDc no
scanIDe machine3 fail1 scanIDe yes
scanIDf machine3 fail1 scanIDe no
scanIDf machine3 fail2 scanIDe no
Now you can search for recent="yes".
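The streamstats search works only because events arrive newest first. The Python model below replays the search one event at a time on the sample data and reproduces the recent column shown above. `annotate_recent` is a hypothetical helper, and the newest-first ordering of EVENTS is an assumption carried over from Splunk's default sort.

```python
# Model of: streamstats first(scanID) as scanID_first by machine
#           | eval recent=if(scanID=scanID_first,"yes","no")
# Assumption: EVENTS is in newest-first order, as Splunk returns events by default.

EVENTS = [
    ("scanIDa", "machine1", "fail1"),
    ("scanIDa", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail1"),
    ("scanIDb", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail3"),
    ("scanIDc", "machine2", "fail1"),
    ("scanIDc", "machine2", "fail2"),
    ("scanIDc", "machine2", "fail3"),
    ("scanIDd", "machine2", "fail1"),
    ("scanIDd", "machine2", "fail3"),
    ("scanIDe", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail2"),
]


def annotate_recent(events):
    """Return (scanID, machine, fail, scanID_first, recent) rows in input order."""
    first_seen = {}
    rows = []
    for scan_id, machine, fail in events:
        # streamstats first(...) by machine: the first value seen in each group sticks
        scan_id_first = first_seen.setdefault(machine, scan_id)
        recent = "yes" if scan_id == scan_id_first else "no"
        rows.append((scan_id, machine, fail, scan_id_first, recent))
    return rows


if __name__ == "__main__":
    for row in annotate_recent(EVENTS):
        print(" ".join(row))
```

Feeding the same events in the opposite order marks scanIDb, scanIDd and scanIDf as recent instead, so anything earlier in the pipeline that reorders events (for example a sort or reverse) would change which scan is treated as the latest.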
My case was a little different. The name of the scan (the "name" field) did not change. However, the "scan_start" field (when the scan was started) was different for each scan run. I wanted to keep the scans with the latest value of scan_start. So I used this search:
mysearch | streamstats first(scan_start) as scan_start_first by name | eval recent=if(scan_start=scan_start_first,"yes","no")
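That search still relies on event order. If scan_start is a sortable value (assumed here to be a numeric epoch timestamp), an order-independent alternative is to compare each event against the maximum scan_start for its scan name. The Python sketch below models that idea with made-up events. In SPL, a comparable approach might use eventstats max(scan_start) by name followed by a where comparison; treat that as a suggestion to verify against real data rather than a tested search.

```python
# Order-independent "keep the latest run" filter, modelled in Python.
# Hypothetical events: (name, scan_start, finding). scan_start is assumed to be
# a numeric epoch timestamp, so a larger value means a more recent run.

EVENTS = [
    ("weekly-internal", 1500000000, "finding1"),
    ("dmz", 1500300000, "finding3"),
    ("weekly-internal", 1500604800, "finding1"),
    ("weekly-internal", 1500604800, "finding2"),
]


def keep_latest_run(events):
    """Keep every event whose scan_start equals the max scan_start for its name."""
    latest = {}
    for name, start, _finding in events:
        # Track the largest scan_start seen so far for each scan name
        if name not in latest or start > latest[name]:
            latest[name] = start
    return [e for e in events if e[1] == latest[e[0]]]


if __name__ == "__main__":
    for event in keep_latest_run(EVENTS):
        print(event)
```

Because it uses the maximum rather than the first value seen, the result is the same set of events whatever order they arrive in.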
The Splunk dedup command should do what you want.
yoursearchhere
| dedup scanID machineID
dedup preserves the first event it sees for each unique combination of the scanID and machineID fields. Since Splunk returns events in reverse time order (newest first), the search results will contain only the most recent event for each combination.
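A quick way to check what dedup returns is to model its keep-first rule in Python on the sample data from the question. The `dedup` function below is a hypothetical stand-in, not Splunk code. It assumes newest-first order and uses the field name machine from the sample data rather than machineID.

```python
# Model of: dedup scanID machine
# Assumption: EVENTS is in newest-first order, as Splunk returns events by default.

EVENTS = [
    ("scanIDa", "machine1", "fail1"),
    ("scanIDa", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail1"),
    ("scanIDb", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail3"),
    ("scanIDc", "machine2", "fail1"),
    ("scanIDc", "machine2", "fail2"),
    ("scanIDc", "machine2", "fail3"),
    ("scanIDd", "machine2", "fail1"),
    ("scanIDd", "machine2", "fail3"),
    ("scanIDe", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail2"),
]


def dedup(events, key):
    """Keep only the first event for each distinct key(event), preserving order."""
    seen = set()
    kept = []
    for event in events:
        k = key(event)
        if k not in seen:
            seen.add(k)
            kept.append(event)
    return kept


if __name__ == "__main__":
    # Key on (scanID, machine), like `dedup scanID machine`
    for event in dedup(EVENTS, key=lambda e: (e[0], e[1])):
        print(",".join(event))
```

On this data the result has exactly one event per scan, including older scans such as scanIDb, and it drops events like scanIDa,machine1,fail2.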
There are other ways to approach this as well. The following search may give you more ideas:
yoursearchhere
| stats latest(status) as status list(scanID) as scanIDs dc(scanID) as NumberofScans by machineID
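For the list() and dc() parts of that stats search, the Python model below shows the shape of the result on the sample data. It assumes list() keeps one value per event in arrival order and dc() counts distinct values. latest(status) is left out because the sample data has no status field. The field is called machine here rather than machineID, and `scans_per_machine` is a hypothetical helper.

```python
# Rough model of: stats list(scanID) as scanIDs dc(scanID) as NumberofScans by machine
# Assumption: EVENTS is in newest-first order, as Splunk returns events by default.

EVENTS = [
    ("scanIDa", "machine1", "fail1"),
    ("scanIDa", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail1"),
    ("scanIDb", "machine1", "fail2"),
    ("scanIDb", "machine1", "fail3"),
    ("scanIDc", "machine2", "fail1"),
    ("scanIDc", "machine2", "fail2"),
    ("scanIDc", "machine2", "fail3"),
    ("scanIDd", "machine2", "fail1"),
    ("scanIDd", "machine2", "fail3"),
    ("scanIDe", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail1"),
    ("scanIDf", "machine3", "fail2"),
]


def scans_per_machine(events):
    """Group scanIDs by machine and count distinct scans per machine."""
    groups = {}
    for scan_id, machine, _fail in events:
        # list(): one entry per event, duplicates kept, in input order
        groups.setdefault(machine, []).append(scan_id)
    return {
        # dc(): number of distinct scanIDs for the machine
        machine: {"scanIDs": ids, "NumberofScans": len(set(ids))}
        for machine, ids in sorted(groups.items())
    }


if __name__ == "__main__":
    for machine, stats in scans_per_machine(EVENTS).items():
        print(machine, stats)
```

Since the first entry in each scanIDs list is the newest scan, this summary identifies the latest scanID per machine, but on its own it does not return the individual events from that scan.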
I don't think that dedup command will work... It'll only keep the first event for every unique combination of scanID and machineID. I want to keep ALL events for the most recent scanID on every machine. In my example result set, wouldn't that throw these out?
scanIDa,machine1,fail2
scanIDc,machine2,fail3
I'll look into the stats line though, thanks!