I have multiple custom data sources, mainly scripts, which send events to Splunk on a schedule.
I can distinguish each execution of these sources by either a timestamp or a custom ID, which is incremented with every execution and is captured in every event. The events always have a proper host field, which together with the ID mentioned above forms part of an event's "unique key". The hosts also carry custom attributes, which are the third part of what could be used as a unique key. An attribute is present in the events as long as it applies to a given host, and is no longer present once it stops applying.
An example of what I mean (every line is a separate event):
hostID=host1, attributeID=attribute1, customid=customid1
hostID=host1, attributeID=attribute2, customid=customid1
hostID=host2, attributeID=attribute1, customid=customid2
hostID=host1, attributeID=attribute1, customid=customid2
(Because of the _time field, these would appear in Splunk in reverse order obviously)
I want to deduplicate such events so that I only ever have the data from the most recent execution of a script. From the above example, I want to keep only
host2, attribute1, customid2
host1, attribute1, customid2
If I were to use
| dedup hostID, attributeID
it would yield
- host1, attribute2, customid1
- host2, attribute1, customid2
- host1, attribute1, customid2
The solution my team came up with is
<base search> | eventstats max(customid) as max_customid by hostID | where customid=max_customid
This pretty much does the job, but I feel it is not very efficient. What would be the right approach to do this?
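To make the difference between the two behaviours concrete, here is a minimal Python sketch of the logic (not SPL, and not how Splunk implements either command). It assumes customid is a plain integer that grows with every execution, and the function names dedup_newest_first and keep_latest_run are made up for illustration. Events are scanned newest first in the dedup function to mimic Splunk's default reverse-time ordering.

```python
# Sample events from the example above, oldest first.
# Assumption: customid is numeric and increases with every script execution.
events = [
    {"hostID": "host1", "attributeID": "attribute1", "customid": 1},
    {"hostID": "host1", "attributeID": "attribute2", "customid": 1},
    {"hostID": "host2", "attributeID": "attribute1", "customid": 2},
    {"hostID": "host1", "attributeID": "attribute1", "customid": 2},
]


def dedup_newest_first(events, keys):
    """Keep the first event per key combination, scanning newest first."""
    seen = set()
    kept = []
    for event in reversed(events):
        key = tuple(event[field] for field in keys)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


def keep_latest_run(events):
    """Pass 1: find the highest customid per host.
    Pass 2: keep only the events that carry that customid."""
    max_id = {}
    for event in events:
        host = event["hostID"]
        max_id[host] = max(max_id.get(host, event["customid"]), event["customid"])
    return [e for e in events if e["customid"] == max_id[e["hostID"]]]


deduped = dedup_newest_first(events, ["hostID", "attributeID"])
latest = keep_latest_run(events)

# The dedup result still contains attribute2 from the older execution on host1.
print([(e["hostID"], e["attributeID"], e["customid"]) for e in deduped])
# The per-host maximum filter keeps only events from each host's latest run.
print([(e["hostID"], e["attributeID"], e["customid"]) for e in latest])
```

The second function needs two passes over the data because the maximum for a host is only known once all of its events have been seen, which is presumably the cost I am worried about in the search above.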
===EDIT
A given host can have multiple events (with multiple attributes) from the same execution of the script.
A more detailed example: let's say I have these events:
hostID=host1, attributeID=attribute1, customid=customid1
hostID=host1, attributeID=attribute2, customid=customid1
hostID=host2, attributeID=attribute1, customid=customid2
hostID=host1, attributeID=attribute1, customid=customid2
hostID=host1, attributeID=attribute3, customid=customid2
hostID=host1, attributeID=attribute4, customid=customid2
hostID=host2, attributeID=attribute3, customid=customid2
I want to keep the below events:
hostID=host2, attributeID=attribute1, customid=customid2
hostID=host1, attributeID=attribute1, customid=customid2
hostID=host1, attributeID=attribute3, customid=customid2
hostID=host1, attributeID=attribute4, customid=customid2
hostID=host2, attributeID=attribute3, customid=customid2
This is the reason I can't use stats first()
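A small Python sketch of why a one-row-per-host reduction is not enough for the detailed example. This only models the logic and is not SPL; customid is again assumed to be an integer, and first_per_host and all_from_latest_run are hypothetical helper names. The first helper is a rough stand-in for what I would expect from taking the first value per host.

```python
# Detailed example, oldest first, as (hostID, attributeID, customid) tuples.
# Assumption: customid is numeric and increases with every script execution.
events = [
    ("host1", "attribute1", 1),
    ("host1", "attribute2", 1),
    ("host2", "attribute1", 2),
    ("host1", "attribute1", 2),
    ("host1", "attribute3", 2),
    ("host1", "attribute4", 2),
    ("host2", "attribute3", 2),
]


def first_per_host(events):
    """One row per host: the first event seen when scanning newest first."""
    first = {}
    for host, attribute, customid in reversed(events):
        first.setdefault(host, (host, attribute, customid))
    return list(first.values())


def all_from_latest_run(events):
    """Every event whose customid equals the highest customid of its host."""
    latest = {}
    for host, _attribute, customid in events:
        latest[host] = max(latest.get(host, customid), customid)
    return [e for e in events if e[2] == latest[e[0]]]


one_row = first_per_host(events)
wanted = all_from_latest_run(events)

# One row per host loses the other attributes reported in the same execution.
print(len(one_row), "rows when keeping one row per host")
print(len(wanted), "rows when keeping the whole latest execution per host")
```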