I'm thinking about using the DEDUP command to solve the following problem: I have an event with an ID field and I'd like to show only the latest event matching each unique ID value. In most cases all the events with the same ID will be near each other (in time) but they won't be adjacent.
First, is DEDUP the right command to use in this scenario?
Second, how can I ensure that DEDUP (or whatever command is best for this) performs optimally? I have control over all aspects of the data (what fields are emitted, index-time vs. search-time fields, etc.) other than how many events are generated and I have limited control over the size of each event.
Yes, DEDUP is right. Be sure to use its SORTBY clause, otherwise it won't do what you want! (And tell SORTBY whether the ID is str, num, or ip.)
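To make the goal from the question concrete, here is a minimal Python sketch of the intended result: one event per ID, keeping the newest. It only illustrates the semantics, not how Splunk implements DEDUP internally. The field names (`ID`, `_time`, `status`) and the sample events are made up for the example.

```python
from typing import Dict, Iterable, List


def latest_per_id(events: Iterable[dict],
                  id_field: str = "ID",
                  time_field: str = "_time") -> List[dict]:
    """Keep only the newest event for each distinct ID value."""
    newest: Dict[object, dict] = {}
    for event in events:
        key = event[id_field]
        # Replace the stored event only when this one is strictly newer.
        if key not in newest or event[time_field] > newest[key][time_field]:
            newest[key] = event
    # Return the survivors newest-first.
    return sorted(newest.values(), key=lambda e: e[time_field], reverse=True)


# Hypothetical sample data: events with the same ID are close in time
# but not adjacent, as described in the question.
events = [
    {"ID": "a", "_time": 100, "status": "start"},
    {"ID": "b", "_time": 105, "status": "start"},
    {"ID": "a", "_time": 110, "status": "done"},
    {"ID": "b", "_time": 120, "status": "done"},
]

result = latest_per_id(events)
```

Here `result` holds two events, one for each ID, and each is the later of the pair.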
To improve performance, try extracting the ID field and assigning its value to the sourcetype or host attribute. Then index as few other fields as possible.
Finally, if you can limit the time range of the query, with earliest or a similar time modifier, please do so to help performance.