Hello everyone.
I'm working with a query over certain "tickets" and "events". Some of them are duplicates, which is why it runs a dedup command, but something else seems to be happening.
The query is of the form:
index=main source=...
...
...
| fillnull value="[empty]"
| search tickets=***
| dedup tickets
| stats count by name, tickets
| stats sum(count) as numOfTickets by name
...
| fields name, tickets, count
Listing all the events, I can see that the main duplicates are basically the events whose fields were null and were filled with "[empty]". But, for some reason, some events disappear after dedup.
In theory, dedup should remove all duplicates and keep one event representing all of its "copies". That happens for some "names", but not for all. In the same query, I deal with events of the category "name1" and events of the category "name2". All of their instances are "[empty]", and running dedup removes every instance of "name1" and keeps one of "name2", when it should keep one of each.
Why is that happening?
Each instance is of the form
" processTime | arrivalTime | name | tickets | count"
Hi @pedropiin,
the stats command already groups events by the fields in its by clause, so each name/tickets pair appears only once in its output; you don't need to run the dedup command before the stats command.
Ciao.
Giuseppe
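To illustrate the grouping behavior described in the reply above, here is a small Python model of "stats count by name, tickets". It is not Splunk code: the stats_count_by helper and the sample events are hypothetical, made up to mirror the question (every tickets value is "[empty]"). Without any dedup, each name/tickets pair still yields a single row, and count holds how many events fell into it.

```python
from collections import Counter


def stats_count_by(events, *fields):
    """Rough model of SPL 'stats count by <fields>' (hypothetical helper).

    Produces one row per distinct combination of the by-field values,
    with 'count' set to the number of events in that group. Rows are
    sorted by the group key here only to make the output deterministic.
    """
    counts = Counter(tuple(event[f] for f in fields) for event in events)
    return [dict(zip(fields, key), count=n) for key, n in sorted(counts.items())]


# Hypothetical events after fillnull: every tickets value is "[empty]".
events = [
    {"name": "name2", "tickets": "[empty]"},
    {"name": "name1", "tickets": "[empty]"},
    {"name": "name2", "tickets": "[empty]"},
    {"name": "name1", "tickets": "[empty]"},
    {"name": "name2", "tickets": "[empty]"},
]

for row in stats_count_by(events, "name", "tickets"):
    print(row)
# {'name': 'name1', 'tickets': '[empty]', 'count': 2}
# {'name': 'name2', 'tickets': '[empty]', 'count': 3}
```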
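The dedup command keeps the first event it finds for each unique value of the field(s) given as its arguments ("tickets" in this case); the values of all other fields are ignored. Depending on the order of the events, the first event for a given tickets value can come from one name, in which case it is retained and the events with the same tickets value from every other name are discarded. In your data, all name1 and name2 events have tickets="[empty]", so dedup tickets keeps exactly one of them, which happens to be a name2 event.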
If you need to dedup on the combination of tickets and name, use dedup tickets name in the query.
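A minimal Python sketch of the behavior explained above, assuming dedup simply keeps the first event seen for each distinct combination of the listed fields. The dedup helper and the sample events are hypothetical and only model the SPL command; the events are ordered so that a name2 event comes first, matching the symptom in the question. The model also assumes every event has every listed field, as is the case after fillnull.

```python
def dedup(events, *fields):
    """Rough model of SPL 'dedup <fields>' (hypothetical helper).

    Keeps the first event seen for each distinct combination of the
    listed fields; values of all other fields are ignored. Assumes every
    event has every listed field (e.g. after fillnull).
    """
    seen = set()
    kept = []
    for event in events:
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


# Hypothetical events after fillnull: every tickets value is "[empty]".
events = [
    {"name": "name2", "tickets": "[empty]"},
    {"name": "name1", "tickets": "[empty]"},
    {"name": "name2", "tickets": "[empty]"},
    {"name": "name1", "tickets": "[empty]"},
]

# Dedup on tickets only: all four events share the same key, so one survives.
print([e["name"] for e in dedup(events, "tickets")])          # ['name2']

# Dedup on tickets and name: one event survives per (tickets, name) pair.
print([e["name"] for e in dedup(events, "tickets", "name")])  # ['name2', 'name1']
```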