Dedup removing all elements

pedropiin
Path Finder

Hello everyone. 

I'm working on a query over certain "tickets" and "events". Some of them are duplicates, which is why the query runs a dedup command. But something else seems to be happening.

The query is of the form:

index=main source=...
...
...
| fillnull value="[empty]"
| search tickets=***
| dedup tickets
| stats count by name, tickets
| stats sum(count) as numOfTickets by name
...
| fields name, tickets, count

Listing all the events, I can see that the main duplicates are the ones that were null and were filled with "[empty]". But, for some reason, some of the events disappear with dedup.

In theory, dedup should remove all duplicates and keep one event to represent all of its "copies". That happens for some "names", but not for all. In the same query, I deal with events of the category "name1" and events of the category "name2". All of their instances are "[empty]", and running dedup removes all instances of "name1" and keeps one of "name2", when it should keep one of each.

Why is that happening?

Each instance is of the form:
"processTime | arrivalTime | name | tickets | count"

 


gcusello
SplunkTrust

Hi @pedropiin ,

the stats command with a by clause already returns one row per unique combination of the by fields, so you don't need to use the dedup command before the stats command.

Ciao.

Giuseppe


richgalloway
SplunkTrust

The dedup command keeps the first event it finds for each unique value of the field(s) specified in its arguments ("tickets" in this case). The values of other fields are ignored. Depending on the order of the events, it's entirely possible that, for each ticket value, the first event comes from name1 and is retained, while the events from other names are discarded.

If you need to dedup on both tickets and name, then use "dedup tickets name" in the query.
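
To make the difference concrete, here is a small Python sketch that mimics the "keep the first event per unique key" behavior described above. It is only a simulation, not Splunk code: the dedup function and the sample events are hypothetical, and the event order is an assumption chosen so that a name2 event happens to come first.

```python
def dedup(events, fields):
    """Keep the first event seen for each unique combination of `fields`.

    Hypothetical helper that imitates the behavior described above;
    it is not part of any Splunk library.
    """
    seen = set()
    kept = []
    for event in events:
        # Only the listed fields form the key; all other fields are ignored.
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


# Made-up events, in the (assumed) order the search returns them.
# Every tickets value is "[empty]", as in the question.
events = [
    {"name": "name2", "tickets": "[empty]"},
    {"name": "name1", "tickets": "[empty]"},
    {"name": "name1", "tickets": "[empty]"},
    {"name": "name2", "tickets": "[empty]"},
]

by_tickets = dedup(events, ["tickets"])        # like: dedup tickets
by_both = dedup(events, ["tickets", "name"])   # like: dedup tickets name

print([e["name"] for e in by_tickets])  # ['name2']
print([e["name"] for e in by_both])     # ['name2', 'name1']
```

With only tickets as the key, the single "[empty]" value is claimed by whichever event arrives first, so name1 disappears entirely. With both fields as the key, one event per name survives.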
