Hi,
I want to use the dedup command with more than one criterion.
First I used | dedup A and had 100 events afterwards.
Then I used | dedup A, B and had 70 events afterwards. As I understand it, the number of events should increase, because I've made the dedup criteria more specific, so fewer duplicates should be identified. Am I completely wrong?
Best
Heinz
dedup keepempty=t A B
http://docs.splunk.com/Documentation/Splunk/6.2.2/SearchReference/Dedup
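To make the effect of missing fields visible, here is a small run-anywhere sketch. The makeresults data and the values a1, b1 and so on are made up for illustration. It relies on the keepempty default described in the linked docs: events that lack any of the dedup fields are removed.

```
| makeresults count=4
| streamstats count AS n
| eval A=case(n<=2, "a1", n==3, "a2", n==4, "a3")
| eval B=case(n==1, "b1", n==2, "b2")
| dedup A B
```

With this data, ending the search in | dedup A should leave 3 events (one per value of A), while | dedup A B should leave only 2, because the two events without B are dropped rather than deduplicated. Swapping the last line for | dedup keepempty=t A B keeps those events as well.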
My understanding is that dedup on 3 fields finds all matches on any two of them as duplicates. I will cite my source for that in a moment or just provide the results of a test case in support of that assertion, but I remember learning it in a Splunk course and testing it myself for validation.
A further question regarding the dedup command:
Let's say the fields A & B can appear multiple times in an event.
For example:
Event 1:
A=1
A=2
B=3
B=4
timestamp=X
Event 2:
A=1
A=2
B=3
B=4
timestamp=X
Event 3:
A=1
A=2
B=3
B=4
timestamp=Y
| dedup A,B,timestamp
Does dedup take all values of A and B into account here, so that two events remain (event 1 and event 3)?
Thanks in advance
Heinz
thanks for confirming!
Yes, it keeps one event for each distinct combination of the fields above.
Ah, now the numbers are changing in the correct direction 🙂
And when I want to ignore events where the dedup criteria don't exist, I can just use
sourcetype=* AND
A=* AND
B=*
| dedup A,B
Thanks a lot!
Then that's your problem there. You can do ... | fillnull B | ...
to give B a placeholder value (0 by default) in events that don't have it. That will make dedup work.
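A run-anywhere sketch of the fillnull approach, using made-up makeresults data (the field values and the "(missing)" placeholder are assumptions for illustration). The value= option is set explicitly so that the filled-in events are easy to spot.

```
| makeresults count=4
| streamstats count AS n
| eval A=case(n<=2, "a1", n==3, "a2", n==4, "a3")
| eval B=case(n==1, "b1", n==2, "b2")
| fillnull value="(missing)" B
| dedup A B
```

Here all 4 events should survive, since every event now has a B and all four A/B combinations are distinct. Without the fillnull line, the two events lacking B would be dropped by dedup.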
Hey Ayn,
Yes, normally it should exist in all events. Is there a command to find out whether there are events without the field B, and to filter them out?
Edit:
Just tried it out with sourcetype=* AND NOT B=*
This returns a few events.
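One way to count how many events lack B, as a rough sketch. The index and sourcetype names are placeholders, not taken from this thread.

```
index=your_index sourcetype=your_sourcetype
| eval has_B=if(isnull(B), "no", "yes")
| stats count BY has_B
```

This should return one row per has_B value with its event count, which shows at a glance how many events dedup on B would discard by default.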
Does B exist in all your events? IIRC dedup discards events that are missing one of the specified fields.