I've heard this discussion before, and I just had a user run a search that is a prime candidate for this, so I did some comparing. This 24-hour search covered about 10-15 TB of raw data and returned 62,023 pairs.

The base search was something like this:

    index IN (index1,index2,index3)
    event=specific_type_auth_event
    username IN (user1,user2,username*)

This was piped into 3 different options, and based on the overall runtime, I'll keep using stats for my deduping.

Stats took 67 seconds to run:

    | stats count by clientip,username
    | table clientip,username

Dedup took 113 seconds:

    | dedup client_ip, username
    | table client_ip, username

Dedup without the raw field took 97 seconds:

    | fields + username,client_ip
    | fields - _raw
    | dedup client_ip, username
    | table client_ip, username

I also tried other variations, like fields - _* to pull out all internal fields, but they had no noticeable effect on the runtime of stats or dedup.
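
A sketch of the fields - _* variation mentioned above, assuming it simply replaces fields - _raw in the dedup search (the exact query is not shown in the post, so the placement of the fields commands here is a guess):

```
| fields + username,client_ip
| fields - _*
| dedup client_ip, username
| table client_ip, username
```

Here fields - _* drops every field whose name starts with an underscore (such as _raw and _time), so only username and client_ip are carried into dedup.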