Splunk Search

Which version of dedup is better?

danielbb
Motivator

A colleague of mine uses the following dedup version:

| strcat entity "-" IP "-" QID "-" Port "-" Tracking_Method "-" Last_Detected Key
| dedup Key

And I grew up with

| dedup entity IP QID Port Tracking_Method Last_Detected 

One caveat is Tracking_Method doesn't always exist. So which version is better?


yuanliu
SplunkTrust

You can convert nulls to empty strings before dedup, like this:

| fillnull value="" Tracking_Method Port
| dedup entity IP QID Port Tracking_Method Last_Detected 

In theory, comparing a single field takes less computation; however, strcat is not as simple an operation as fillnull. In my unscientific test, the two perform about the same. (BTW, Port is likely to be null, while Tracking_Method should always have a value.)
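For anyone who wants to see the effect of the missing field, here is a small sketch on synthetic events. It assumes dedup's documented default of keepempty=false, under which an event missing any of the listed fields is dropped rather than kept. The makeresults data, the field values, and the counter field n are made up for illustration.

```spl
| makeresults count=4
| streamstats count AS n
| eval entity="host1", IP="10.0.0.1", QID="100", Port="443", Last_Detected="2024-01-01"
| eval Tracking_Method=if(n<=2, "IP", null())
| fillnull value="" Tracking_Method
| dedup entity IP QID Port Tracking_Method Last_Detected
```

Run it with and without the fillnull line and compare the result counts. Adding keepempty=true to dedup is another documented option, but it keeps every event that is missing a field instead of deduplicating them. As far as I can tell from the docs, the strcat version avoids the problem because strcat's allrequired option defaults to false, so a missing source field is treated as an empty string.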


goncalocoelho
Path Finder

Hi,

If Tracking_Method doesn't exist, I would write this:

(...)
| stats count values(Tracking_Method) by entity IP QID Port Last_Detected 

If you put Tracking_Method in the "by" clause, stats may not return all the desired results, because events that are missing that field are left out.

I believe stats offers slightly better performance than dedup, but you can test both options and check the Job Inspector for the run time and the number of events scanned versus returned.
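A self-contained sketch of the "by" clause point, using made-up makeresults events in which only the first two carry Tracking_Method (the values and the counter field n are assumptions for illustration only):

```spl
| makeresults count=4
| streamstats count AS n
| eval entity="host1", IP="10.0.0.1", QID="100", Port="443", Last_Detected="2024-01-01"
| eval Tracking_Method=if(n<=2, "IP", null())
| stats count values(Tracking_Method) AS Tracking_Method by entity IP QID Port Last_Detected
```

To see the difference, move Tracking_Method from values() into the by clause and compare the count column between the two runs.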


richgalloway
SplunkTrust

I've never seen anything conclusive about whether dedup or stats is faster.  It may depend on other factors.

One significant difference, however, is that stats is an aggregating command. That means the original events are lost: any field not mentioned in the command is discarded. The output of the values function can be a multi-value field, which requires special handling later in the query. This is why I prefer dedup.
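A rough illustration of both points on synthetic data. The Severity field and all values are hypothetical, added only to show a field that stats does not carry through. mvjoin is one possible way to flatten the multi-value result, not the only one.

```spl
| makeresults count=3
| streamstats count AS n
| eval entity="host1", IP="10.0.0.1", QID="100", Port="443", Last_Detected="2024-01-01", Severity="high"
| eval Tracking_Method=case(n=1, "IP", n=2, "DNS")
| stats count values(Tracking_Method) AS Tracking_Method by entity IP QID Port Last_Detected
| eval Tracking_Method=mvjoin(Tracking_Method, ",")
```

After the stats line, Severity and _time are no longer in the results, and Tracking_Method holds more than one value until mvjoin collapses it into a single string. Replacing the last two lines with the dedup from the original question keeps whole events, Severity included.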

---
If this reply helps you, Karma would be appreciated.