Difference between dedup and dc counting?

aan_gst_dk
New Member

Searching a table with 252092 events for the number of distinct ORDERID values, I get different results with "dedup" and "dc". The search "(index=swbdlogs sourcetype=shopdownloadlogs) | chart dc(ORDERID)" returns 71908, while "(index=swbdlogs sourcetype=shopdownloadlogs) | dedup ORDERID | chart count" returns 66785. In my opinion the results should be the same. Sorting by ORDERID first gives a value in between: "(index=swbdlogs sourcetype=shopdownloadlogs) | sort 300000 ORDERID | chart dc(ORDERID)" returns, e.g., 71383.
Which value can I trust?

Splunk 6.1.1 on RHEL
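A minimal Python sketch of what the two searches should compute, under simplifying assumptions: each event is a dict, ORDERID is single-valued, and both searches see exactly the same events. This is a toy model, not Splunk's implementation; the helper names and sample events are hypothetical. Under these assumptions the two counts agree, so a mismatch suggests that the searches are not seeing the same events, or that one of the assumptions does not hold for the real data.

```python
# Toy model of the two searches; NOT Splunk's implementation.
# Assumptions: each event is a dict, ORDERID is single-valued,
# and both searches run over the exact same list of events.


def dc(events, field):
    """Rough analogue of dc(field): number of distinct non-null values."""
    return len({e[field] for e in events if e.get(field) is not None})


def dedup_then_count(events, field):
    """Rough analogue of `dedup field | chart count`.

    Keeps the first event per value and drops events that lack the
    field (dedup's documented default, keepempty=false).
    """
    seen = set()
    kept = []
    for e in events:
        value = e.get(field)
        if value is None or value in seen:
            continue  # no field, or a later duplicate: dropped
        seen.add(value)
        kept.append(e)  # first event for this value survives
    return len(kept)


# Hypothetical sample events.
events = [
    {"ORDERID": "1001"},
    {"ORDERID": "1001"},  # second download of the same order
    {"ORDERID": "1002"},
    {},                   # event without ORDERID
]

print(dc(events, "ORDERID"))                # 2
print(dedup_then_count(events, "ORDERID"))  # 2
```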

ngatchasandra
Builder

In my view all of these values can be reliable, and the difference may have one of two causes:
- The time range picker was not set to the same range when you ran the dc search and the dedup search.
- Your data is being indexed continuously. If new events keep arriving while you run the searches (quite likely with a large data volume), Splunk can return different results each time.
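A toy Python illustration of the second point, assuming a plain in-memory list stands in for the index (the list and helper are hypothetical): if new events are indexed between the dc run and the dedup run, the two searches count different event sets, even though each count is correct for the data it saw.

```python
# Toy illustration: a plain list stands in for the index.


def distinct_orders(events):
    """Distinct non-null ORDERID values in the events searched."""
    return len({e["ORDERID"] for e in events if "ORDERID" in e})


index = [{"ORDERID": "1001"}, {"ORDERID": "1002"}]
first_run = distinct_orders(index)   # e.g. the dc search

index.append({"ORDERID": "1003"})    # new data indexed in between
second_run = distinct_orders(index)  # e.g. the dedup search

print(first_run, second_run)  # 2 3
```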
For the following searches, all run over “All time” (note: I don’t index my data continuously), the dc and dedup commands return the same result:
1- I have a search (index=tuto sourcetype=access_combined_wcookie) that initially returns 39532 events.
2- When I run the search “index=tuto sourcetype=access_combined_wcookie | chart dc(categoryId)”, it returns 39532 events and a statistics table like this:

dc(categoryId)
8

This is because chart is applied only to the distinct count of categoryId across all events.
3- When I run “index=tuto sourcetype=access_combined_wcookie | dedup categoryId | chart count”, I get 8 events and the following statistics table:

count
8

This means the events are deduplicated on categoryId before the count is taken.
4- When I run “index=tuto sourcetype=access_combined_wcookie | sort 40000 categoryId | chart dc(categoryId)”, I get the same result as in step 2.
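A small Python sketch of why step 4 matches step 2, using hypothetical events and a simplified dedup that keeps the first event per value (an assumption, not Splunk's code): sorting changes which event survives dedup, but not how many distinct values there are.

```python
# Toy model: reordering events (like `sort`) changes which event
# dedup keeps for each value, but not how many values remain.


def dedup(events, field):
    """Keep the first event seen for each value of `field`."""
    seen = set()
    kept = []
    for e in events:
        if field in e and e[field] not in seen:
            seen.add(e[field])
            kept.append(e)
    return kept


# Hypothetical events; "id" only labels which event survives.
events = [
    {"id": 1, "categoryId": "STRATEGY"},
    {"id": 2, "categoryId": "ARCADE"},
    {"id": 3, "categoryId": "STRATEGY"},
]
reordered = list(reversed(events))

print([e["id"] for e in dedup(events, "categoryId")])     # [1, 2]
print([e["id"] for e in dedup(reordered, "categoryId")])  # [3, 2]
print(len(dedup(events, "categoryId")), len(dedup(reordered, "categoryId")))  # 2 2
```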

sbsbb
Builder

I actually have a case open with Splunk in which the same query returns a different event count when run several times in a row... So it could be possible.

kristian_kolb
Ultra Champion

To clarify what @Strive says: are you searching over exactly the same period of time? Not something like 'last 4 hours', which is essentially a sliding window.

Have you tested this with earliest and latest, e.g. earliest=-3h@h latest=@h, to ensure that exactly the same underlying events are fed into your calculation?

/K
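A rough Python sketch of the difference, assuming a simplified reading of the time modifiers: earliest=-3h@h steps back 3 hours and then snaps down to the hour, and latest=@h snaps the current time down to the hour. Two runs 25 minutes apart get the same window, whereas a sliding "last 4 hours" window moves with every run. The function names and run times are hypothetical.

```python
from datetime import datetime, timedelta


def snap_to_hour(t):
    """Round a timestamp down to the start of its hour (like @h)."""
    return t.replace(minute=0, second=0, microsecond=0)


def snapped_window(now):
    """Simplified earliest=-3h@h latest=@h: fixed hour boundaries."""
    return snap_to_hour(now - timedelta(hours=3)), snap_to_hour(now)


def sliding_window(now):
    """Simplified 'last 4 hours': moves with every run."""
    return now - timedelta(hours=4), now


run_1 = datetime(2014, 7, 1, 10, 25)  # hypothetical run times
run_2 = datetime(2014, 7, 1, 10, 50)

print(snapped_window(run_1) == snapped_window(run_2))  # True
print(sliding_window(run_1) == sliding_window(run_2))  # False
```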

martin_mueller
SplunkTrust

To make things even more interesting, you could also do this:

base search | stats count by ORDERID

and look at the number of rows returned.
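The number of rows from `stats count by ORDERID` should match dc(ORDERID). A toy Python model of that check, using `collections.Counter` in place of stats and assuming, as a simplification, that events without ORDERID produce no row; the sample events are hypothetical. The per-ID counts also show which orders have more than one event, which is exactly what dedup collapses.

```python
from collections import Counter

# Hypothetical events; Counter stands in for `stats count by ORDERID`.
events = [
    {"ORDERID": "1001"},
    {"ORDERID": "1001"},
    {"ORDERID": "1002"},
    {"ORDERID": "1003"},
    {},  # no ORDERID: assumed to produce no row
]

counts = Counter(e["ORDERID"] for e in events if "ORDERID" in e)
rows = len(counts)  # rows the stats search would return
distinct = len({e["ORDERID"] for e in events if "ORDERID" in e})  # like dc
repeated = {k: n for k, n in counts.items() if n > 1}

print(rows, distinct)  # 3 3
print(repeated)        # {'1001': 2}
```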

strive
Influencer

Are there any null ORDERIDs?

Are you choosing the same time range for all these searches?
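Counting events without an ORDERID is a cheap sanity check. In the simplified model, missing values alone should not make the two counts differ, since dc ignores them and dedup drops events without the field by default (keepempty=false). A toy Python version with hypothetical events follows; in SPL, something like `... | where isnull(ORDERID) | stats count` would be a rough equivalent.

```python
# Hypothetical events; some have no usable ORDERID.
events = [
    {"ORDERID": "1001"},
    {},                  # field absent
    {"ORDERID": "1002"},
    {"ORDERID": None},   # field present but null
]

missing = sum(1 for e in events if e.get("ORDERID") is None)
present = len(events) - missing

print(missing, present)  # 2 2
```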
