<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Difference between dedup and dc counting? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189946#M54706</link>
    <description>&lt;P&gt;Are there any null ORDERIDs?&lt;/P&gt;

&lt;P&gt;Are you choosing the same time range for all these searches?&lt;/P&gt;</description>
    <pubDate>Thu, 21 Aug 2014 13:14:52 GMT</pubDate>
    <dc:creator>strive</dc:creator>
    <dc:date>2014-08-21T13:14:52Z</dc:date>
    <item>
      <title>Difference between dedup and dc counting?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189945#M54705</link>
      <description>&lt;P&gt;Searching a table with 252092 events for the number of distinct ORDERID values with "dedup" and "dc", I get different results. The search "(index=swbdlogs sourcetype=shopdownloadlogs) | chart dc(ORDERID)" returns 71908, and the search "(index=swbdlogs sourcetype=shopdownloadlogs) | dedup ORDERID | chart count" returns 66785. In my opinion the results should be the same. Sorting by ORDERID first gives a value in between: "(index=swbdlogs sourcetype=shopdownloadlogs) | sort 300000 ORDERID | chart dc(ORDERID)" returns e.g. 71383.&lt;BR /&gt;
Which value can I trust?&lt;/P&gt;

&lt;P&gt;Splunk 6.1.1 on RHEL&lt;/P&gt;</description>
      <pubDate>Thu, 21 Aug 2014 12:21:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189945#M54705</guid>
      <dc:creator>aan_gst_dk</dc:creator>
      <dc:date>2014-08-21T12:21:27Z</dc:date>
    </item>
    <item>
      <title>Re: Difference between dedup and dc counting?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189946#M54706</link>
      <description>&lt;P&gt;Are there any null ORDERIDs?&lt;/P&gt;

&lt;P&gt;Are you choosing the same time range for all these searches?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Aug 2014 13:14:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189946#M54706</guid>
      <dc:creator>strive</dc:creator>
      <dc:date>2014-08-21T13:14:52Z</dc:date>
    </item>
    <item>
      <title>Re: Difference between dedup and dc counting?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189947#M54707</link>
      <description>&lt;P&gt;To make things even more interesting you could also do this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;base search | stats count by ORDERID
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;and look at the number of rows returned.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Aug 2014 13:18:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189947#M54707</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-08-21T13:18:33Z</dc:date>
    </item>
    <item>
      <title>Re: Difference between dedup and dc counting?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189948#M54708</link>
      <description>&lt;P&gt;To clarify what @Strive says: are you searching over the exact same period of time? That is, not a relative range like 'last 4 hours', which is essentially a sliding window.&lt;/P&gt;

&lt;P&gt;Have you tested this with &lt;CODE&gt;earliest&lt;/CODE&gt; and &lt;CODE&gt;latest&lt;/CODE&gt;, e.g. &lt;CODE&gt;earliest=-3h@h latest=@h&lt;/CODE&gt;, to ensure that the exact same underlying events are being returned to your calculation?&lt;/P&gt;
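
&lt;P&gt;As a minimal sketch of such a pinned-window check, the search below computes the counts in a single pass over one fixed window, so they are necessarily based on the same events. It reuses the index, sourcetype, and ORDERID field from the original question; the three-hour window and the output names distinct_orderids, events_with_orderid, and all_events are only illustrative assumptions.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=swbdlogs sourcetype=shopdownloadlogs earliest=-3h@h latest=@h
| stats dc(ORDERID) AS distinct_orderids count(ORDERID) AS events_with_orderid count AS all_events
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If all_events is larger than events_with_orderid, some events have no ORDERID at all, which is worth knowing before comparing against a dedup-based count.&lt;/P&gt;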

&lt;P&gt;/K&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2014 10:20:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189948#M54708</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2014-08-22T10:20:06Z</dc:date>
    </item>
    <item>
      <title>Re: Difference between dedup and dc counting?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189949#M54709</link>
      <description>&lt;P&gt;I actually have a case open with Splunk where I get a different event count from the same query when running it several times... So it could be possible.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2014 12:55:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189949#M54709</guid>
      <dc:creator>sbsbb</dc:creator>
      <dc:date>2014-08-22T12:55:20Z</dc:date>
    </item>
    <item>
      <title>Re: Difference between dedup and dc counting?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189950#M54710</link>
      <description>&lt;P&gt;In my view, all of these values can be correct, for two possible reasons:&lt;BR /&gt;
- The time range picker was not set to the same range when you ran the searches with dc and dedup&lt;BR /&gt;
- Your data is being indexed continuously (if indexing is still in progress and the data set is very large, it is quite possible for Splunk to return different results from one run to the next)&lt;BR /&gt;
For the following searches, run over “All time” (note: I do not index my data continuously), dc and dedup give the same result, as they normally should:&lt;BR /&gt;
1- I have a search (index=tuto sourcetype=access_combined_wcookie) that initially returns 39532 events&lt;BR /&gt;
2- When I run the search “index=tuto sourcetype=access_combined_wcookie | chart dc(categoryId)”, it returns 39532 events and the following statistics:&lt;/P&gt;

&lt;P&gt;dc(categoryId)&lt;BR /&gt;
8&lt;/P&gt;

&lt;P&gt;This is because the chart command only computes the distinct count of categoryId across all events.&lt;BR /&gt;
3- When I run “index=tuto sourcetype=access_combined_wcookie | dedup categoryId | chart count”, I get 8 events and the following statistics table:&lt;/P&gt;

&lt;P&gt;count&lt;BR /&gt;
8&lt;/P&gt;

&lt;P&gt;This means that the events are deduplicated by categoryId before the count is taken.&lt;BR /&gt;
4- When I run “index=tuto sourcetype=access_combined_wcookie | sort 40000 categoryId | chart dc(categoryId)”, I get the same result as in step 2.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 18:28:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Difference-between-dedup-and-dc-counting/m-p/189950#M54710</guid>
      <dc:creator>ngatchasandra</dc:creator>
      <dc:date>2020-09-28T18:28:10Z</dc:date>
    </item>
  </channel>
</rss>

