<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Which version of dedup is better? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642739#M222620</link>
    <description>&lt;P&gt;I've never seen anything conclusive about whether &lt;FONT face="courier new,courier"&gt;dedup&lt;/FONT&gt; or &lt;FONT face="courier new,courier"&gt;stats&lt;/FONT&gt; is faster. It may depend on other factors.&lt;/P&gt;&lt;P&gt;One significant difference, however, is that &lt;FONT face="courier new,courier"&gt;stats&lt;/FONT&gt; is an aggregating command, which means the original events are lost: any field not mentioned in the command is discarded. The output of the &lt;FONT face="courier new,courier"&gt;values&lt;/FONT&gt; function can also be a multivalue field, which requires special handling later in the query. This is why I prefer &lt;FONT face="courier new,courier"&gt;dedup&lt;/FONT&gt;.&lt;/P&gt;</description>
    <pubDate>Tue, 09 May 2023 21:44:50 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2023-05-09T21:44:50Z</dc:date>
    <item>
      <title>Which version of dedup is better?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642720#M222610</link>
      <description>&lt;P&gt;A colleague of mine uses the following dedup version:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| strcat entity "-" IP "-" QID "-" Port "-" Tracking_Method "-" Last_Detected Key
| dedup Key&lt;/LI-CODE&gt;&lt;P&gt;I grew up with this one:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| dedup entity IP QID Port Tracking_Method Last_Detected&lt;/LI-CODE&gt;&lt;P&gt;One caveat is that Tracking_Method doesn't always exist. So which version is better?&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2023 18:35:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642720#M222610</guid>
      <dc:creator>danielbb</dc:creator>
      <dc:date>2023-05-09T18:35:58Z</dc:date>
    </item>
    <item>
      <title>Re: Which version of dedup is better?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642733#M222615</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Since Tracking_Method doesn't always exist, I would write this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;(...)
| stats count values(Tracking_Method) by entity IP QID Port Last_Detected&lt;/LI-CODE&gt;&lt;P&gt;If you put Tracking_Method in the by clause, events where it is missing are dropped, so the results may not include everything you want.&lt;/P&gt;&lt;P&gt;I believe stats performs slightly better than dedup. However, you can test both options and use the Job Inspector to compare run time and the number of scanned versus returned events.&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2023 20:36:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642733#M222615</guid>
      <dc:creator>goncalocoelho</dc:creator>
      <dc:date>2023-05-09T20:36:11Z</dc:date>
    </item>
    <item>
      <title>Re: Which version of dedup is better?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642739#M222620</link>
      <description>&lt;P&gt;I've never seen anything conclusive about whether &lt;FONT face="courier new,courier"&gt;dedup&lt;/FONT&gt; or &lt;FONT face="courier new,courier"&gt;stats&lt;/FONT&gt; is faster. It may depend on other factors.&lt;/P&gt;&lt;P&gt;One significant difference, however, is that &lt;FONT face="courier new,courier"&gt;stats&lt;/FONT&gt; is an aggregating command, which means the original events are lost: any field not mentioned in the command is discarded. The output of the &lt;FONT face="courier new,courier"&gt;values&lt;/FONT&gt; function can also be a multivalue field, which requires special handling later in the query. This is why I prefer &lt;FONT face="courier new,courier"&gt;dedup&lt;/FONT&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2023 21:44:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642739#M222620</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2023-05-09T21:44:50Z</dc:date>
    </item>
    <item>
      <title>Re: Which version of dedup is better?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642782#M222633</link>
      <description>&lt;P&gt;You can turn null values into empty strings before dedup, like this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| fillnull value="" Tracking_Method Port
| dedup entity IP QID Port Tracking_Method Last_Detected&lt;/LI-CODE&gt;&lt;P&gt;This matters because, by default (keepempty=false), dedup discards events that lack any of the listed fields. In theory, comparing a single field takes less computation, but strcat does more work than fillnull. In my unscientific test, the two perform about the same. (By the way, Port is the field likely to be null, while Tracking_Method should always have a value.)&lt;/P&gt;</description>
      <pubDate>Wed, 10 May 2023 09:44:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Which-version-of-dedup-is-better/m-p/642782#M222633</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2023-05-10T09:44:15Z</dc:date>
    </item>
  </channel>
</rss>

