<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Getting a list of unique IDs from a large data set efficiently in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574112#M200077</link>
    <description>&lt;P&gt;I would combine the first and last.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=abc ID=* 
| fields ID 
| stats count by ID&lt;/LI-CODE&gt;&lt;P&gt;If the ID field is indexed then &lt;FONT face="courier new,courier"&gt;tstats&lt;/FONT&gt; would be more efficient.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| tstats count where index=abc by ID&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 09 Nov 2021 01:11:25 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2021-11-09T01:11:25Z</dc:date>
    <item>
      <title>Getting a list of unique IDs from a large data set efficiently</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574091#M200071</link>
      <description>&lt;P&gt;We have a relatively small set of devices that each emit daily in the vicinity of a million events.&amp;nbsp; Each device has a unique ID (Serial #) which is included in its events.&lt;/P&gt;&lt;P&gt;What would be an efficient method of collecting a list of unique IDs?&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;index=abc | stats count by ID&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;index=abc | stats values(ID) as IDs | mvexpand IDs&lt;BR /&gt;&lt;BR /&gt;index=abc | fields ID | dedup ID&lt;/P&gt;&lt;P&gt;Anything else?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 08 Nov 2021 21:55:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574091#M200071</guid>
      <dc:creator>pm771</dc:creator>
      <dc:date>2021-11-08T21:55:28Z</dc:date>
    </item>
    <item>
      <title>Re: Getting a list of unique IDs from a large data set efficiently</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574112#M200077</link>
      <description>&lt;P&gt;I would combine the first and last.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=abc ID=* 
| fields ID 
| stats count by ID&lt;/LI-CODE&gt;&lt;P&gt;If the ID field is indexed then &lt;FONT face="courier new,courier"&gt;tstats&lt;/FONT&gt; would be more efficient.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| tstats count where index=abc by ID&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Nov 2021 01:11:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574112#M200077</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2021-11-09T01:11:25Z</dc:date>
    </item>
    <item>
      <title>Re: Getting a list of unique IDs from a large data set efficiently</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574133#M200089</link>
      <description>&lt;P&gt;In terms of efficiency, the stats command is _likely_ to be the most efficient. However, make sure you put as many filter criteria in the initial search as possible. For example, if each device produces different types of events and you know it always emits an event with type=X, then include that type filter in the search so it does not scan ALL events produced by the device, only the limited subset.&lt;/P&gt;&lt;P&gt;The job inspector should give you a good idea as to which is the most efficient in your environment.&lt;/P&gt;&lt;P&gt;As&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/213957"&gt;@richgalloway&lt;/a&gt;&amp;nbsp;says, if your ID field is indexed, then tstats will be by far the most efficient way of collecting the list of IDs, at the expense of some extra disk space to index that field for each event.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Nov 2021 06:19:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574133#M200089</guid>
      <dc:creator>bowesmana</dc:creator>
      <dc:date>2021-11-09T06:19:05Z</dc:date>
    </item>
    <item>
      <title>Re: Getting a list of unique IDs from a large data set efficiently</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574136#M200092</link>
      <description>&lt;P&gt;As usual, this depends, and the best way to check which one is best for your particular case is to use the Job Inspector, as&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/6367"&gt;@bowesmana&lt;/a&gt;&amp;nbsp;already said. From time to time, dedup can be more efficient than stats (which is efficient in most cases).&lt;/P&gt;&lt;P&gt;r. Ismo&lt;/P&gt;</description>
      <pubDate>Tue, 09 Nov 2021 06:30:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574136#M200092</guid>
      <dc:creator>isoutamo</dc:creator>
      <dc:date>2021-11-09T06:30:02Z</dc:date>
    </item>
    <item>
      <title>Re: Getting a list of unique IDs from a large data set efficiently</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574223#M200118</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/213957"&gt;@richgalloway&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I understand the &lt;FONT face="andale mono,times" color="#0000FF"&gt;ID=*&lt;/FONT&gt; part.&amp;nbsp; &amp;nbsp;Why would I need &lt;FONT color="#0000FF"&gt;fields&lt;/FONT&gt; before &lt;FONT color="#0000FF"&gt;stats&lt;/FONT&gt;?&lt;BR /&gt;&lt;BR /&gt;Can you please explain?&lt;/P&gt;</description>
      <pubDate>Tue, 09 Nov 2021 15:52:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574223#M200118</guid>
      <dc:creator>pm771</dc:creator>
      <dc:date>2021-11-09T15:52:20Z</dc:date>
    </item>
    <item>
      <title>Re: Getting a list of unique IDs from a large data set efficiently</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574240#M200123</link>
      <description>&lt;P&gt;The &lt;FONT face="courier new,courier"&gt;fields&lt;/FONT&gt; command reduces the amount of data being processed.&amp;nbsp; It is probably not of much benefit in this example, but it is something to keep in mind when thinking about performance.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Nov 2021 16:34:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574240#M200123</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2021-11-09T16:34:39Z</dc:date>
    </item>
    <item>
      <title>Re: Getting a list of unique IDs from a large data set efficiently</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574288#M200134</link>
      <description>&lt;P&gt;As&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/213957"&gt;@richgalloway&lt;/a&gt;&amp;nbsp;says, &lt;STRONG&gt;fields&lt;/STRONG&gt; is a useful command, particularly when dealing with large data sets, as it instructs the search to remove unwanted data from the event, thus improving efficiency.&lt;/P&gt;&lt;P&gt;An important point about &lt;STRONG&gt;fields&lt;/STRONG&gt; is that it typically runs on the indexer before the data is returned to a search head, so it can be very important in minimising the data flow through the Splunk environment, therefore improving your search performance, but also having less impact on others' search performance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Nov 2021 21:08:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Getting-a-list-of-unique-IDs-from-a-large-data-set-efficiently/m-p/574288#M200134</guid>
      <dc:creator>bowesmana</dc:creator>
      <dc:date>2021-11-09T21:08:44Z</dc:date>
    </item>
  </channel>
</rss>

