<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic fill_summary_index dedup issue in Knowledge Management</title>
    <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87192#M892</link>
    <description>&lt;P&gt;We are trying to use the fill_summary_index.py script to backfill time ranges where the data isn't populated. I am finding that the -dedup t option does not work, and I am getting duplicate data in my summary index.&lt;/P&gt;

&lt;P&gt;My command:  ./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -dedup t -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -auth nbkgild:mypassword&lt;/P&gt;

&lt;P&gt;When I look at the output, I see the following:&lt;BR /&gt;
*** For saved search 'sumidx_webserver_count_1minute' ***&lt;BR /&gt;
Executing search to find existing data: 'search splunk_server=local index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" | stats count by search_now'&lt;BR /&gt;
  waiting for job sid = '1358203822.191'  ... finished&lt;BR /&gt;
All scheduled times will be executed.&lt;/P&gt;

&lt;P&gt;*** Spawning a total of 6 searches (max 1 concurrent) ***&lt;/P&gt;

&lt;P&gt;The issue I see is the search term splunk_server=local. My Splunk environment is distributed, so my summary index is not on the local search head. How can I stop the script from searching only the local server and instead use the servers defined in distsearch.conf?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
Sarah&lt;/P&gt;</description>
    <pubDate>Mon, 28 Sep 2020 13:06:32 GMT</pubDate>
    <dc:creator>SarahBOA</dc:creator>
    <dc:date>2020-09-28T13:06:32Z</dc:date>
    <item>
      <title>fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87192#M892</link>
      <description>&lt;P&gt;We are trying to use the fill_summary_index.py script to backfill time ranges where the data isn't populated. I am finding that the -dedup t option does not work, and I am getting duplicate data in my summary index.&lt;/P&gt;

&lt;P&gt;My command:  ./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -dedup t -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -auth nbkgild:mypassword&lt;/P&gt;

&lt;P&gt;When I look at the output, I see the following:&lt;BR /&gt;
*** For saved search 'sumidx_webserver_count_1minute' ***&lt;BR /&gt;
Executing search to find existing data: 'search splunk_server=local index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" | stats count by search_now'&lt;BR /&gt;
  waiting for job sid = '1358203822.191'  ... finished&lt;BR /&gt;
All scheduled times will be executed.&lt;/P&gt;

&lt;P&gt;*** Spawning a total of 6 searches (max 1 concurrent) ***&lt;/P&gt;

&lt;P&gt;The issue I see is the search term splunk_server=local. My Splunk environment is distributed, so my summary index is not on the local search head. How can I stop the script from searching only the local server and instead use the servers defined in distsearch.conf?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
Sarah&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 13:06:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87192#M892</guid>
      <dc:creator>SarahBOA</dc:creator>
      <dc:date>2020-09-28T13:06:32Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87193#M893</link>
      <description>&lt;P&gt;To solve the issue, I edited the fill_summary_index.py file and removed splunk_server=local from the dedupsearch variable. That seems to have solved it, and I am no longer getting duplicate records in my index.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 13:08:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87193#M893</guid>
      <dc:creator>SarahBOA</dc:creator>
      <dc:date>2020-09-28T13:08:07Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87194#M894</link>
      <description>&lt;P&gt;If you are using a search head to run the backfill script and your summary-indexed data resides on the indexers, you will want to use an option called -nolocal, which is not documented in the script's help output.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;./splunk cmd python fill_summary_index.py -dedup true -nolocal true
&lt;/CODE&gt;&lt;/PRE&gt;
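
&lt;P&gt;As a sketch, applying -nolocal to the original command from this thread would look something like the following. The app, saved-search name, time range, owner, and index are the original poster's values and will differ in your environment; the credentials are placeholders.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -dedup true -nolocal true -auth admin:changeme
&lt;/CODE&gt;&lt;/PRE&gt;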

&lt;P&gt;This tells Splunk to go to the indexers to find the data for deduplication.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2014 20:52:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87194#M894</guid>
      <dc:creator>the_wolverine</dc:creator>
      <dc:date>2014-09-11T20:52:04Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87195#M895</link>
      <description>&lt;P&gt;The -nolocal true option is taking a long time to execute, and it's degrading Splunk performance on the server. Is there any better way to achieve this? Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Dec 2015 10:33:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87195#M895</guid>
      <dc:creator>rakesh_498115</dc:creator>
      <dc:date>2015-12-15T10:33:34Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87196#M896</link>
      <description>&lt;P&gt;You mean this line from fill_summary_index.py, correct? dedupsearch = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 09:45:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87196#M896</guid>
      <dc:creator>wsnyder2</dc:creator>
      <dc:date>2020-09-29T09:45:48Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87197#M897</link>
      <description>&lt;P&gt;Is there any way to "clean" an existing summary index that contains duplicates?   &lt;/P&gt;</description>
      <pubDate>Wed, 25 May 2016 20:47:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87197#M897</guid>
      <dc:creator>wsnyder2</dc:creator>
      <dc:date>2016-05-25T20:47:26Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87198#M898</link>
      <description>&lt;P&gt;Me too... I tried the -nolocal true option and it did not do anything, even after waiting for hours.&lt;/P&gt;</description>
      <pubDate>Wed, 25 May 2016 20:48:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87198#M898</guid>
      <dc:creator>wsnyder2</dc:creator>
      <dc:date>2016-05-25T20:48:34Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87199#M899</link>
      <description>&lt;P&gt;Deleting the whole range and then rerunning the backfill is probably best, unless you can create a custom script that finds all duplicate entries and cleans them. It might be easiest just to delete the range (you can do this by piping a search to the delete command) and then rebuild.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Feb 2017 18:42:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87199#M899</guid>
      <dc:creator>briancronrath</dc:creator>
      <dc:date>2017-02-02T18:42:55Z</dc:date>
    </item>
  </channel>
</rss>