<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic fill_summary_index dedup issue in Knowledge Management</title>
    <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87192#M892</link>
    <description>&lt;P&gt;We are trying to use the fill_summary_index.py script to backfill time ranges where the data isn't populated. I am finding that the -dedup t option does not work, and I am getting duplicate data in my summary index.&lt;/P&gt;

&lt;P&gt;My command:  ./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -dedup t -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -auth nbkgild:mypassword&lt;/P&gt;

&lt;P&gt;When I look at the output, I see the following:&lt;BR /&gt;
*** For saved search 'sumidx_webserver_count_1minute' ***&lt;BR /&gt;
Executing search to find existing data: 'search splunk_server=local index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" | stats count by search_now'&lt;BR /&gt;
  waiting for job sid = '1358203822.191'  ... finished&lt;BR /&gt;
All scheduled times will be executed.&lt;/P&gt;

&lt;P&gt;*** Spawning a total of 6 searches (max 1 concurrent) ***&lt;/P&gt;

&lt;P&gt;The issue I see is the search term splunk_server=local. My Splunk environment is distributed, so my summary index is not on the local search head. How can I stop the script from searching only the local server and instead use the servers defined in distsearch.conf?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
Sarah&lt;/P&gt;</description>
    <pubDate>Mon, 28 Sep 2020 13:06:32 GMT</pubDate>
    <dc:creator>SarahBOA</dc:creator>
    <dc:date>2020-09-28T13:06:32Z</dc:date>
    <item>
      <title>fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87192#M892</link>
      <description>&lt;P&gt;We are trying to use the fill_summary_index.py script to backfill time ranges where the data isn't populated. I am finding that the -dedup t option does not work, and I am getting duplicate data in my summary index.&lt;/P&gt;

&lt;P&gt;My command:  ./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -dedup t -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -auth nbkgild:mypassword&lt;/P&gt;

&lt;P&gt;When I look at the output, I see the following:&lt;BR /&gt;
*** For saved search 'sumidx_webserver_count_1minute' ***&lt;BR /&gt;
Executing search to find existing data: 'search splunk_server=local index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" | stats count by search_now'&lt;BR /&gt;
  waiting for job sid = '1358203822.191'  ... finished&lt;BR /&gt;
All scheduled times will be executed.&lt;/P&gt;

&lt;P&gt;*** Spawning a total of 6 searches (max 1 concurrent) ***&lt;/P&gt;

&lt;P&gt;The issue I see is the search term splunk_server=local. My Splunk environment is distributed, so my summary index is not on the local search head. How can I stop the script from searching only the local server and instead use the servers defined in distsearch.conf?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
Sarah&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 13:06:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87192#M892</guid>
      <dc:creator>SarahBOA</dc:creator>
      <dc:date>2020-09-28T13:06:32Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87193#M893</link>
      <description>&lt;P&gt;To solve the issue, I edited the fill_summary_index.py file and removed splunk_server=local from the dedupsearch variable. That seems to have solved it, and I am no longer getting duplicate records in my index.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 13:08:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87193#M893</guid>
      <dc:creator>SarahBOA</dc:creator>
      <dc:date>2020-09-28T13:08:07Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87194#M894</link>
      <description>&lt;P&gt;If you are using a search head to run the backfill script and your summary-indexed data resides on the indexers, you will want to use an option called -nolocal, which is not documented in the script's help output.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;./splunk cmd python fill_summary_index.py -dedup true -nolocal true
&lt;/CODE&gt;&lt;/PRE&gt;
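
&lt;P&gt;As a sketch, applying -nolocal to the original command from this thread would look something like the following. The app, saved-search name, time range, owner, and index are the original poster's values and will differ in your environment; the credentials are placeholders.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -dedup true -nolocal true -auth admin:changeme
&lt;/CODE&gt;&lt;/PRE&gt;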

&lt;P&gt;This tells Splunk to go to the indexers to find the data for deduplication.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Sep 2014 20:52:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87194#M894</guid>
      <dc:creator>the_wolverine</dc:creator>
      <dc:date>2014-09-11T20:52:04Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87195#M895</link>
      <description>&lt;P&gt;The -nolocal true option is taking a long time to execute, and it's degrading Splunk performance on the server. Is there any better way to achieve this? Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Dec 2015 10:33:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87195#M895</guid>
      <dc:creator>rakesh_498115</dc:creator>
      <dc:date>2015-12-15T10:33:34Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87196#M896</link>
      <description>&lt;P&gt;You mean this line from fill_summary_index.py, correct? dedupsearch = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 09:45:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87196#M896</guid>
      <dc:creator>wsnyder2</dc:creator>
      <dc:date>2020-09-29T09:45:48Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87197#M897</link>
      <description>&lt;P&gt;Is there any way to "clean" an existing summary index that contains duplicates?   &lt;/P&gt;</description>
      <pubDate>Wed, 25 May 2016 20:47:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87197#M897</guid>
      <dc:creator>wsnyder2</dc:creator>
      <dc:date>2016-05-25T20:47:26Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87198#M898</link>
      <description>&lt;P&gt;Me too... I tried the -nolocal true option and it did not do anything, even after waiting for hours.&lt;/P&gt;</description>
      <pubDate>Wed, 25 May 2016 20:48:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87198#M898</guid>
      <dc:creator>wsnyder2</dc:creator>
      <dc:date>2016-05-25T20:48:34Z</dc:date>
    </item>
    <item>
      <title>Re: fill_summary_index dedup issue</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87199#M899</link>
      <description>&lt;P&gt;Deleting the whole range and then rerunning the backfill is probably best, unless you can create a custom script that finds all duplicate entries and cleans them. It might be easiest just to delete the range (you can do this by piping a search to the delete command) and then rebuild.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Feb 2017 18:42:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/fill-summary-index-dedup-issue/m-p/87199#M899</guid>
      <dc:creator>briancronrath</dc:creator>
      <dc:date>2017-02-02T18:42:55Z</dc:date>
    </item>
  </channel>
</rss>