We are trying to use the fill_summary_index.py script to backfill time ranges where the summary data wasn't populated. I am finding that the -dedup t option does not work, and I am getting duplicate data in my summary index.
My command: ./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -dedup t -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -auth nbkgild:mypassword
When I look at the output, I see the following:
*** For saved search 'sumidx_webserver_count_1minute' ***
Executing search to find existing data: 'search splunk_server=local index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" | stats count by search_now'
waiting for job sid = '1358203822.191' ... finished
All scheduled times will be executed.
*** Spawning a total of 6 searches (max 1 concurrent) ***
The issue I see is the search clause splunk_server=local. My Splunk environment is a distributed environment, and therefore my summary index is not on the local search head. How can I stop it from searching only the local server and instead use the servers in the distsearch.conf file?
Thanks,
Sarah
To solve the issue, I edited the fill_summary_index.py file and removed splunk_server=local from the dedupsearch variable. That seems to have solved it, and I am no longer getting duplicate records in my index.
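For reference, the line in question looks roughly like this (the exact wording may differ between Splunk versions):

dedupsearch = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'

and after removing the splunk_server=local restriction it becomes:

dedupsearch = 'search index=$index$ $namefield$="$name$" | stats count by $timefield$'

With that change, the dedup check is distributed to the search peers instead of running only against the search head.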
If you are using a search head (SH) to run the backfill script and your summary-indexed data resides on the indexers, you will want to use an option called -nolocal, which is not documented in the script's help output.
./splunk cmd python fill_summary_index.py -dedup true -nolocal true
This tells Splunk to go to the indexers to find the data for deduplication.
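For example, combining -nolocal with the options from the original command in the question (the app, saved search name, time range, and index are taken from that post; adjust them to your environment), the invocation would look something like:

./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -dedup true -nolocal true -auth nbkgild:mypassword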
Is there any way to "clean" an existing summary index that contains duplicates?
Deleting the whole range and then rerunning the backfill is probably best, unless you can create a custom script that finds all duplicate entries and cleans them. It might be easiest just to delete the range (you can do this by piping a search to the delete command) and rebuild it.
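As a rough sketch, using the index, source, and time range from the original post (substitute whatever range actually contains the duplicates, and note that delete only masks events from search results and requires a role with delete permissions, such as can_delete):

index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" earliest=-11m@m latest=-4m@m | delete

Then rerun fill_summary_index.py over the same time range to rebuild the summary data.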
The -nolocal true option is taking a lot of time to execute, and it's degrading Splunk performance on the server. Is there any better way to achieve this? Thanks.
Me too... I tried the "-nolocal true" option and it did not do anything, even after waiting hours.
You mean this line from fill_summary_index.py, correct?

dedupsearch = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'