Knowledge Management

fill_summary_index dedup issue

Path Finder

We are trying to use the fill_summary_index.py script to backfill time ranges where the summary data wasn't populated. I am finding that the -dedup t option does not work, and I am getting duplicate data in my summary index.

My command: ./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -dedup t -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -auth nbkgild:mypassword

When I look at the output, I see the following:
*** For saved search 'sumidx_webserver_count_1minute' ***
Executing search to find existing data: 'search splunk_server=local index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" | stats count by search_now'
waiting for job sid = '1358203822.191' ... finished
All scheduled times will be executed.

*** Spawning a total of 6 searches (max 1 concurrent) ***

The issue I see is the splunk_server=local clause in that search. My Splunk environment is distributed, so my summary index is not on the local search head. How can I stop the script from searching only the local server and instead use the servers configured in distsearch.conf?

Thanks,
Sarah

1 Solution

Path Finder

To solve the issue, I edited the fill_summary_index.py file and removed splunk_server=local from the dedup_search variable. That solved it, and I am no longer getting duplicate records in my index.
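For reference, the change amounts to dropping the splunk_server=local restriction from the dedup search template so the dedup search is distributed to the indexers. A minimal sketch of the before/after (the $index$, $namefield$, $name$, and $timefield$ tokens are the script's own substitution placeholders):

```python
# Dedup search template as shipped: restricted to the search head only.
before = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'

# After removing the splunk_server=local clause, the dedup search runs
# against the distributed search peers as well:
after = 'search index=$index$ $namefield$="$name$" | stats count by $timefield$'

print(after)
```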


Champion

If you are using a search head to run the backfill script and your summary indexed data resides on the indexers, you will want to use the -nolocal option, which is not documented in the script's help output.

./splunk cmd python fill_summary_index.py -dedup true -nolocal true

This tells Splunk to go to the indexers to find the data for deduplication.
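The effect of the flag can be sketched as follows. This is illustrative only, not the script's actual implementation; the function name and defaults are hypothetical, but the default (local) output matches the dedup search shown in the original poster's log:

```python
def build_dedup_search(index, name, nolocal=False,
                       namefield="source", timefield="search_now"):
    """Hypothetical sketch: build the dedup search, optionally without
    the splunk_server=local restriction (what -nolocal true achieves)."""
    scope = "" if nolocal else "splunk_server=local "
    return ('search %sindex=%s %s="%s" | stats count by %s'
            % (scope, index, namefield, name, timefield))

# Local-only (default) vs. distributed (-nolocal true):
print(build_dedup_search("webserver_summary_fivemin", "sumidx_webserver_count_1minute"))
print(build_dedup_search("webserver_summary_fivemin", "sumidx_webserver_count_1minute", nolocal=True))
```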

Path Finder

Is there any way to "clean" an existing summary index that contains duplicates?


Contributor

Deleting the whole range and then rerunning the backfill is probably best, unless you can create a custom script that finds all duplicate entries and cleans them. It might be easiest just to delete the affected range (you can do this by piping a search to the delete command) and rebuild.
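If you do go the custom-script route, the core dedup logic is simple: keep the first event per scheduled-run key and drop the rest. A minimal sketch, assuming you have exported the summary rows to a list of dicts (the field values below are made up for illustration):

```python
def dedupe_by_key(events, key="search_now"):
    """Keep the first event seen for each value of `key`; drop later duplicates."""
    seen = set()
    unique = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            unique.append(event)
    return unique

events = [
    {"search_now": "1358203800", "count": 42},
    {"search_now": "1358203800", "count": 42},  # duplicate from a re-run backfill
    {"search_now": "1358203860", "count": 17},
]
print(len(dedupe_by_key(events)))  # 2 unique scheduled runs remain
```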


Motivator

The -nolocal true option is taking a long time to execute, and it is degrading Splunk performance on the server. Is there a better way to achieve this? Thanks.


Path Finder

Me too... I tried the -nolocal true option, and it did not do anything, even after waiting hours.



Path Finder

You mean this line from fill_summary_index.py, correct? dedupsearch = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'
