We are trying to use the fill_summary_index.py script to backfill time ranges where the summary data isn't populated. I am finding that the -dedup t option does not work, and I am getting duplicate data in my summary index.
My command: ./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -dedup t -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -auth nbkgild:mypassword
When I look at the output, I see the following:
*** For saved search 'sumidx_webserver_count_1minute' ***
Executing search to find existing data: 'search splunk_server=local index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" | stats count by search_now'
waiting for job sid = '1358203822.191' ... finished
All scheduled times will be executed.
*** Spawning a total of 6 searches (max 1 concurrent) ***
The issue I see is the splunk_server=local in that search. My Splunk environment is a distributed one, so my summary index is not on the local search head. How can I stop it from searching only the local server and instead use the servers defined in distsearch.conf?
Thanks,
Sarah

If you are using a search head (SH) to run the backfill script and your summary-indexed data resides on the indexers, you will want to use an option called -nolocal, which is not documented in the script's help output.
./splunk cmd python fill_summary_index.py -dedup true -nolocal true
This tells Splunk to go to the indexers to find the data for deduplication.
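For example, combined with the flags from the original question, the full command would look something like this (everything other than -nolocal is taken from that post, and the -auth credentials are placeholders):
./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -dedup true -nolocal true -owner summaryadmin -index webserver_summary_fivemin -auth admin:changeme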

Is there any way to "clean" an existing summary index that contains duplicates?

Deleting the whole range and then rerunning the backfill is probably best, unless you can create a custom script that finds all duplicate entries and cleans them. It might be easiest just to delete the range (you can do this by piping a search to the delete command) and rebuild.
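As a rough sketch, using the index and source names from the original question (the time range is only an example; the delete command requires a role with the can_delete capability, and it only masks events from search rather than freeing disk space):
index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" earliest=-11m@m latest=-4m@m | delete
Then rerun fill_summary_index.py over the same range to repopulate it.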
The -nolocal true option is taking a lot of time to execute, and it's degrading Splunk performance on the server. Is there a better way to achieve this? Thanks.

Me too... I tried the "-nolocal true" option and it did not do anything, even after waiting for hours.
To solve the issue, I edited the fill_summary_index.py file and removed splunk_server=local from the dedup_search variable. That seems to have solved it, and I am no longer getting duplicate records put into my index.

You mean this line from fill_summary_index.py, correct?
dedupsearch = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'
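That appears to be the line in question; with splunk_server=local removed as described in the earlier reply, it would read roughly as follows (the exact line may vary between Splunk versions):
dedupsearch = 'search index=$index$ $namefield$="$name$" | stats count by $timefield$'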
