Knowledge Management

fill_summary_index dedup issue

SarahBOA
Path Finder

We are trying to use the fill_summary_index.py script to backfill time ranges where the summary data isn't populated. I am finding that the -dedup t option does not work, and I am getting duplicate data in my summary index.

My command: ./splunk cmd python fill_summary_index.py -app ecomm_splunk_administration -dedup t -name sumidx_webserver_count_1minute -et -11m@m -lt -4m@m -owner summaryadmin -index webserver_summary_fivemin -auth nbkgild:mypassword

When I look at the output, I see the following:
*** For saved search 'sumidx_webserver_count_1minute' ***
Executing search to find existing data: 'search splunk_server=local index=webserver_summary_fivemin source="sumidx_webserver_count_1minute" | stats count by search_now'
waiting for job sid = '1358203822.191' ... finished
All scheduled times will be executed.

*** Spawning a total of 6 searches (max 1 concurrent) ***

The issue I see is the splunk_server=local clause in that search. My Splunk environment is distributed, so my summary index is not on the local search head. How can I stop the script from searching only the local server and instead use the search peers configured in distsearch.conf?

Thanks,
Sarah

1 Solution

SarahBOA
Path Finder

To solve the issue, I edited the fill_summary_index.py file and removed splunk_server=local from the dedup search string. That seems to have solved it, and I am no longer getting duplicate records in my index.
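For reference, a sketch of the change, based on the dedupsearch template string quoted elsewhere in this thread (the exact line and location may vary by Splunk version):

```python
# Original line in fill_summary_index.py: the dedup search is pinned to
# the local server, so on a search head it never sees the indexers' data.
dedupsearch = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'

# Edited line: with splunk_server=local removed, the search head fans the
# dedup search out to its configured search peers as usual.
dedupsearch = 'search index=$index$ $namefield$="$name$" | stats count by $timefield$'
```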


the_wolverine
Champion

If you are using a search head to run the backfill script and your summary indexed data resides on the indexers, you will want to use an option that is undocumented in the help output: -nolocal.

./splunk cmd python fill_summary_index.py -dedup true -nolocal true

This tells Splunk to go to the indexers to find the data for deduplication.

wsnyder2
Path Finder

Is there any way to "clean" an existing summary index that contains duplicates?


briancronrath
Contributor

Deleting the whole range and then rerunning the backfill is probably best, unless you can create a custom script that finds all the duplicate entries and cleans them up. It might be easiest just to delete the range (you can do this by piping a search to the delete command) and rebuild.
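If you want to confirm which scheduled runs are duplicated before deleting anything, a minimal sketch of such a custom script is below. It assumes you have exported the summary events (one row per event) and that each event carries the search_now field the dedup search groups by; the function name is hypothetical:

```python
from collections import Counter

def duplicated_scheduled_times(rows, timefield="search_now"):
    """Given summary-index events as dicts, return the scheduled times
    that were summarized more than once, i.e. the ranges that are
    candidates for deletion before rerunning the backfill."""
    counts = Counter(row[timefield] for row in rows)
    return sorted(t for t, n in counts.items() if n > 1)
```

Any times it returns mark the ranges you would target with a search piped to the delete command before rerunning fill_summary_index.py.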


rakesh_498115
Motivator

The -nolocal true option is taking a lot of time to execute, and it is degrading Splunk performance on the server. Is there a better way to achieve this? Thanks.


wsnyder2
Path Finder

Me too... I tried the -nolocal true option, and it did not do anything, even after waiting hours.


wsnyder2
Path Finder

You mean this line from fill_summary_index.py, correct? dedupsearch = 'search splunk_server=local index=$index$ $namefield$="$name$" | stats count by $timefield$'
