Getting Data In

Slow indexer restart in cluster

splunkettes
Path Finder

When restarting an indexer in our cluster, I first put the cluster in maintenance mode. The indexer restarts within minutes, but the cluster manager shows it as "Starting" for about 30 minutes. The logs show many event=addBucket entries. Why does it have to do so many addBuckets when the cluster was in maintenance mode?

Is anyone else seeing this? I'm wondering if there is a server.conf setting that would limit the addBuckets to only the buckets the indexer received data for while in maintenance mode. It seems like the indexer shouldn't have to addBucket for buckets that were already added before the restart.
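
To confirm that the addBucket phase really is what's consuming the restart window, you can summarize those events from the cluster manager's splunkd.log. A minimal sketch, assuming the usual splunkd.log timestamp layout and the `event=addBucket` marker you're already seeing (the sample lines below are illustrative, not real log output; check them against your own log):

```python
import re
from datetime import datetime

# Illustrative splunkd.log lines (format assumed; verify against your log).
SAMPLE = """\
01-15-2025 10:00:01.000 +0000 INFO  CMMaster - event=addBucket bid=main~42~GUID
01-15-2025 10:00:02.500 +0000 INFO  CMMaster - event=addBucket bid=main~43~GUID
01-15-2025 10:29:58.000 +0000 INFO  CMMaster - event=addBucket bid=_internal~7~GUID
"""

TS_RE = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3})")

def summarize_addbucket(lines):
    """Return (event_count, elapsed_seconds) for event=addBucket lines."""
    stamps = []
    for line in lines:
        if "event=addBucket" not in line:
            continue
        m = TS_RE.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%m-%d-%Y %H:%M:%S.%f"))
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()

if __name__ == "__main__":
    # In practice, read $SPLUNK_HOME/var/log/splunk/splunkd.log on the CM.
    count, elapsed = summarize_addbucket(SAMPLE.splitlines())
    print(f"{count} addBucket events over {elapsed:.1f}s")
```

If the elapsed time between the first and last addBucket roughly matches your 30-minute "Starting" window, that phase is indeed the bottleneck.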


inventsekar
SplunkTrust

For learners' reference, I thought I'd post the Splunk doc link, thanks.

https://help.splunk.com/en/data-management/manage-splunk-enterprise-indexers/9.1/manage-the-indexer-...


livehybrid
SplunkTrust

Hi @splunkettes 

It’s expected to see the addBucket events. Each addBucket event represents the peer telling the CM "I have this bucket." The CM then reconciles this against its generation metadata. The more buckets on that indexer, the longer this takes.

If one peer/indexer is taking longer than the others to restart, it could be for a number of reasons. Are there more buckets on this indexer than on its peers? You can check this on the Cluster Manager and run a data rebalance if required.
Another reason could be an input stream that doesn't close quickly. Is it the same indexer each time?
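
To compare bucket counts across peers, the cluster manager exposes a peers endpoint over REST. A hedged sketch: the endpoint path and the `label`/`bucket_count` fields below follow the Splunk REST API reference, but verify them on your version; the parsing is separated from the fetch so it can be checked against sample data:

```python
import json
import urllib.request

def fetch_peers_json(cm_uri, session_key):
    """Fetch peer info from the cluster manager REST API.
    Endpoint path per the Splunk REST reference; verify on your version."""
    req = urllib.request.Request(
        f"{cm_uri}/services/cluster/manager/peers?output_mode=json",
        headers={"Authorization": f"Splunk {session_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def bucket_counts(peers_json):
    """Map peer label -> bucket_count from the endpoint's JSON payload."""
    data = json.loads(peers_json)
    return {
        entry["content"].get("label", entry["name"]): entry["content"]["bucket_count"]
        for entry in data["entry"]
    }

# Trimmed example payload shape (field names assumed from the REST reference).
SAMPLE = json.dumps({
    "entry": [
        {"name": "guid1", "content": {"label": "idx1", "bucket_count": 41000}},
        {"name": "guid2", "content": {"label": "idx2", "bucket_count": 118000}},
    ]
})

if __name__ == "__main__":
    print(bucket_counts(SAMPLE))  # a clear outlier suggests rebalancing
```

A peer with a much larger bucket count than its siblings will naturally take longer in the addBucket phase after a restart.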

🌟 Did this answer help you? If so, please consider:

    • Adding karma to show it was useful
    • Marking it as the solution if it resolved your issue
    • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing.

splunkettes
Path Finder

The behavior is the same across all indexers in the cluster. addBucket seems to be the long pole in the tent. If it's normal, then we can deal with it. We're also looking at cleaning up data and making sure all indexes are receiving events with proper timestamps, etc.


isoutamo
SplunkTrust
Probably the only thing you can try is to get more IOPS for your indexers' disks.
I have seen cases where this phase has taken hours due to slow disks and a lot of cold buckets.
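
One quick way to sanity-check whether storage is the bottleneck is to time a large sequential write/read on the volume holding your buckets. This is only a rough first-pass sketch (tools like fio or iostat give far better numbers, and the path argument is a placeholder you should point at your Splunk data volume):

```python
import os
import tempfile
import time

def rough_write_read_mbps(path, size_mb=256):
    """Crude sequential throughput test on the filesystem holding `path`.
    Use fio/iostat for real benchmarking; this only catches gross problems."""
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fname = os.path.join(path, "splunk_disk_test.tmp")
    try:
        start = time.monotonic()
        with open(fname, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force data to disk before stopping the clock
        write_mbps = size_mb / (time.monotonic() - start)

        start = time.monotonic()
        with open(fname, "rb") as f:
            while f.read(1024 * 1024):
                pass
        # Note: the read pass may be served from the page cache,
        # so treat the read figure as an upper bound.
        read_mbps = size_mb / (time.monotonic() - start)
        return write_mbps, read_mbps
    finally:
        os.remove(fname)

if __name__ == "__main__":
    # Point this at your bucket volume, e.g. under $SPLUNK_HOME/var/lib/splunk
    w, r = rough_write_read_mbps(tempfile.gettempdir(), size_mb=64)
    print(f"write ~{w:.0f} MB/s, read ~{r:.0f} MB/s")
```

If the write figure here is dramatically below what your storage tier is rated for, slow disks are a plausible explanation for a long addBucket phase.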