Deployment Architecture

Index Cluster rolling-restart problem!

simony
Path Finder

Hi all,

We have a very unpleasant problem in our production Splunk architecture.

The rolling restart does not behave as it should, whether it is triggered by creating an index or by anything else. During a rolling restart, every indexer in the index cluster is restarted without waiting for the previously restarted indexer to come back up. This means that our index cluster is completely down each time. Why does Splunk not wait and restart one indexer after another? How long does Splunk wait before it restarts the next indexer?
After a restart, every indexer needs about one hour until it is index-ready and searchable, because each indexer needs that time to work through its buckets. As a result, we are offline for a whole hour each time.

Details about our environment:

  • 8 node index cluster (2 sites)
  • replication factor is 3
  • search factor is 2
  • 143 indexes
  • all configuration is held on the master and pushed to the peers
  • search head cluster consisting of 7 members

I'm grateful for any help.

Regards,
Yanick

1 Solution

simony
Path Finder

Splunk version 6.5.2 solves the problem.


gavsdavs_GR
Path Finder

This parameter in server.conf is a timer.

restart_timeout = <positive integer>
* Only valid for mode=master
* This is the amount of time the master waits for a peer to come
  back when the peer is restarted (to avoid the overhead of
  trying to fixup the buckets that were on the peer).
* Note that this only works with the offline command or if the peer
  is restarted via the UI.
* Defaults to 60s.

The CM (cluster master) triggers the restart of a peer and waits for this duration for the indexer to check back in.
If your indexers take longer than that duration to check back in, the CM triggers the next peer restart anyway, and you end up with multiple indexers restarting at the same time.
We have this set to about 30 minutes because our indexers take ages to restart.
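
For illustration, a minimal sketch of what such a setting could look like in server.conf on the master. The [clustering] stanza and mode = master follow the spec excerpt above; the value 1800 is just the "about 30 minutes" mentioned here expressed in seconds, not a recommendation. If your peers need about an hour to come back, as in the original question, you would presumably need a correspondingly larger value.

```
# server.conf on the cluster master (illustrative values only)
[clustering]
mode = master
# Seconds the master waits for a restarted peer to check back in.
# 1800 = 30 minutes; the spec default is 60.
restart_timeout = 1800
```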

Hopefully a Splunk person can tell me that this behaviour is now fixed.



hardikJsheth
Motivator

Can you check the value of percent_peers_to_restart in the server.conf file of your cluster master?

By default it's 10. If the value is different, reset it using the following command:

$SPLUNK_HOME/bin/splunk edit cluster-config -percent_peers_to_restart 10
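
For reference, a sketch of the equivalent entry as it would probably appear in server.conf on the master, assuming the setting lives in the [clustering] stanza alongside the other cluster settings. Note that with 8 peers, 10 percent is less than one whole peer; as far as I recall from the server.conf spec, at least one peer is still restarted per round, so please verify that against the spec for your version.

```
# server.conf on the cluster master (sketch; check against your own file)
[clustering]
mode = master
# Suggested maximum percentage of peers restarted per round (default 10).
percent_peers_to_restart = 10
```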


simony
Path Finder

We already have the default value (10) configured for percent_peers_to_restart in server.conf.
