Splunk offline command and bucket replication

Super Champion

Reading through the "offline" and "maintenance mode" documentation, I'm slightly confused about whether we need to do both for a long maintenance window (about 8 hours) on an indexer. We don't want replication to happen, since the disks are retained and the indexer holds around 20 TB of data, which would take a very long time to replicate.

Can you please confirm whether my understanding below is correct?

  1. splunk offline (without --enforce-counts): This ensures in-progress searches are serviced and takes the indexer peer offline gracefully. But if the "master" detects that the peer is NOT back up within the restart_timeout period (60 seconds), it starts bucket-fixing tasks, replicating copies to the other indexers.
  2. splunk enable maintenance-mode: This ensures no replication happens. So enabling maintenance mode and then shutting down the indexer should keep the data intact until the system is brought back up correctly?

So I was planning to go with option 2. I just wanted to check whether my understanding is correct, or whether anyone has a better process for long-duration indexer maintenance.


Legend

I agree, the best solution in your case is to (1) put the cluster in maintenance mode, (2) shut down the indexer and perform the maintenance, (3) restart the indexer, and (4) take the cluster out of maintenance mode.

This will prevent the cluster from going into recovery mode during the maintenance window. Once the cluster exits maintenance mode, however, it will enter recovery mode to "catch up" on the replication that it forestalled during the maintenance window; this is normal and expected behavior. And you are right that this is much better than having the cluster trying to recover 20TB of data...
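
For reference, here is a rough sketch of that sequence as shell helpers. Treat it as an illustration rather than a tested runbook: the `/opt/splunk` default path, the `DRY_RUN` switch, and the function names are my own assumptions, and the maintenance-mode commands must be run on the manager ("master") node while the offline/start commands run on the indexer itself. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch only: install path, DRY_RUN switch, and function names are assumptions.
SPLUNK="${SPLUNK_HOME:-/opt/splunk}/bin/splunk"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of executing them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1, on the manager node: halt bucket fixup, then confirm the mode.
# (The enable command asks for confirmation interactively.)
manager_begin() {
    run "$SPLUNK" enable maintenance-mode
    run "$SPLUNK" show maintenance-mode
}

# Step 2, on the indexer: take the peer down gracefully.
peer_down() {
    run "$SPLUNK" offline
}

# Step 3, on the indexer: bring the peer back after the maintenance work.
peer_up() {
    run "$SPLUNK" start
}

# Step 4, on the manager node: resume normal fixup activity.
manager_end() {
    run "$SPLUNK" disable maintenance-mode
    run "$SPLUNK" show maintenance-mode
}
```

You would source this on each host and call only the function that belongs there, for example `manager_begin` on the manager before the window and `manager_end` after the indexer has rejoined. Set `DRY_RUN=0` only once the printed commands look right for your environment.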

Super Champion

thank you
