We are trying to upgrade a couple of indexers in our multisite cluster to better hardware (16 cores to 24 cores, etc.). We decided to simply swap the disks over to the new boxes to avoid unnecessary fix-up activity and save network traffic.
What is the best way to perform this upgrade?
I am thinking -
Or am I supposed to use Splunk Offline mode by extending the default interval?
I would follow the cluster upgrade procedure (minus the upgrade tasks for the cluster master and search head) to do this. The only addition to your list would be to run "splunk offline" on the indexers/peer nodes before stopping them.
As per https://answers.splunk.com/answers/464439/what-is-the-best-action-plan-during-hardwarefirmwa.html,
we don't even need to enable maintenance mode? I am trying to avoid failed searches during this upgrade process.
Yes, enabling maintenance mode is not a requirement for upgrading the peers, but leaving it off does have an effect on cluster health (too much bucket rolling may occur). For the short time that the peers will be down, I would enable maintenance mode. See this for more information on the effect of not enabling maintenance mode.
splunk offline actually stops the indexer.
How long do you think the process will take for each indexer?
Before you put the cluster in maintenance mode, you might consider increasing the restart timeout value to some number of seconds longer than the process will take:
splunk edit cluster-config -restart_timeout 900
Also be sure to take the cluster out of maintenance mode once you are done with the process.
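Putting the steps from this thread together, the sequence might look roughly like the sketch below. It is only an outline, not a tested runbook: the /opt/splunk install path, the 900-second timeout, and the helper function names are my own assumptions, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. SPLUNK_BIN, the 900-second default, and the function names
# are illustrative assumptions; adjust them for your environment.
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On the cluster master, before touching any peer.
master_prepare() {
    run "$SPLUNK_BIN" edit cluster-config -restart_timeout "${1:-900}"
    run "$SPLUNK_BIN" enable maintenance-mode
}

# On each indexer being moved; "splunk offline" also stops the peer.
peer_offline() {
    run "$SPLUNK_BIN" offline
}

# On the new box, after the disks have been swapped over.
peer_start() {
    run "$SPLUNK_BIN" start
}

# On the cluster master, once every swapped peer has rejoined.
master_finish() {
    run "$SPLUNK_BIN" disable maintenance-mode
    run "$SPLUNK_BIN" show cluster-status
}
```

Source the file and call master_prepare on the master, peer_offline and peer_start on each indexer, and master_finish on the master at the end. Set DRY_RUN=0 only after you have checked the printed commands.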
"After the peer shuts down, you have 60 seconds (by default) to complete any maintenance work and bring the peer back online. If the peer does not return to the cluster within this time, the master initiates bucket-fixing activities to return the cluster to a complete state. If you need more time, you can extend the time that the master waits for the peer to come back online by configuring the restart_timeout attribute."
But why does "restart_timeout" matter here, when you are already putting the cluster into maintenance mode, which does not allow any bucket fix-up activity?