Deployment Architecture

What is the best action plan during hardware/firmware maintenance of a Splunk Linux server?

Builder

We have 8 Splunk indexers in our environment (2 sites).
One indexer server needs to be serviced: update the BIOS, RAID controller firmware and iLO firmware.
What's the best practice in cases like this?

  1. Do we need to enable maintenance mode by running "splunk enable maintenance-mode"?
  2. Do we need to take the peer offline by running "splunk offline"?
  3. Do we need to temporarily disable starting Splunk on reboot by running "splunk disable boot-start", in case this service task requires numerous reboots during the maintenance?
  4. Any other advice?

Thank you in advance

1 Solution

Motivator

Hey @mlevsh

8 indexers across 2 sites: I'm assuming this is a multisite indexer cluster?

If you are not modifying the application itself (for example, a Splunk upgrade), you can simply use the offline command, and yes, it's safe to temporarily disable boot-start if the patching requires multiple reboots.

Do you have a way to back up the data from that indexer? Not that a backup is required, but it's always safer to have a copy while working on these machines. Since it's a multisite cluster, the data should be replicated on the second site; check the data sizes to confirm.

But yeah, offline command and temporarily disabling boot start should be sufficient.
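Put together, the sequence on the peer might look like the sketch below. This is an illustration of the advice above, not an exact runbook; it assumes the `splunk` binary is on the PATH and that these are the standard Splunk CLI commands for taking a peer down gracefully.

```shell
# On the peer being serviced (sketch; assumes $SPLUNK_HOME/bin is on PATH):
splunk offline            # gracefully remove the peer from the cluster
splunk disable boot-start # keep Splunk from auto-starting across reboots
# ... BIOS / RAID controller / iLO updates, with as many reboots as needed ...
splunk enable boot-start  # restore the init/systemd integration
splunk start              # the peer rejoins the cluster on its own
```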

Thanks,
Raghav



Builder

@Raghav2384 , thank you!

One verification: do we need to extend the restart period by running "splunk edit cluster-config -restart_timeout " on the cluster master? Let's say we set 7200 seconds for 2 hours.

Also, we don't really know how long the maintenance will take. How should that situation be handled?
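For reference, the command form being asked about would look like the following sketch, run on the cluster master; the 7200-second value is just the 2-hour example from this post.

```shell
# On the cluster master (sketch; 7200 s = 2 h, the example value above):
splunk edit cluster-config -restart_timeout 7200
```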


Motivator

I do not think that is required. Since you are working on only one of 8 indexers, I would just take that indexer offline and disable boot-start. We applied OS patches on our 33-indexer cluster by taking one indexer offline at a time.

It did take a lot of time, but I never had to touch my master or the configurations on the master once during the process. As long as the work you are about to do doesn't alter the application or configs, you should be alright.

I apologize for the delay.

Thanks,
Raghav


Builder

Raghav,
thank you again for your reply!

I'm a little concerned though. Our system has the "restart_timeout" value set to 60 seconds. The last time our Unix SAs applied an update, it took them about 3 hours because some issues occurred. We have never run "splunk offline" before. If the master starts to bring the peer online after the "restart_timeout" of 60 seconds while we need 3 hours, what would the impact on our system be?


Contributor

@mlevsh - What did you finally end-up doing?


New Member

After the peer shuts down, you have 60 seconds (by default) to complete any maintenance work and bring the peer back online. If the peer does not return to the cluster within this time, the master initiates bucket-fixing activities to return the cluster to a complete state.

The master does not start bringing the peer online; after the timeout it only begins bucket fixup.

Also, when maintenance mode is enabled, restart_timeout doesn't matter, since maintenance mode halts any bucket-fixup activity.
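So for an open-ended maintenance window, enabling maintenance mode on the master is the simpler lever. A sketch of how that might wrap the work, using the standard CLI commands:

```shell
# On the cluster master: suppress bucket fixup for the whole window (sketch).
splunk enable maintenance-mode   # confirm when prompted
# ... take the peer offline, service it, bring it back ...
splunk disable maintenance-mode  # resume normal fixup behavior
```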
