OS Patching Indexer Cluster and bucket fixups


We're struggling to patch the OS on our indexer cluster in a reasonable timeframe. It currently takes about 24 hours, with the vast majority of that time spent waiting on bucket fixup tasks to complete between reboots. I'm wondering how others are doing this without impacting searches or filling up indexing queues. Our current process:

  1. Blast out `apt update && apt upgrade -y && apt autoremove -y` to all indexers. Takes about 10-15 minutes to complete.
  2. Blast out a Puppet run (no-noop) to all indexers - takes about 5 minutes to complete.
  3. Then, for each indexer:
  4. `splunk offline` - takes 5-10 minutes
  5. reboot - takes < 30 seconds
  6. Wait for bucket fixups to complete - around 30 minutes
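
For reference, the per-indexer cycle (steps 3-6) could be scripted roughly like this. The hostnames, credentials, paths, and the exact fixup REST endpoint/level names are assumptions to verify against your version's docs; this is a sketch, not a hardened script:

```shell
#!/bin/sh
# Rough sketch of the per-indexer patch cycle. CM, hosts, and
# credentials are placeholders for this environment.
CM=https://cluster-master.example.com:8089

for host in idx01 idx02; do
    # Take the peer offline gracefully, then reboot it.
    ssh "$host" '/opt/splunk/bin/splunk offline && sudo reboot'

    # Poll the cluster master until no fixup tasks remain before
    # moving on to the next peer (endpoint/output may vary by version).
    while :; do
        pending=$(curl -sk -u admin:changeme \
            "$CM/services/cluster/master/fixup?level=replication_factor&output_mode=json" \
            | grep -c '"bucket_id"')
        [ "$pending" -eq 0 ] && break
        sleep 60
    done
done
```

Serializing on the fixup queue like this is exactly what makes the whole run take ~24 hours, which is why the replies below focus on why so many fixups are being generated in the first place.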

We've had issues using the rolling restart - it sometimes gets stuck in the middle, and after you bounce the cluster master it doesn't resume where it left off. By default it also defers saved searches, which effectively disables alerting in our environment for a few hours (we are enabling running saved searches during rolling restarts to address this). Does this just work out of the box for others, or are there secret "gold" settings that you've had to tweak?
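
For anyone retrying this route: 7.x supports a searchable rolling restart driven from the cluster master, which restarts one peer at a time after primaries have been reassigned. The commands below are the documented CLI forms as I remember them; verify the flags against the 7.3 docs before relying on them:

```shell
# On the cluster master: searchable rolling restart of all peers,
# waiting for primacy reassignment before each restart.
splunk rolling-restart cluster-peers -searchable true

# Monitor cluster and restart state while it runs:
splunk show cluster-status --verbose
```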

Some information about our environment:

  • 2 sites
  • 24 indexers per site
  • Splunk 7.3.3
  • buckets only replicate between sites; there is no intra-site replication. Wondering if this is contributing to our problems... we've been looking into increasing storage to account for this.
  • 5ms between sites, multi-10gbit links
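
On that replication point: with copies only across sites, every peer reboot leaves its buckets with no other valid copy at the origin site, which plausibly drives the fixup volume. Adding an intra-site copy would look something like the following in the cluster master's server.conf (values are illustrative, and the extra copy costs the storage mentioned above):

```ini
# server.conf on the cluster master - illustrative values only
[clustering]
mode = master
multisite = true
# Current-style policy: one copy at the origin site, two total
# site_replication_factor = origin:1, total:2
# Keeping a second copy at the origin site so a rebooting peer's
# buckets still have a local valid copy:
site_replication_factor = origin:2, total:3
site_search_factor = origin:1, total:2
```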

Splunk Employee

This should've been something we fixed/alleviated in 7.3.2. Most likely these were unclean bucket replications: a StreamingError happened on a hot bucket target, which made the indexers throw one copy away and triggered a fixup to re-replicate the missing bucket...

One thing to try, to pinpoint this, is to pick any bucket that needed fixups and trace what happened to that bucket throughout the process.
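
A minimal way to do that trace (hostnames, paths, and the bucket id below are placeholders) is to grep each peer's splunkd.log for the bucket in question and look for replication errors around the restart window:

```shell
# Trace one bucket's lifecycle across peers. The bucket id here is a
# made-up placeholder; substitute a real index~id~guid from the fixup list.
BUCKET='main~42~0F1E2D3C-4B5A-6978-8897-A6B5C4D3E2F1'

for host in idx01 idx02; do
    echo "== $host =="
    ssh "$host" "grep '$BUCKET' /opt/splunk/var/log/splunk/splunkd.log" \
        | grep -Ei 'StreamingError|fixup|replicat'
done
```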



We did a rolling restart last night to change some indexes, and Splunk aggressively rolled through and bounced all 48 indexers in about an hour, leaving about 50k fixup tasks in the queue.
So that doesn't seem to be our answer...
