We've been using standalone indexers for three years, and only in the last few weeks have we turned on multisite indexer clustering. Since 5.2.x, Splunk has done a great job of making many props and transforms changes take effect without a restart. By 6.x you could update several index-time settings, such as line breaking and timestamp adjustments, and the indexers would simply apply the changes live, with no restart. Since moving to clustering, we are finding that much of this ability to make such changes without restarting the indexers is gone. While a rolling restart doesn't take overly long, the bucket fixup process can take hours. This not only hurts search performance; summary index searches capture point-in-time events and can be locked out of the index they are searching for extended periods while buckets are being fixed (I just opened a ticket last night).
Is there a list of props/transforms changes that cause clustered indexers to restart?
Reload occurs when:
- You add a new sourcetype in props.conf.
- You add or update any TRANSFORMS- stanzas in props.conf.
- You add any new stanzas in transforms.conf.
- You make any of these changes in indexes.conf:
  - Adding new index stanzas
  - Enabling or disabling an index with no data
  - Changing any attributes not listed as requiring restart
Restart occurs when:
- The configuration bundle contains changes to any configuration files besides indexes.conf, props.conf, or transforms.conf.
- You make any changes to props.conf or transforms.conf, other than those specified in the reload list, above.
- You make any of these indexes.conf changes:
  - Adding or removing a volume
  - Enabling or disabling an index with data
  - Removing an index
  - Changing any of these attributes: homePath, coldPath, thawedPath, bloomHomePath, summaryHomePath, tstatsHomePath, repFactor, rawChunkSizeBytes, minRawFileSyncSecs, syncMeta, maxConcurrentOptimizes, coldToFrozenDir
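To make the two lists easier to check against a planned change, here is a minimal Python sketch that encodes them as a lookup. The change labels (`new_sourcetype`, `transforms_setting`, `new_stanza`, `add_index`, `toggle_empty_index`, `edit_attrs`, and so on), the `ConfChange` type, and the functions are all hypothetical names made up for illustration; Splunk does not expose anything like this. It also assumes that a single restart-class change makes the whole bundle restart, which is how I read "the configuration bundle contains changes to..." but is not stated outright.

```python
from dataclasses import dataclass

# indexes.conf attributes that the restart list says force a restart.
RESTART_INDEX_ATTRS = frozenset({
    "homePath", "coldPath", "thawedPath", "bloomHomePath", "summaryHomePath",
    "tstatsHomePath", "repFactor", "rawChunkSizeBytes", "minRawFileSyncSecs",
    "syncMeta", "maxConcurrentOptimizes", "coldToFrozenDir",
})

# Only these three files are eligible for a reload; anything else restarts.
RELOADABLE_CONFS = frozenset({"indexes.conf", "props.conf", "transforms.conf"})


@dataclass(frozen=True)
class ConfChange:
    """One change in a configuration bundle (hypothetical model)."""
    conf: str                       # file name, e.g. "props.conf"
    kind: str                       # hypothetical change label, see expected_action()
    attrs: frozenset = frozenset()  # indexes.conf attributes touched by "edit_attrs"


def expected_action(change: ConfChange) -> str:
    """Return "reload" or "restart" for one change, per the reload/restart lists."""
    if change.conf not in RELOADABLE_CONFS:
        return "restart"
    if change.conf == "props.conf":
        # New sourcetype stanza, or adding/updating a TRANSFORMS- setting.
        return "reload" if change.kind in {"new_sourcetype", "transforms_setting"} else "restart"
    if change.conf == "transforms.conf":
        return "reload" if change.kind == "new_stanza" else "restart"
    # Remaining case: indexes.conf
    if change.kind in {"add_index", "toggle_empty_index"}:
        return "reload"
    if change.kind in {"add_volume", "remove_volume", "toggle_index_with_data", "remove_index"}:
        return "restart"
    if change.kind == "edit_attrs":
        return "restart" if change.attrs & RESTART_INDEX_ATTRS else "reload"
    raise ValueError(f"unknown indexes.conf change kind: {change.kind!r}")


def bundle_action(changes) -> str:
    """Assumption: one restart-class change makes the whole bundle restart."""
    if any(expected_action(c) == "restart" for c in changes):
        return "restart"
    return "reload"


if __name__ == "__main__":
    # The change described in this thread: a new sourcetype added to props.conf.
    print(expected_action(ConfChange("props.conf", "new_sourcetype")))
```

For the case discussed in this thread, a brand-new props.conf sourcetype maps to "reload", which is why the restart that actually happened looks like a discrepancy rather than expected behavior.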
In our case, adding a new sourcetype to props.conf, including line-breaking settings, should trigger a reload rather than a restart, if I read the list correctly. What actually happened was a restart. In a standalone environment, adding a new props stanza DOES result in a reload.
Adding a new index never triggers a restart. As a rule of thumb, most settings you can configure in the UI do not trigger a restart, with a few exceptions.
The best way you can help us: when you notice that a change which normally reloads triggers a restart, OR that a change requires a restart in a cluster but not in a standalone instance, please capture diags and open a support case. We can look into it and address any issues.
Thanks for the comment, Mahamed; I've done just that =). I've also come up with the following query:
index=_internal component=loader OR component=cmslave OR component=bundlejob OR component=clusterslaveconfigreloader NOT "the following confs" reload OR restart OR "Splunkd starting" NOT "arguments are:"
What interests me is a chain of events like the following (newest first); note that the ClusterSlaveConfigReloader and BundleJob components disagree about whether a reload is sufficient.
09-14-2015 15:52:55.955 -0400 INFO loader - Splunkd starting (build 272645).
09-14-2015 15:50:59.667 -0400 INFO CMSlave - shutdown initiated restart=1
09-14-2015 15:50:54.319 -0400 INFO CMSlave - detected restart is required, will restart node
09-14-2015 15:50:54.259 -0400 INFO CMSlave - master has instructed peer to restart
09-14-2015 15:45:50.261 -0400 INFO BundleJob - Informed bundle reload status to the master for bundle id=90011B3B632530B6678D6DCD6DFF6FBB
09-14-2015 15:45:50.114 -0400 INFO BundleJob - Restart required to reload bundle id = 90011B3B632530B6678D6DCD6DFF6FBB.
09-14-2015 15:45:47.864 -0400 INFO ClusterSlaveConfigReloader - Restart not required for conf=props, even though some properties were changed in the downloaded conf.
09-14-2015 15:45:41.394 -0400 INFO CMSlave - Just queued the reload job for bundleid=90011B3B632530B6678D6DCD6DFF6FBB
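If you export these events as raw text, the disagreement can be flagged mechanically. Below is a minimal Python sketch, assuming the exported lines keep the `MM-DD-YYYY HH:MM:SS.mmm +/-ZZZZ LEVEL Component - message` layout shown above. `parse`, `find_disagreements`, and the pairing rule (a ClusterSlaveConfigReloader "Restart not required" followed, within the same queued reload job, by a BundleJob "Restart required") are hypothetical and reflect my reading of these log lines, not documented Splunk behavior.

```python
import re
from datetime import datetime

# Matches lines like:
# 09-14-2015 15:45:50.114 -0400 INFO BundleJob - Restart required to reload bundle id = ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>\w+) (?P<component>\S+) - (?P<msg>.*)$"
)
# Bundle ids appear as "bundleid=...", "bundle id=..." and "bundle id = ..." in the excerpt.
BUNDLE_RE = re.compile(r"bundle ?id ?= ?([0-9A-F]+)")


def parse(text):
    """Parse log lines into (timestamp, component, message) tuples, oldest first."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%m-%d-%Y %H:%M:%S.%f %z")
            events.append((ts, m["component"], m["msg"]))
    events.sort(key=lambda e: e[0])  # the excerpt is listed newest first
    return events


def find_disagreements(events):
    """Return (bundle_id, reloader_msg, bundlejob_msg) for each reload job where
    ClusterSlaveConfigReloader said no restart was needed but BundleJob required one."""
    found = []
    reloader_said_no_restart = None
    for _, component, msg in events:
        if component == "CMSlave" and msg.startswith("Just queued the reload job"):
            reloader_said_no_restart = None  # new reload job: forget earlier verdicts
        elif component == "ClusterSlaveConfigReloader" and msg.startswith("Restart not required"):
            reloader_said_no_restart = msg
        elif component == "BundleJob" and msg.startswith("Restart required") and reloader_said_no_restart:
            m = BUNDLE_RE.search(msg)
            found.append((m.group(1) if m else None, reloader_said_no_restart, msg))
            reloader_said_no_restart = None
    return found


SAMPLE_LOG = """\
09-14-2015 15:52:55.955 -0400 INFO loader - Splunkd starting (build 272645).
09-14-2015 15:50:59.667 -0400 INFO CMSlave - shutdown initiated restart=1
09-14-2015 15:50:54.319 -0400 INFO CMSlave - detected restart is required, will restart node
09-14-2015 15:50:54.259 -0400 INFO CMSlave - master has instructed peer to restart
09-14-2015 15:45:50.261 -0400 INFO BundleJob - Informed bundle reload status to the master for bundle id=90011B3B632530B6678D6DCD6DFF6FBB
09-14-2015 15:45:50.114 -0400 INFO BundleJob - Restart required to reload bundle id = 90011B3B632530B6678D6DCD6DFF6FBB.
09-14-2015 15:45:47.864 -0400 INFO ClusterSlaveConfigReloader - Restart not required for conf=props, even though some properties were changed in the downloaded conf.
09-14-2015 15:45:41.394 -0400 INFO CMSlave - Just queued the reload job for bundleid=90011B3B632530B6678D6DCD6DFF6FBB
"""

if __name__ == "__main__":
    for bundle_id, reloader_msg, bundlejob_msg in find_disagreements(parse(SAMPLE_LOG)):
        print(bundle_id)
        print("  reloader :", reloader_msg)
        print("  bundlejob:", bundlejob_msg)
```

Resetting the verdict on each "Just queued the reload job" line keeps a "Restart not required" from one bundle from being paired with a later, unrelated bundle's BundleJob message.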