As part of the destructive resync that I performed on the two members that were out of sync, I saw the messages below on the SHs after the process completed.
They downloaded a snapshot from the captain that is 5 days old.
Does this mean that the captain does not have a snapshot more recent than 5 days?
--- resync and results ---
$ splunk resync shcluster-replicated-config
Your session is invalid. Please login.
Splunk username: admin
Password:
Downloaded an old snapshot created 485324 seconds ago; Check for clock skew on this member or the captain; If no clock skew is found, check the captain for possible snapshot creation failures*
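To put the age in that message into more readable units, here is a minimal Python sketch that converts the reported seconds into days, hours, minutes and seconds. The variable name is only illustrative; the value is copied from the output above.

```python
from datetime import timedelta

# Age reported by the resync message, in seconds (copied from the output above).
snapshot_age_seconds = 485324

age = timedelta(seconds=snapshot_age_seconds)
# timedelta keeps whole days separately; split the remainder into h/m/s.
hours, remainder = divmod(age.seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{age.days}d {hours}h {minutes}m {seconds}s")  # prints "5d 14h 48m 44s"
```

So the snapshot was a little over five and a half days old when the member downloaded it.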
I found the error below repeating in splunkd.log, which suggests that snapshot creation has been failing for days.
09-18-2019 18:35:58.803 +0000 ERROR ConfReplication - Error creating snapshot: /opt/splunk/var/run/splunk/snapshot/15831677-5b6c4f95a711c6431341ba397e4c6b012a.bundle.f3effb6944a1e.tmp; Configurations changed while generating snapshot, original_latest_change=5b6c4f95a711c6431341ba397e4c6b012a, new_latest_change=2f2baeb33f5867261227d7636d5c7ed3b0d38749; consecutiveRejectionFromNewChanges=336;* Check conf.log to see if any app or client is making frequent configuration changes; Continuous snapshot creation failures can lead to configuration replication issues if this member becomes the captain*
As the message above suggests, conf.log shows a large number of "addCommit" changes from the ES import. Each of these updates local.meta, which interrupts snapshot creation.
== Use the searches below to identify the changes that interrupt the operation ==
index=_internal source=*/splunkd.log consecutiveRejectionFromNewChanges earliest=-1d latest=now
index=_internal source=*/conf.log* data.task=addCommit | timechart span=5m count by data.optype_desc
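If you only have a copy of splunkd.log and cannot run the searches, the same check can be roughly sketched in Python. This is a sketch under assumptions, not a Splunk tool: the function names are hypothetical, the regular expressions are based only on the single error line quoted above, and the sample string is an abbreviated copy of that line.

```python
import re

# Patterns derived from the one error line quoted in this post (an assumption
# about the format; adjust them if your splunkd.log lines look different).
TIMESTAMP_RE = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})")
REJECTION_RE = re.compile(r"consecutiveRejectionFromNewChanges=(\d+)")


def parse_snapshot_failure(line):
    """Return (timestamp, rejection_count) for a snapshot failure line, else None."""
    if "Error creating snapshot" not in line:
        return None
    stamp = TIMESTAMP_RE.match(line)
    count = REJECTION_RE.search(line)
    if stamp is None or count is None:
        return None
    return stamp.group(1), int(count.group(1))


def scan(lines):
    """Yield (timestamp, rejection_count) for every matching line in an iterable."""
    for line in lines:
        parsed = parse_snapshot_failure(line)
        if parsed is not None:
            yield parsed


# Abbreviated copy of the error line from this post.
sample = (
    "09-18-2019 18:35:58.803 +0000 ERROR ConfReplication - Error creating snapshot: "
    "Configurations changed while generating snapshot; "
    "consecutiveRejectionFromNewChanges=336;"
)
print(parse_snapshot_failure(sample))
```

To run it against a real file, pass an open file handle to `scan()`, for example `for stamp, count in scan(open(path)):` where `path` points at your copy of splunkd.log. A rejection count that keeps climbing across lines indicates that snapshots have not completed for some time.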
In this case the issue was caused by the ES import modular input, which updates the several hundred apps and add-ons installed on the SH. The import operation is only needed when new apps/add-ons are installed on the server; without it, ES will not recognize the data to be monitored.
The workaround was to increase the interval of the ES import modular input (to 2 hours, for example) in inputs.conf under /etc/apps/SplunkEnterpriseSecuritySuite. This import has been removed in the latest version, ESS 5.3.1.
The cause depends on the deployment environment: this time it was the ES import, but other apps or add-ons could also be updating the configs frequently.