We have a 3-node search head cluster (SHC), and it still frequently gets out of sync and keeps showing the following UI banner message: "Error pulling configurations from the search head cluster captain; consider performing a destructive configuration resync on this search head cluster member."
These are the recommended setting changes we have already implemented:
scheduling_heuristic = round_robin
captain_is_adhoc_searchhead = true
replication_factor = 1
We also see this warning in the logs:
12-14-2015 17:22:54.072 +0000 WARN ConfReplication - installed_snapshot="/ngs/app/splunkt/SHC/splunk/var/run/splunk/snapshot/1450111567-b0d62539eea238d3c00ccbe9f81601fd6675f5d9.bundle" has earlier timestamp than existing snapshot="/ngs/app/splunkt/SHC/splunk/var/run/splunk/snapshot/1450113494-f4754dfa40753dbce4014552b9f64dbc6c00844d.bundle"; check for clock skew
What does this error message mean? Could this be the cause of the issue?
Another good thing to check: are the clocks synced on all the members? If you have time drift, or the clocks differ between members, this is always going to happen. Make sure the time is synced across all SHC members.
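One rough way to check for drift is to sample the clock on every member at about the same moment and compare the readings. The sketch below only does the comparison; it assumes you have already collected epoch seconds per member yourself (for example by running `date +%s` on each host). The function name, host names, and the 5-second tolerance are illustrative assumptions, not Splunk settings.

```python
import statistics


def find_clock_skew(readings, tolerance_s=5):
    """Return {host: offset_seconds} for hosts whose clock is off.

    readings    -- dict of host name -> epoch seconds, sampled at (roughly)
                   the same moment on every member (hypothetical input).
    tolerance_s -- allowed offset from the median reading, in seconds
                   (an arbitrary threshold chosen for this sketch).
    """
    # Use the median as the reference so a single bad clock cannot
    # drag the reference value toward itself.
    reference = statistics.median(readings.values())
    return {
        host: ts - reference
        for host, ts in readings.items()
        if abs(ts - reference) > tolerance_s
    }


if __name__ == "__main__":
    # Hypothetical readings for a 3-member cluster.
    sample = {"sh1": 1450113494, "sh2": 1450113495, "sh3": 1450113524}
    for host, offset in find_clock_skew(sample).items():
        print(f"{host} is off by {offset} seconds")
```

The readings are never taken at exactly the same instant, so keep the tolerance larger than the time it takes to query all members.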
This WARN appears to be logged when a user performs a destructive resync.
Most likely the timestamp for the installed_snapshot comes from the captain's latest tarball. The timestamp for the "existing snapshot" comes from the local member.
The message basically means that the latest snapshot from the captain has an earlier timestamp than the latest snapshot on the member, and hence the destructive resync is "going back in time".
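To see how far "back in time" the resync goes, you can pull the two timestamps out of the WARN line and compare them. This is a minimal sketch that assumes the leading number in each bundle file name is the snapshot time in epoch seconds, which is consistent with the log entry above; the helper names are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Captures the key before '=' and the leading digits of the bundle file name,
# e.g. installed_snapshot=".../snapshot/1450111567-<hash>.bundle".
# Assumption: those digits are epoch seconds.
SNAPSHOT_RE = re.compile(r'(\w+)="[^"]*/(\d+)-[0-9a-f]+\.bundle"')

WARN_LINE = (
    '12-14-2015 17:22:54.072 +0000 WARN ConfReplication - '
    'installed_snapshot="/ngs/app/splunkt/SHC/splunk/var/run/splunk/snapshot/'
    '1450111567-b0d62539eea238d3c00ccbe9f81601fd6675f5d9.bundle" '
    'has earlier timestamp than existing '
    'snapshot="/ngs/app/splunkt/SHC/splunk/var/run/splunk/snapshot/'
    '1450113494-f4754dfa40753dbce4014552b9f64dbc6c00844d.bundle"; '
    'check for clock skew'
)


def snapshot_times(warn_line):
    """Return (installed, existing) snapshot times as epoch seconds."""
    found = dict(SNAPSHOT_RE.findall(warn_line))
    return int(found["installed_snapshot"]), int(found["snapshot"])


def to_utc(epoch_s):
    """Format epoch seconds as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(epoch_s, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    installed, existing = snapshot_times(WARN_LINE)
    print("snapshot from captain:", to_utc(installed))
    print("snapshot on member:   ", to_utc(existing))
    print("member is ahead by", existing - installed, "seconds")
```

For the line quoted in the question, the member's snapshot is 1927 seconds (about 32 minutes) newer than the one installed from the captain.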