What issues are you facing? Have you checked the _internal index for SHC deployer pushing errors first? Deleting the search artifacts residing in /var/run/ won't necessarily help.
For two days I have been having search head clustering issues: "Search head cluster member (https://hesplsrhc003:8089) is having problems pushing configurations to the search head cluster captain. Changes on this member are not replicating to other members."
I tried to change the captain, did a rolling restart, and ran the resync command, but I still have issues.
You can enable more aggressive logging for the SHC components and see what it says. Is the network connection between the members OK? Can you check it?
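If it helps, one quick way to check connectivity is to probe each member's management port (8089) from every other member. The hostnames below are the ones from this thread; adjust for your environment (this is a sketch, not a definitive check):

```shell
# From each SHC member, verify the management port of the other members
# is reachable. -k skips certificate validation (Splunk ships self-signed
# certs by default); -s silences progress output; -w prints the HTTP code.
for peer in hesplsrhc002 hesplsrhc003 hesplsrhc004; do
  curl -sk -o /dev/null -w "%{http_code} $peer\n" \
    --max-time 5 "https://$peer:8089"
done
```

Any HTTP response at all means the port is reachable; a timeout points at a network or firewall problem rather than a configuration one.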
Yes, I checked the logging. I see the error below, and the connection between the members is fine:
07-12-2019 14:50:57.458 -0400 ERROR ConfReplicationThread - Error pushing configurations to captain=https://hesplsrhc004:8089, consecutiveErrors=2333 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated baseline_op_id=3dfc93bbf15bcbb2d0c2c8b69d542d7d05181bb2; current baseline_op_id=5d0509452c20f0c738813010a053ae57e4aefb64": Search head clustering: Search head cluster member (https://hesplsrhc002:8089) is having problems pushing configurations to the search head cluster captain (https://hesplsrhc004:8089). Changes on this member are not replicating to other members.
Got it. That message is much more informative. Your SHC members need to inform the captain of the changes they make, so it can replicate them to the remaining members. The problem is that the baseline your member is pushing against is too far behind what the captain has. So you need to ensure there is a common baseline across all of them, meaning you need to resync them.
I'd start with "splunk show shcluster-status" and check the last_conf_replication timestamp on each non-captain member against the captain's. A manual resync of the out-of-date members should then be done so they share a common commit.
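As a sketch, the sequence above looks like this (the /opt/splunk install path is an assumption; substitute your own SPLUNK_HOME):

```shell
# 1. On each member, compare replication state: note which node is
#    captain and each member's last_conf_replication timestamp.
/opt/splunk/bin/splunk show shcluster-status

# 2. On the out-of-sync member only: discard its local replicated
#    configuration and pull a fresh baseline from the captain.
/opt/splunk/bin/splunk resync shcluster-replicated-config
```

Be aware the resync is destructive on the member it runs on: unpushed local replicated changes there are discarded in favor of the captain's baseline.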
Thank you for your reply. I have done the manual resync by running "splunk resync shcluster-replicated-config", but nothing has changed. I have been running this command for two days, with no luck.
The last replication for all the members is last_conf_replication: Fri Jul 12 17:11:48 2019, which I don't think is an issue.
I got the following error for one of the members:
Downloaded an old snapshot created 91696 seconds ago; Check for clock skew on this member or the captain; If no clock skew is found, check the captain for possible snapshot creation failures
There's a parameter in server.conf controlling when changes are purged: conf_replication_purge.eligibile_age (that spelling of "eligibile" is Splunk's own). Its default is one day (86400 seconds).
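That default is exactly why the snapshot error above matters: the snapshot age reported in the log (91696 seconds) already exceeds the default purge eligibility age (86400 seconds). A quick check of the arithmetic:

```shell
# Snapshot age reported in the error vs. the default purge age.
snapshot_age=91696
purge_age=86400   # default conf_replication_purge.eligibile_age (1 day)

echo $(( snapshot_age - purge_age ))   # seconds past the purge threshold
echo $(( snapshot_age / 3600 ))        # snapshot age in whole hours
```

The snapshot is 5296 seconds (roughly an hour and a half) older than the default purge age, so by the time the member downloaded it, the captain may already have purged the intermediate changes needed to roll forward from it.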
What do you mean by "I have ran this command from two days"?
I mean I ran the manual resync command two days ago and again yesterday, but I still see the error.
Coming to the old snapshot issue: what changes can I make so that the resync uses the latest snapshot?
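One option worth trying (a hedged suggestion based on the purge setting above, not a confirmed fix for your cluster): temporarily raise the purge eligibility age so snapshots older than a day remain usable while the members resync, then revert it afterwards. In server.conf:

```ini
# server.conf -- temporary, while resyncing; revert to the default (1d) after.
[shclustering]
conf_replication_purge.eligibile_age = 2d
```

Before that, though, rule out the cause the error message itself names: run "date -u" on each member and on the captain and confirm the clocks agree (and that NTP is healthy), since clock skew alone can make a fresh snapshot look 91696 seconds old.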