Hi @lmvmandadi ,
The answer to your question of whether it's safe to delete is, "it depends". In an SHC environment you might be able to remove items from that folder without any major impact. However, given the questions and the responses in the thread you've posted, I would probably look at rebuilding that particular search head:
1. Stop the problem search head
2. Back up the /opt/splunk folder (or wherever $SPLUNK_HOME points)
3. Move or delete the /opt/splunk folder
4. Install a clean copy of Splunk, the same version as your other members
5. Add the clean install member to the SHC and let the SHC captain re-sync all the files
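The steps above can be sketched as shell commands. Everything in angle brackets is a placeholder for illustration (version, ports, secrets, hostnames are not taken from this thread), so adjust for your environment:

```shell
# 1. Stop the problem search head
/opt/splunk/bin/splunk stop

# 2. Back up the existing installation
tar -czf /tmp/splunk_shc_backup.tar.gz /opt/splunk

# 3. Move the old installation aside (safer than deleting outright)
mv /opt/splunk /opt/splunk.old

# 4. Install a clean copy of the same Splunk version as the other members
tar -xzf splunk-<version>-Linux-x86_64.tgz -C /opt

# 5. Re-initialize the SHC configuration and let the captain re-sync
/opt/splunk/bin/splunk start --accept-license
/opt/splunk/bin/splunk init shcluster-config -auth admin:<password> \
  -mgmt_uri https://<this-member>:8089 -replication_port <port> \
  -secret <shcluster-secret> -shcluster_label <label>
/opt/splunk/bin/splunk restart
# Then, from any existing member, re-add it to the cluster:
#   splunk add shcluster-member -new_member_uri https://<this-member>:8089
```

Once the member rejoins, the captain pushes the replicated configuration baseline down to it.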
The downside to the above is that you will lose any changes that you made on that particular server (not permanently, since we backed up all the config files, but the changes will need to be re-applied on the clean copy).
I think this is likely the better solution, given that you'd probably spend more time troubleshooting the SHC issues, trying to resolve them, and potentially introducing other problems into the SHC.
Hope this helps.
Most of the data should be under
-- In the dispatch directory, a search-specific directory is created for each search or alert. Each search-specific directory contains several files including a CSV file of the search results, a search.log file with details about the search execution, and more. These are 0-byte files.
You can read about it at Dispatch directory and search artifacts.
The bottom of the page speaks about -
-- Clean up the dispatch directory based on the age of directories
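As a sketch of that cleanup (the 7-day cutoff and the destination path here are arbitrary examples, not recommendations):

```shell
# Read-only check: list dispatch directories older than 7 days
find "$SPLUNK_HOME/var/run/splunk/dispatch" -maxdepth 1 -mindepth 1 \
  -type d -mtime +7

# Splunk's documented cleanup tool moves (rather than deletes) search
# artifacts last modified before the cutoff into a destination directory:
splunk cmd splunkd clean-dispatch /tmp/old-dispatch-jobs -7d@d
```

Because clean-dispatch moves artifacts instead of deleting them, you can inspect the destination directory before removing anything for good.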
For two days I have been having a search head clustering issue: "Search head cluster member (https://hesplsrhc003:8089) is having problems pushing configurations to the search head cluster captain. Changes on this member are not replicating to other members."
I tried changing the captain, did a rolling restart, and ran the resync command, but I still have the issue.
Yes, I checked the logging. I see the error below, and the connection between the members is fine:
07-12-2019 14:50:57.458 -0400 ERROR ConfReplicationThread - Error pushing configurations to captain=https://hesplsrhc004:8089, consecutiveErrors=2333 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=3dfc93bbf15bcbb2d0c2c8b69d542d7d05181bb2; current_baseline_op_id=5d0509452c20f0c738813010a053ae57e4aefb64": Search head clustering: Search head cluster member (https://hesplsrhc002:8089) is having problems pushing configurations to the search head cluster captain (https://hesplsrhc004:8089). Changes on this member are not replicating to other members.
Got it. That message is much more informative. Your SHC members need to inform the captain of the changes they make, so that it can replicate them to the remaining members. The problem is that what your member is pushing is based on a baseline that is too far behind what the captain has. So you need to ensure there is a common baseline across all of them, meaning you need to resync them.
I'd start by running splunk show shcluster-status and checking last_conf_replication on the member that is not the captain, comparing it to the captain's. A manual resync of the member should then be done so they share a common commit.
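For example, on the member that is not replicating (hostnames as in this thread; the password is a placeholder):

```shell
# Inspect SHC state; compare last_conf_replication across members
splunk show shcluster-status -auth admin:<password>

# If the member is behind, resync it to the captain's replicated config
# (run this on the lagging member, not on the captain):
splunk resync shcluster-replicated-config
```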
Thank you for your reply. I did the manual resync by running "splunk resync shcluster-replicated-config", but nothing has changed. I have run this command repeatedly over the last two days, to no avail.
The last replication for all the members is last_conf_replication : Fri Jul 12 17:11:48 2019, which I think is not an issue.
I got the following error for one of the members:
Downloaded an old snapshot created 91696 seconds ago; Check for clock skew on this member or the captain; If no clock skew is found, check the captain for possible snapshot creation failures
There's a parameter in server.conf controlling when old replication changes are purged: conf_replication_purge.eligibile_age. Its default is one day (86400 seconds).
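To make the error concrete: the snapshot in that message is 91696 seconds old, which is past the default one-day purge window, so the member can no longer reconcile against it. A quick sanity check in plain shell arithmetic:

```shell
snapshot_age=91696   # from the error message
purge_age=86400      # conf_replication_purge.eligibile_age default (1 day)

echo $(( snapshot_age - purge_age ))   # prints 5296: about 1.5 hours past the window

# Hypothetical clock-skew check: compare epoch seconds across members
# (assumes SSH access; hostnames are the ones from this thread)
# for h in hesplsrhc002 hesplsrhc003 hesplsrhc004; do
#     echo -n "$h: "; ssh "$h" date +%s
# done
```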
What do you mean by "I have ran this command from two days"?
I mean that I ran the manual resync command two days ago and again yesterday, but I still see the error.
As for the old snapshot issue: what changes can I make so that the resync uses the latest snapshot?
You need to run that command on the member that has the issue, and you should see "The member has been synced to the latest replicated configurations on the captain."
Is that what you've done? Did you run the resync command on the member with the issue?
Yes, I ran it on the search head that had the issue. Once it showed that it had synced with the latest replication, but sometimes it shows the clock-skew error. I raised a ticket with Splunk Support.