When the | delete command is issued in a search, the data isn't actually removed from disk; instead, Splunk creates a "deletes" directory inside the bucket and stops returning those events in search results.
First, the primary copy of the bucket will contain the "deletes" directory; you can inspect its contents with zcat:
[root@indexer01 deletes]# zcat 38602ccf63e998fa1823f9f664055448.csv.gz
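As a rough sketch of the on-disk layout (the paths, bucket name, and CSV contents below are placeholders made up for illustration; real buckets live under $SPLUNK_DB/&lt;index&gt;/db/):

```shell
# Simulated bucket layout after a "| delete" (illustrative only --
# the bucket directory name and CSV contents here are placeholders).
bucket=/tmp/demo_index/db/db_1546992000_1546905600_42
mkdir -p "$bucket/deletes"

# Splunk writes a gzipped CSV per delete operation into deletes/
printf 'placeholder,delete,records\n' | gzip \
    > "$bucket/deletes/38602ccf63e998fa1823f9f664055448.csv.gz"

ls "$bucket"                       # -> deletes
zcat "$bucket/deletes/"*.csv.gz    # -> placeholder,delete,records
```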
All peers that hold a copy of this bucket need to have the "deletes" directory in sync.
The peer holding the primary copy updates its checksum and notifies the cluster master.
Subsequently, that peer initiates a peer-to-peer sync request to update the other peers holding the bucket; this sync happens over port 8089 between peers.
If port 8089 is not open between indexers, the sync request fails and the affected buckets get stuck in a fixup loop that never completes.
You can see this on the cluster master's fixup page, in the Generation tab, which shows "cannot fix up search factor as bucket is not serviceable".
If you see a log message like the one below in splunkd.log on an indexer, most likely port 8089 (Splunk's default management port) is not open between the indexers, and it needs to be:
01-08-2019 16:15:57.292 -0800 ERROR CMRepJob - job=CMSyncP2PJob bid= myguid= myrawport=9887 myusessl=0 otguid= othp=10.10.10.1:8089 otrawport=9887 otusessl=0 relativepath= custact=p2p_syncup getHttpReply failed; err: Connect Timeout
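A quick way to check reachability of the management port from each indexer is a plain TCP connect test. This is a hypothetical helper (the peer IPs are examples from the log above; substitute your own), using bash's /dev/tcp so no extra tools are needed:

```shell
#!/bin/bash
# Check whether a peer's Splunk management port (default 8089) is
# reachable from this host. Peer addresses below are examples only.
check_peer() {
    host=$1; port=${2:-8089}
    # bash's /dev/tcp attempts a TCP connect; timeout caps it at 3 seconds
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port open"
    else
        echo "$host:$port blocked/filtered -- p2p_syncup will fail"
    fi
}

check_peer 10.10.10.1
check_peer 10.10.10.2
```

Once the firewall allows 8089 between all peers, the CMSyncP2PJob sync requests should start succeeding.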
Once that port is opened, the fixup tasks should complete and be removed from the CM's fixup activities.
Our docs explain that the management port (default 8089) is required to be open between cluster peers; this has always been the case.
But who reads the docs all the time? I wish Splunk checked connectivity of the required ports and showed a warning message on the Indexer Clustering page.
@Masa enhancement SPL-164805 has been filed 🙂
you're awesome, @rphillips_splunk
I've seen this before when frozen buckets were restored to only one of the two indexers in a cluster.
Buckets in the thaweddb path are "not serviceable" because by placing them in thawed you're telling Splunk you don't want them to be aged out or deleted. Splunk also won't replicate thawed buckets (that would be a mess), so thawed buckets will likewise show as not serviceable.
I mention this because the solution for not-serviceable thawed buckets is different from the one that worked above, in case someone arrives with a very similar symptom but a different cause.
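For the thawed case, the fix is to restore the frozen bucket copy into thaweddb on each peer that should hold it and rebuild it there, since the cluster won't replicate it for you. A dry-run sketch (the host names idx01/idx02, the index name myindex, the paths, and the bucket ID are all hypothetical; the script only prints the commands rather than running them):

```shell
#!/bin/sh
# Print (not execute) the per-peer restore steps for a frozen bucket.
# idx01/idx02, myindex, /opt/splunk, and the bucket ID are placeholders.
print_restore_steps() {
    bucket=$1
    for peer in idx01 idx02; do
        echo "scp -r frozen/$bucket $peer:/opt/splunk/var/lib/splunk/myindex/thaweddb/"
        echo "ssh $peer /opt/splunk/bin/splunk rebuild /opt/splunk/var/lib/splunk/myindex/thaweddb/$bucket"
    done
}

print_restore_steps db_1546992000_1546905600_42
```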