Deployment Architecture

Why is the cluster master unable to fix up buckets (Generation tab): "cannot fix up search factor as bucket is not serviceable"?

rphillips_splk
Splunk Employee

Problem:
My cluster master is reporting fixup tasks under Bucket Status > Generation tab with the status "cannot fix up search factor as bucket is not serviceable"; however, these buckets never get fixed.

1 Solution

rphillips_splk
Splunk Employee

When the | delete command is issued in a search, the data isn't actually deleted from disk. Instead, Splunk creates a "deletes" directory inside the bucket and stops returning those events in search results.
For example, on an indexer:
$SPLUNK_HOME/var/lib/splunk/defaultdb/db/db_1546932848_1546891149_15_4720CDA9-5F9B-4CE1-BB0D-10A6F555A1E4/rawdata/deletes
[root@indexer01 deletes]# zcat 38602ccf63e998fa1823f9f664055448.csv.gz
timestamp,event_address,type_id,host_id,source_id,sourcetype_id
1546932848,1846,0,2,1,1
1546932848,1844,0,2,1,1
1546932848,1842,0,2,1,1
1546932848,1840,0,2,1,1
1546932848,1838,0,2,1,1
1546932848,1836,0,2,1,1
1546932848,1834,0,2,1,1
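
To see which peers have applied a delete, you can look for these "deletes" directories on each indexer. A minimal sketch, assuming the default index location (indexes.conf may relocate it):

```shell
# Sketch: locate "deletes" directories under an index's db path on an indexer.
# The default path below is an assumption; adjust for your deployment.
find_deletes_dirs() {
    # $1 = an index's db path, e.g. $SPLUNK_HOME/var/lib/splunk/defaultdb/db
    find "$1" -maxdepth 3 -type d -name deletes 2>/dev/null
}

# Example usage (path from the post above):
# find_deletes_dirs "$SPLUNK_HOME/var/lib/splunk/defaultdb/db"
```

Running this on every peer that holds the bucket shows whether the "deletes" directories are in sync across copies.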

First, the primary copy of the bucket gets the "deletes" directory.

All peers that hold a copy of this bucket need to have their "deletes" directories in sync.

The peer holding the primary copy updates its checksum and reports it to the cluster master.

Subsequently, that peer initiates a peer-to-peer sync request to update the other peers holding the bucket; this sync happens over the management port (8089) between peers.

If port 8089 is not open between the indexers, the sync request fails, and the affected buckets get stuck in a fixup loop that never completes.

The cluster master surfaces this in the Generation tab of its fixup view as "cannot fix up search factor as bucket is not serviceable".

If you see a log message in splunkd.log on an indexer like the one below, port 8089 (Splunk's default management port) is most likely not open between the indexers, and it needs to be:

01-08-2019 16:15:57.292 -0800 ERROR CMRepJob - job=CMSyncP2PJob bid= my_guid= my_rawport=9887 my_usessl=0 ot_guid= ot_hp=10.10.10.1:8089 ot_rawport=9887 ot_usessl=0 relative_path= custact=p2p_syncup getHttpReply failed; err: Connect Timeout

Once that port is opened, the fixup tasks should complete and be removed from the cluster master's fixup activities.
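
A quick way to confirm the symptom is to probe the management port from each indexer to every other indexer. A minimal sketch (the peer addresses are placeholders; substitute your own):

```shell
# Sketch: TCP reachability probe for the management port (default 8089).
# Uses bash's /dev/tcp; returns 0 if a connect succeeds within 3 seconds.
check_port() {
    # $1 = host, $2 = port
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example usage: run from each indexer against every other indexer.
# for peer in 10.10.10.1 10.10.10.2; do
#     check_port "$peer" 8089 && echo "OK $peer:8089" || echo "BLOCKED $peer:8089"
# done
```

A "Connect Timeout" in the CMRepJob error above corresponds to this probe failing.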

rphillips_splk
Splunk Employee

splunkd.log shows: ERROR CMRepJob - job=CMSyncP2PJob

jkat54
SplunkTrust

I've seen this before when frozen buckets were restored to just one of the two indexers in a cluster.

Buckets in the thaweddb path are "not serviceable" because, by placing them in thawed, you're telling Splunk you don't want them to be deleted. Splunk is also not going to replicate thawed buckets, because that would be a mess. So thawed buckets will likewise show as not serviceable.

I mention this because the solution for not-serviceable thawed buckets would be different from the one that worked above, in case someone arrives here with a very similar symptom but a different situation.
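
If you suspect this variant, checking thaweddb on each indexer is quick. A minimal sketch, assuming the default thawedPath (indexes.conf can relocate it):

```shell
# Sketch: list bucket directories sitting in an index's thaweddb on an indexer.
# These copies are intentionally not replicated by the cluster, so they can
# show up as not serviceable. The default path is an assumption.
list_thawed_buckets() {
    # $1 = an index's thaweddb path
    ls -d "$1"/db_* 2>/dev/null
}

# Example usage:
# list_thawed_buckets "$SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb"
```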

Masa
Splunk Employee

Our docs explain that the management port (default 8089) is required to be open between cluster peers. We have always needed this port open.
https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Systemrequirements#Ports_that_the_cluste...

But who reads the docs all the time? I wish Splunk would check connectivity of the required ports and show a warning message on the Indexer Clustering page.

rphillips_splk
Splunk Employee

Wish Splunk checks connectivity of the required ports, and show warning message in Indexer Clustering page.

@Masa, enhancement SPL-164805 has been filed 🙂

Masa
Splunk Employee

You're awesome, @rphillips_splk
