Getting Data In

Indexer Cluster Fixup Tasks Stuck (fsck failed: exitCode=24 / bucket is already registered)

azer271
Path Finder

After the Splunk cluster master entered maintenance mode, one of the indexers went offline and then came back online, and maintenance mode was disabled. The fixup tasks have now been stuck for about a week. The number of pending fixup tasks dropped from around 5xx to 102 after I deleted the rb buckets. I assume this is a bucket-syncing issue in the indexer cluster, since the client's servers are somewhat laggy (network delay, low CPU).
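
For context, maintenance mode was toggled on the cluster master with the standard CLI, roughly in this order (shown for illustration):

# On the cluster master, before taking the peer down:
./splunk enable maintenance-mode

# ... the indexer went offline and came back online ...

# Afterwards, maintenance mode was disabled:
./splunk disable maintenance-mode

# The current state can be checked with:
./splunk show maintenance-mode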

There are 40 fixup tasks in progress and 102 fixup tasks pending on the indexer cluster master.

The internal log shows that all 40 in-progress tasks report the following errors:

Getting size on disk: Unable to get size on disk for bucket id=xxxxxxxxxxxxx path="/splunkdata/windows/db/rb_xxxxxx" (This is usually harmless as we may be racing with a rename in BucketMover or the S2SFileReceiver thread, or merge-buckets command which should be obvious in log file; the previous WARN message about this path can safely be ignored.) caller=serialize_SizeOnDisk

Delete dir exists, or failed to sync search files for bid=xxxxxxxxxxxxxxxxxxx; will build bucket locally. err= Failed to sync search files for bid=xxxxxxxxxxxxxxxxxxx from srcs=xxxxxxxxxxxxxxxxxxxxxxx

CMSlave [6205 CallbackRunnerThread] - searchState transition bid=xxxxxxxxxxxxxxxxxxxxx from=PendingSearchable to=Unsearchable reason='fsck failed: exitCode=24 (procId=1717942)'

Getting size on disk: Unable to get size on disk for bucket id=xxxxxxxxxxxxx path="/splunkdata/windows/db/rb_xxxxxx" (This is usually harmless as we may be racing with a rename in BucketMover or the S2SFileReceiver thread, or merge-buckets command which should be obvious in log file; the previous WARN message about this path can safely be ignored.) caller=serialize_SizeOnDisk

The internal log shows that all 102 pending tasks report the following error:

ERROR TcpInputProc [6291 ReplicationDataReceiverThread] - event=replicationData status=failed err="Could not open file for bid=windows~xxxxxx err="bucket is already registered with this peer" (Success)" 

Does anyone know what "fsck failed: exitCode=24" and "bucket is already registered with this peer" mean, and how these issues can be resolved to reduce the number of fixup tasks? Thanks.

 

1 Solution

azer271
Path Finder

An update to this old topic, since it took time to get approval to perform a restart for my client. I fixed the issue by performing a bundle restart from the Splunk cluster master. I also increased the max_peer_build_load and max_peer_rep_load values in server.conf so that the existing bucket fixup tasks would clear more quickly. I'm still not sure what "fsck failed: exitCode=24" means, though; it was probably just network delay or low CPU.
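
For anyone hitting the same thing, the server.conf change was along these lines on the cluster master (the values below are illustrative, not a recommendation; if I remember correctly the defaults are max_peer_build_load = 2 and max_peer_rep_load = 5, so verify against your version's docs):

# server.conf on the cluster master
[clustering]
mode = master
# Allow more concurrent bucket-build and replication fixup jobs per peer.
# Higher values clear the fixup queue faster but add load to busy peers.
max_peer_build_load = 5
max_peer_rep_load = 10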

Reference: https://splunk.my.site.com/customer/s/article/Bucket-fixup-tasks-status-Missing-enough-suitable-cand...

 
 

 

 



thahir
Communicator

@azer271 

"Bucket is already registered with the peer" means during bucket replication, that indexer peer attempted to replicate a bucket to another peer, but the target peer already has that bucket registered possibly as a primary or searchable copy. Therefore, it refuses to overwrite or duplicate it.

Run the REST search below and check the health of the cluster:

| rest /services/cluster/master/buckets | table title, bucket_flags, replication_count, search_count, status

Also check for any standalone bucket issues; that could be another cause.
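
To look at standalone buckets specifically, something like the below should work, assuming your version exposes the standalone field on that endpoint (verify before relying on it):

| rest /services/cluster/master/buckets
| search standalone=1
| table title, bucket_flags, replication_count, search_count, status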
