
Indexer Cluster Fixup Tasks Stuck (fsck failed: exitCode=24) (bucket is already registered)

azer271
Path Finder

After the Splunk cluster master was put into maintenance mode, one of the indexers went offline and then came back online, and maintenance mode was then disabled. Since then, the fixup tasks have been stuck for about a week. The number of pending fixup tasks dropped from around 5xx to 102 after I deleted the rb bucket. I assume it's a bucket-syncing issue in the indexer cluster, because the client's servers are somewhat laggy (network delay, low CPU).

There are 40 fixup tasks in progress and 102 fixup tasks pending in the indexer cluster master.

The internal log shows that all those 40 tasks are displaying the following error:

Getting size on disk: Unable to get size on disk for bucket id=xxxxxxxxxxxxx path="/splunkdata/windows/db/rb_xxxxxx" (This is usually harmless as we may be racing with a rename in BucketMover or the S2SFileReceiver thread, or merge-buckets command which should be obvious in log file; the previous WARN message about this path can safely be ignored.) caller=serialize_SizeOnDisk

Delete dir exists, or failed to sync search files for bid=xxxxxxxxxxxxxxxxxxx; will build bucket locally. err= Failed to sync search files for bid=xxxxxxxxxxxxxxxxxxx from srcs=xxxxxxxxxxxxxxxxxxxxxxx

CMSlave [6205 CallbackRunnerThread] - searchState transition bid=xxxxxxxxxxxxxxxxxxxxx from=PendingSearchable to=Unsearchable reason='fsck failed: exitCode=24 (procId=1717942)'

Getting size on disk: Unable to get size on disk for bucket id=xxxxxxxxxxxxx path="/splunkdata/windows/db/rb_xxxxxx" (This is usually harmless as we may be racing with a rename in BucketMover or the S2SFileReceiver thread, or merge-buckets command which should be obvious in log file; the previous WARN message about this path can safely be ignored.) caller=serialize_SizeOnDisk

The internal log shows that all those 102 tasks are displaying the following error:

ERROR TcpInputProc [6291 ReplicationDataReceiverThread] - event=replicationData status=failed err="Could not open file for bid=windows~xxxxxx err="bucket is already registered with this peer" (Success)" 

Does anyone know what "fsck failed exit code 24" and "bucket is already registered with this peer" mean? How can these issues be resolved to reduce the number of fixup tasks? Thanks.

 


azer271
Path Finder

An update to this old topic, since it took a while to get approval to perform a restart for my client: I fixed the issue by performing a bundle restart on the Splunk cluster master. I also increased the "max_peer_build_load" and "max_peer_rep_load" values in server.conf to clear the existing bucket fixup tasks more quickly. I'm still not sure what "fsck failed: exitCode=24" means, though. It is probably just a symptom of network delay or low CPU.
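For reference, here is a minimal sketch of where these two settings go. It assumes they are set in the existing [clustering] stanza of server.conf on the cluster master (manager) node. The numbers are illustrative only, not tuned recommendations. Higher values let more fixup work run concurrently, but they also add load to each peer, which matters on hardware as constrained as the one described in the question.

```ini
# $SPLUNK_HOME/etc/system/local/server.conf on the cluster master (manager) node
# Add these to the existing [clustering] stanza; values are illustrative only
[clustering]
# Max concurrent "make bucket searchable" (build) tasks assigned to a peer (shipped default: 2)
max_peer_build_load = 4
# Max concurrent non-streaming replications a peer takes part in as a target (shipped default: 5)
max_peer_rep_load = 10
```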

Reference: https://splunk.my.site.com/customer/s/article/Bucket-fixup-tasks-status-Missing-enough-suitable-cand...

 
 

 

 


thahir
Communicator

@azer271 

"bucket is already registered with this peer" means that, during bucket replication, a source peer attempted to replicate a bucket to another peer, but the target peer already has that bucket registered, possibly as a primary or searchable copy. The target therefore refuses to overwrite or duplicate it.

Run the REST search below to check the health of the cluster's buckets:

| rest /services/cluster/master/buckets | table title, bucket_flags, replication_count, search_count, status

Also check for any standalone bucket issues; they may also be the cause.
