Splunk Enterprise

High number of Indexer cluster fixup tasks

critchos
Loves-to-Learn Lots

Hi.

We have an indexer cluster of 4 nodes with a little over 100 indexes. We recently took a look at the cluster manager's fixup tasks and noticed a large number of them (around 24,000) that have been pending for over 100 days, all for a select few of the indexes.

The majority of these tasks are for the following reasons: "Received shutdown notification from peer" and "Cannot replicate as bucket hasn't rolled yet".

For some reason these few indexes are quite low volume but have a large number of buckets. Ideally I would like to clear these tasks.

If we aren't precious about the data, would a suitable solution be to remove the indexes from the cluster configuration, manually delete the data folders for those indexes, and then re-enable them?

Or could we reduce the data size and number of buckets on the index to clear out these tasks?

Example of one of the index configurations:

# staging: 0.01 GB/day, 91 days hot, 304 days cold
[staging]
homePath = /splunkhot/staging/db
coldPath = /splunkcold/staging/colddb
thawedPath = /splunkcold/staging/thaweddb
maxDataSize = 200
frozenTimePeriodInSecs = 34128000
maxHotBuckets = 1
maxWarmDBCount = 300
homePath.maxDataSizeMB = 400
coldPath.maxDataSizeMB = 1000
maxTotalDataSizeMB = 1400
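
To illustrate the second option, this is roughly the kind of change I have in mind (values are only illustrative; maxDataSize = auto and the shorter retention are assumptions on my part, not something we have tested):

[staging]
maxDataSize = auto                 # let Splunk auto-size buckets (~750MB) instead of capping them at 200MB
maxWarmDBCount = 50                # keep fewer warm buckets in homePath before they roll to cold
frozenTimePeriodInSecs = 7776000   # ~90 days retention, so older buckets freeze and are deleted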


Thanks for any advice.


Alex_LC
Explorer

Hi,

I was looking for an answer to the same problem, and I came across this older post which kind of confirmed my understanding of the issue and the available solutions:

https://community.splunk.com/t5/Deployment-Architecture/Why-is-cluster-master-reporting-quot-Cannot-...

Short summary: hot buckets are streamed from the originating indexer to the other indexers in the cluster, but sometimes they get out of sync for various reasons and the CM starts displaying this type of error. There are two ways to fix it: either roll the buckets (via the GUI on the CM, the API endpoint, or by performing a rolling restart of the peers) or wait for them to roll naturally.
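
If it helps, something along these lines should do it via the API endpoint or a rolling restart (a sketch only: the hostname, credentials and index name are placeholders for your environment, and I'd double-check the endpoint path against the REST API reference):

# roll the hot buckets of one index on one peer (management port 8089)
curl -k -u admin:changeme -X POST https://indexer01.example.com:8089/services/data/indexes/staging/roll-hot-buckets

# or, from the cluster manager, a rolling restart of the peers will also roll the hot buckets
splunk rolling-restart cluster-peers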

In my case, I'll now be investigating why and how we have these "de-synchronisations".

On a different note, and perhaps not completely relevant, you indicated that your hot buckets have a retention of 91 days. This seems pretty long to me (I haven't double-checked the docs on that, but still). There is also the warm stage between hot and cold; I would typically have a shorter period for the hot buckets and keep them warm for a sensible period before rolling them to cold.
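
As a rough illustration of what I mean (values made up, and maxHotSpanSecs is my assumption for how you would shorten the hot period; check them against your actual volumes and retention needs):

maxHotSpanSecs = 86400      # cap each hot bucket's timespan at ~1 day so it rolls to warm sooner
maxWarmDBCount = 300        # warm buckets kept in homePath before the oldest roll to cold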
