Splunk Enterprise

High number of Indexer cluster fixup tasks

critchos
Loves-to-Learn Lots

Hi.

We have an indexer cluster of 4 nodes with a little over 100 indexes. We recently took a look at the cluster manager's fixup tasks and noticed a large number of them (around 24,000) that have been pending for over 100 days, all for a select few of the indexes.

The majority of these tasks are for the following reasons: "Received shutdown notification from peer" and "Cannot replicate as bucket hasn't rolled yet".

For some reason these few indexes are quite low volume but have a large number of buckets. Ideally I would like to clear these tasks.

If we aren't precious about the data, would a suitable solution be to remove the indexes from the cluster configuration, manually delete the data folders for those indexes, and then re-enable them?

Or could we reduce the data size and number of buckets on the index to clear out these tasks?

Example of one of the index configurations:

# staging: 0.01 GB/day, 91 days hot, 304 days cold
[staging]
homePath = /splunkhot/staging/db
coldPath = /splunkcold/staging/colddb
thawedPath = /splunkcold/staging/thaweddb
maxDataSize = 200
frozenTimePeriodInSecs = 34128000
maxHotBuckets = 1
maxWarmDBCount = 300
homePath.maxDataSizeMB = 400
coldPath.maxDataSizeMB = 1000
maxTotalDataSizeMB = 1400
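
To illustrate the second option, this is roughly the kind of change I have in mind (values are only illustrative; maxDataSize = auto and the shorter retention are assumptions on my part, not something we have tested):

[staging]
maxDataSize = auto                 # let Splunk auto-size buckets (~750MB) instead of capping them at 200MB
maxWarmDBCount = 50                # keep fewer warm buckets in homePath before they roll to cold
frozenTimePeriodInSecs = 7776000   # ~90 days retention, so older buckets freeze and are deleted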


Thanks for any advice.


Alex_LC
Explorer

Hi,

I was looking for an answer to the same problem, and I came across this older post which kind of confirmed my understanding of the issue and the available solutions:

https://community.splunk.com/t5/Deployment-Architecture/Why-is-cluster-master-reporting-quot-Cannot-...

Short summary: hot buckets are streamed from the originating indexer to the other indexers in the cluster, but sometimes they get out of sync for various reasons and the CM starts displaying this type of error. There are two ways to fix it: either roll the buckets (via the GUI on the CM, the API endpoint, or by performing a rolling restart of the peers) or wait for them to roll naturally.
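
If it helps, something along these lines should do it via the API endpoint or a rolling restart (a sketch only: the hostname, credentials and index name are placeholders for your environment, and I'd double-check the endpoint path against the REST API reference):

# roll the hot buckets of one index on one peer (management port 8089)
curl -k -u admin:changeme -X POST https://indexer01.example.com:8089/services/data/indexes/staging/roll-hot-buckets

# or, from the cluster manager, a rolling restart of the peers will also roll the hot buckets
splunk rolling-restart cluster-peers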

In my case, I'll now be investigating why and how we have these "de-synchronisations".

On a different note, and perhaps not completely relevant, you indicated that your hot buckets have a retention of 91 days. This seems pretty long to me (I haven't double-checked the docs on that, but still). There is also the warm stage between hot and cold; I would typically have a shorter period for the hot buckets and keep them warm for a sensible period before rolling them to cold.
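
As a rough illustration of what I mean (values made up, and maxHotSpanSecs is my assumption for how you would shorten the hot period; check them against your actual volumes and retention needs):

maxHotSpanSecs = 86400      # cap each hot bucket's timespan at ~1 day so it rolls to warm sooner
maxWarmDBCount = 300        # warm buckets kept in homePath before the oldest roll to cold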
