The other day a few alerts surfaced showing that I had 6 large windows data buckets stuck in "Fixup Task - In Progress".
I ran this query:
| dbinspect index=windows corruptonly=true
| search bucketId IN (windows~nnnn~guid,...)
| fields bucketId, path, splunk_server, corruptReason, state
and found that all the primary db_<buckets> from the alerts were corrupt. You can also see this in the bucket status on the IDXCM.
I tried a few fsck repair commands on the indexers where the primary buckets resided, but they failed with the error >>> failReason=No bloomfilter
Then I tried >>>
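If you have more than a couple of bucket IDs from the alerts, a small shell helper can assemble the same search for you. This is only a sketch: build_dbinspect_query is a hypothetical function name (not a Splunk command), and the bucket IDs in the usage line are placeholders in the same windows~nnnn~guid form as above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): prints the dbinspect search
# for one index and any number of bucket IDs.
build_dbinspect_query() {
    index="$1"; shift
    ids=""
    for id in "$@"; do
        # Join the bucket IDs with commas for the IN (...) clause.
        ids="${ids:+$ids,}$id"
    done
    printf '| dbinspect index=%s corruptonly=true\n' "$index"
    printf '| search bucketId IN (%s)\n' "$ids"
    printf '| fields bucketId, path, splunk_server, corruptReason, state\n'
}

# Placeholder bucket IDs; substitute the ones from your alerts.
build_dbinspect_query windows 'windows~nnnn~guid' 'windows~mmmm~guid'
```

Paste the printed text into the search bar. The IDs are single-quoted only so the shell passes them through unchanged.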
./splunk fsck repair --one-bucket --bucket-path=/<path> --index-name=<indexName> --debug --v --backfill-never
After that it cleared and splunkd.log showed >>> Successfully released lock for bucket with path...
I hope this information helps.
The actual steps are:
1 find the corrupted bucket location with the dbinspect query
2 enable maintenance-mode on the IDXCM
3 take the indexer offline (where you want to repair the bucket)
4 run the fsck repair command on the stopped indexer
5 start the indexer when finished
6 disable maintenance-mode on the IDXCM
7 let the IDXCluster heal
8 repeat steps for the next bucket
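The steps above can be sketched as a command sequence. This helper only prints the commands and executes nothing, because they run on two different hosts (the IDXCM and the indexer). print_repair_plan is a hypothetical name. I am assuming `splunk offline` is how you take the peer down (a plain `splunk stop` may be what you prefer) and that `splunk show cluster-status` is how you watch the cluster heal. The fsck line is the one from above, with the path and index name left as placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): prints the repair sequence
# for one bucket. It does not run any of the commands.
print_repair_plan() {
    bucket_path="$1"
    index_name="$2"
    cat <<EOF
# On the cluster manager (IDXCM):
./splunk enable maintenance-mode
# On the indexer holding the primary bucket:
./splunk offline
./splunk fsck repair --one-bucket --bucket-path=$bucket_path --index-name=$index_name --debug --v --backfill-never
./splunk start
# Back on the cluster manager:
./splunk disable maintenance-mode
./splunk show cluster-status
EOF
}

# Placeholders as in the command above; use the path from the dbinspect query.
print_repair_plan "/<path>" "<indexName>"
```

Run the printed commands by hand from $SPLUNK_HOME/bin on the right host, one bucket at a time, and wait for the cluster to heal before starting the next one.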
Large buckets (around 10 GB) take about 25 minutes to repair.
Good luck!