Splunk Enterprise

What is the problem with splunkd recover-metadata --handle-roll processes that run until they crash Splunk?


Hello everyone,

I recently migrated from old hardware to newer hardware and started indexing the same data that I was indexing before (njmon, the JSON version of nmon).

The same Splunk version runs in both environments. The old infrastructure, despite having lower capacity than the new one, had no major issues. The new environment also uses faster disks (NVMe).

After 1-3 days of ingesting njmon data, the indexer crashes. During that time, I can see a lot of splunkd recover-metadata processes and also splunkd fsck --log-to--splunkd-log repair processes:

[root@splunk]# ps -ef | grep splunkd
splunk 21828 16396 99 12:00 ? 02:59:44 splunkd fsck --log-to--splunkd-log repair --try-warm-then-cold --one-bucket --index-name=njmon --bucket-name=db_1683179867_1683170671_48 --bloomfilter-only
splunk 21829 21828 0 12:00 ? 00:00:00 splunkd fsck --log-to--splunkd-log repair --try-warm-then-cold --one-bucket --index-name=njmon --bucket-name=db_1683179867_1683170671_48 --bloomfilter-only
splunk 41284 16396 99 12:20 ? 02:40:30 splunkd recover-metadata /net/splunk/fs0/splunk-hotwarm/njmon/db/db_1683195586_1683179630_51 --handle-roll njmon /net/splunk/fs0/splunk-hotwarm/njmon/db/db_1683195586_1683179630_51 --write-level 4 --tsidx-target-size 1572864000 --msidx-comp-block-size 1024
splunk 80067 16396 99 12:59 ? 02:00:53 splunkd recover-metadata /net/splunk/fs0/splunk-hotwarm/njmon/db/db_1683197989_1683180366_54 --handle-roll njmon /net/splunk/fs0/splunk-hotwarm/njmon/db/db_1683197989_1683180366_54 --write-level 4 --tsidx-target-size 1572864000 --msidx-comp-block-size 1024
splunk 136806 16396 99 13:44 ? 01:16:45 splunkd recover-metadata /net/splunk/fs0/splunk-hotwarm/njmon/db/db_1683200654_1683180434_53 --handle-roll njmon /net/splunk/fs0/splunk-hotwarm/njmon/db/db_1683200654_1683180434_53 --write-level 4 --tsidx-target-size 1572864000 --msidx-comp-block-size 1024


The server is running RHEL 8 with 128 GB of RAM and 48 physical cores (96 logical).

Splunk Version: Splunk 8.2.10 (build 417e74d5c950)


The difference between the old infrastructure and the new one is the tsidx write level: the old one uses 2, the new one uses 4. All environments consuming the data are on version 8.2 or later.
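For reference, the write level is controlled per index (or globally under [default]) in indexes.conf via the tsidxWriteLevel setting. A minimal sketch of what rolling the njmon index back to level 2, to match the old environment, might look like (stanza paths are illustrative and must match your actual deployment):

```
# indexes.conf -- hypothetical stanza for the njmon index;
# paths below are examples, not the poster's actual volumes
[njmon]
homePath   = $SPLUNK_DB/njmon/db
coldPath   = $SPLUNK_DB/njmon/colddb
thawedPath = $SPLUNK_DB/njmon/thaweddb
# New environment currently runs at 4; the old one ran at 2.
# This only affects newly written buckets, not existing ones.
tsidxWriteLevel = 2
```

Note that changing tsidxWriteLevel only applies to buckets written after the change; existing buckets keep the level they were written with.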


Any hints from the community?
