splunkd.log is filling up with BTree errors, at a rate of about 10 messages per second:
ERROR BTree [1001653 IndexerTPoolWorker-3] - 0th child has invalid offset: indexsize=67942584 recordsize=166182200, (Internal)
ERROR BTreeCP [1001653 IndexerTPoolWorker-3] - addUpdate CheckValidException caught: BTree::Exception: Validation failed in checkpoint
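To put a number on the error rate, here is a minimal Python sketch that counts ERROR lines per BTree component. The regular expression is an assumption based only on the two lines quoted above (an optional prefix, then `ERROR <Component> [...] - <message>`), and `summarize_btree_errors` is a hypothetical helper name, not part of Splunk.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the two log lines quoted above:
# optional prefix, then "ERROR <Component> [thread info] - <message>".
ERROR_RE = re.compile(r"\bERROR\s+(BTree\w*)\s+\[[^\]]*\]\s+-\s+(.*)")


def summarize_btree_errors(lines):
    """Count ERROR lines per BTree-related component (e.g. BTree, BTreeCP)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # The two sample lines are copied from the log excerpt above.
    sample = [
        "ERROR BTree [1001653 IndexerTPoolWorker-3] - 0th child has invalid "
        "offset: indexsize=67942584 recordsize=166182200, (Internal)",
        "ERROR BTreeCP [1001653 IndexerTPoolWorker-3] - addUpdate "
        "CheckValidException caught: BTree::Exception: Validation failed in checkpoint",
    ]
    for component, count in sorted(summarize_btree_errors(sample).items()):
        print(component, count)
```

Passing an open file handle for splunkd.log instead of the sample list gives totals for the whole file. Dividing by the time span the file covers gives a rough messages-per-second figure.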
I have noticed that btree_index.dat and btree_records.dat are re-created every few seconds. The previous copies appear to be copied into the corrupt directory.
I have tried shutting down Splunk and copying the snapshot files over, but when I restart Splunk they are overwritten, and the whole loop of files being created and then copied to corrupt starts again.
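To confirm how often the two files are actually replaced, a small polling sketch like the following could be used. It is plain Python using only `os.stat`. The helper names (`file_identity`, `snapshot`, `changed_files`) are hypothetical, and the paths in the demo are the ones reported by btprobe in this post.

```python
import os
import time


def file_identity(path):
    """Return (inode, size, mtime_ns) for path, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_ino, st.st_size, st.st_mtime_ns)


def snapshot(paths):
    """Map each path to its current identity tuple."""
    return {p: file_identity(p) for p in paths}


def changed_files(before, after):
    """Return the sorted paths whose identity differs between two snapshots."""
    return sorted(p for p in after if before.get(p) != after[p])


if __name__ == "__main__":
    base = "/opt/splunk/data/fishbucket/splunk_private_db"
    paths = [os.path.join(base, name)
             for name in ("btree_index.dat", "btree_records.dat")]
    previous = snapshot(paths)
    for _ in range(30):  # poll once per second for 30 seconds
        time.sleep(1)
        current = snapshot(paths)
        for path in changed_files(previous, current):
            print(time.strftime("%H:%M:%S"), "changed:", path)
        previous = current
```

A changed inode between polls suggests the file was re-created rather than appended to. A change in size or mtime alone only shows that it was written.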
I ran btprobe on the splunk_private_db fishbucket, and the output was:
no root in /opt/splunk/data/fishbucket/splunk_private_db/btree_index.dat with non-empty recordFile /opt/splunk/data/fishbucket/splunk_private_db/btree_records.dat
recovered key: 0xd3e9c1eb89bdbf3e | sptr=1207
Exception thrown: BTree::Exception: called debug on btree that isn't open!
It is entirely possible that there is some corruption somewhere. We had a filesystem issue a while back. I had to run fsck, and I removed a few files afterward. As for the data itself, I can't work out where the problem might be.
In Splunk search, I appear to have incomplete data in the _internal index. I can't view licensing, and Data Quality is empty with no data.
Any ideas on where to look next?
The LM, indexer, SH, and DS are all on the same host. I'm using Splunk Enterprise Version: 9.4.0 Build: 6b4ebe426ca6.