Getting Data In

Why am I seeing so many corrupt buckets?

Path Finder

While investigating why our indexers are creating so many hot buckets, I executed:
| dbinspect index=* corruptonly=true and saw that we have many corrupted buckets (> 70).
Most of them are reported as corrupted with "count mismatch tsidx=... source-metadata=...", but some show "Cannot get slices.dat count".

For example:
_internal~171~B8CCBABE-56EF-4E03-9CC1-0E3F674AE341 count mismatch tsidx=2093832 source-metadata=2093800 1556018538 2093800 B8CCBABE-56EF-4E03-9CC1-0E3F674AE341 1 171 _internal 04/23/2019:11:22:29 /opt/splunk/var/lib/splunk/_internaldb/db/hot_v1_171 402303625 232.7734375 14 11 1555672065 hot full

What could be the reasons for so many corrupted buckets? The little information I found about bucket corruption suggested it might be caused by an indexer process crash, but we didn't experience one.
Is there a way to get more information about the cause of the corruption?


Splunk Employee

If you're seeing corruption only on hot buckets, you should change the SPL to exclude them:

| dbinspect index=* corruptonly=true | search state!=hot

or

| dbinspect index=_internal corruptonly=true | search NOT state=hot

The reason we exclude hot buckets is that their status is transient: they are still being actively written, so dbinspect can report a temporary count mismatch that is not real corruption.
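To see which kinds of corruption dominate across your indexes, you can extend the same search with a stats summary. A sketch, assuming the corruptReason field is populated by dbinspect on your Splunk version (it may be named differently or absent on older releases):

| dbinspect index=* corruptonly=true
| search state!=hot
| stats count by index, corruptReason
| sort - count

This groups the non-hot corrupt buckets by index and reported reason, which makes it easier to tell whether the count mismatches are concentrated on a particular index or spread evenly.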


Separately: do you have any encryption agent installed on the indexers (for example, as part of a security standard to encrypt sensitive data at rest)? Such agents are a known source of bucket corruption.
