While investigating why our indexers are creating so many hot buckets, I ran:
| dbinspect index=* corruptonly=true
and saw that we have many corrupted buckets (more than 70).
Most of them are corrupted because of
count mismatch tsidx=... source-metadata=...
but some also show
Cannot get slices.dat count
For example:
_internal~171~B8CCBABE-56EF-4E03-9CC1-0E3F674AE341 count mismatch tsidx=2093832 source-metadata=2093800 1556018538 2093800 B8CCBABE-56EF-4E03-9CC1-0E3F674AE341 1 171 _internal 04/23/2019:11:22:29 /opt/splunk/var/lib/splunk/_internaldb/db/hot_v1_171 402303625 232.7734375 14 11 uni-spl-shd-02.livec.sg-cloud.co.uk 1555672065 hot full
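For an overview, the corrupt buckets can be grouped by reason. This is just a sketch that assumes the reason string is exposed as a field (e.g. corruptReason); check the field names in your own dbinspect output:
| dbinspect index=* corruptonly=true | stats count by corruptReason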
What could be the reasons for so many corrupted buckets? One of the few pieces of information I found about bucket corruption said it might be caused by an indexer process crash, but we didn't experience one.
Is there a way to get more information about the cause of the corruption?
If you're seeing the corruption only on hot buckets, you need to change the SPL you use to retrieve them:
| dbinspect index=* corruptonly=true | search state!=hot
| dbinspect index=_internal corruptonly=true | search NOT state=hot
The reason we exclude hot buckets is that their state is transient: they are still being actively written to, so their counts can legitimately disagree while indexing is in progress.
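If warm or cold buckets still show up as corrupt after excluding hot ones, the fsck CLI can scan and repair them. A minimal sketch, assuming the _internal index from your example (run splunk fsck --help first, since the available flags vary by version, and only repair buckets that are no longer hot):
splunk fsck scan --all-buckets-one-index --index-name=_internal
splunk fsck repair --all-buckets-one-index --index-name=_internal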