Deployment Architecture

Are bucket corruption and configuration initialization errors related?

We have been getting a lot of errors of this nature lately:

[indexer] Failed to read size=1235 event(s) from rawdata in bucket='my_index~3~66FDB370-3E8C-4495-9F62-60F0490E21DF' path='/opt/splunk/var/lib/splunk/hotwarm/my_index/db/rb_1521159284_1520938932_3_66FDB370-3E8C-4495-9F62-60F0490E21DF. Rawdata may be corrupt, see search.log. Results may be incomplete!

We see that maybe 2-3 times/week in the last month or so. Additionally (and maybe related?), we've been seeing errors of this nature almost every time we run a search for the last few months:

Dispatch Runner: Configuration initialization for /opt/splunk/var/run/searchpeers/my-server-1521808017 took longer than expected (1028ms) when dispatching a search (search ID: remote_my-server_1521808321.22898); this typically reflects underlying storage performance issues
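
A quick way to sanity-check the "underlying storage performance issues" hint is standard Linux I/O statistics on the search peers; the mount point below is assumed from the paths in the errors above, so adjust it for your layout:

  # Watch extended device statistics (utilization, await/latency) for the volume
  # hosting /opt/splunk/var - three 5-second samples; requires the sysstat package.
  iostat -x 5 3

  # Confirm the dispatch/search-peer directories are not on a full or nearly full filesystem.
  df -h /opt/splunk/var
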
  1. Are these likely to be related?
  2. Regardless of #1 - is there any good advice for fixing or avoiding these, other than routinely putting the system into maintenance mode and manually running fixups (sketched below)?
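
For reference, the maintenance-mode/fixup routine mentioned in question 2 looks roughly like the sketch below on an indexer cluster. Exact fsck flags vary by Splunk version, so verify them against the documentation for your release before running a repair.

  # On the cluster master: suspend bucket-fixup activity while working on the peers.
  /opt/splunk/bin/splunk enable maintenance-mode
  /opt/splunk/bin/splunk show maintenance-mode

  # On an affected indexer: scan for (and optionally repair) corrupt buckets.
  /opt/splunk/bin/splunk fsck scan --all-buckets-all-indexes
  /opt/splunk/bin/splunk fsck repair --all-buckets-all-indexes

  # Back on the cluster master: resume normal fixup behavior when finished.
  /opt/splunk/bin/splunk disable maintenance-mode
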
Solution

We have determined that these were not related. It turns out that our increase in corrupt-bucket errors was actually caused by a Linux OS-level configuration error that was causing our indexers to hard-restart unpredictably every day or two. Once we fixed the underlying issue, the flood of corrupt buckets stopped.
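
If you suspect the same root cause, unexpected hard restarts are easy to confirm from the indexer's boot history with standard Linux tools:

  # List the boots recorded by systemd-journald; frequent unplanned entries
  # point to crashes or hard resets rather than clean restarts.
  journalctl --list-boots

  # Cross-check wtmp: 'reboot' records without a matching 'shutdown' record
  # suggest power loss, kernel panics, or watchdog resets.
  last -x reboot shutdown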
