Deployment Architecture

Are bucket corruption and configuration initialization errors related?

elliotproebstel
Champion

We have been getting a lot of errors of this nature lately:

[indexer] Failed to read size=1235 event(s) from rawdata in bucket='my_index~3~66FDB370-3E8C-4495-9F62-60F0490E21DF' path='/opt/splunk/var/lib/splunk/hotwarm/my_index/db/rb_1521159284_1520938932_3_66FDB370-3E8C-4495-9F62-60F0490E21DF. Rawdata may be corrupt, see search.log. Results may be incomplete!

We have seen that maybe 2-3 times per week over the last month or so. Additionally (and maybe related?), for the last few months we have been seeing errors like the following almost every time we run a search:

Dispatch Runner: Configuration initialization for /opt/splunk/var/run/searchpeers/my-server-1521808017 took longer than expected (1028ms) when dispatching a search (search ID: remote_my-server_1521808321.22898); this typically reflects underlying storage performance issues

  1. Are these likely to be related?
  2. Regardless of #1, is there good advice for fixing or avoiding these errors, other than routinely putting the system into maintenance mode and manually running fixups?

elliotproebstel
Champion

We have determined that these were not related. It turns out that our increase in corrupt bucket errors was actually caused by a Linux OS-level configuration error that was causing our indexers to hard restart unpredictably every day or two. We fixed the underlying issue, and we stopped getting the abundance of corrupt buckets.
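
For anyone checking for a similar correlation, here is a minimal Python sketch that pulls the bucket ID and time range out of the corruption message quoted in the question, so the affected buckets can be lined up against host restart times. The function name `parse_bucket_error` and both regular expressions are illustrative assumptions based only on that one sample message. They are not a Splunk API. The sketch relies on the warm/cold bucket directory convention of `db_` or `rb_`, then newest epoch, oldest epoch, local ID, and optional GUID.

```python
import re
from datetime import datetime, timezone

# Matches the "Failed to read ... from rawdata" message quoted in the question.
# The closing quote after the path is optional because that sample lacks it.
ERROR_RE = re.compile(
    r"Failed to read size=(?P<size>\d+) event\(s\) from rawdata in "
    r"bucket='(?P<bucket>[^']+)' path='(?P<path>\S+?)'?\.\s"
)

# Bucket directory name: <db|rb>_<newest epoch>_<oldest epoch>_<local id>[_<guid>]
DIR_RE = re.compile(
    r"(?P<kind>db|rb)_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<local_id>\d+)"
    r"(?:_(?P<guid>[0-9A-Fa-f-]+))?$"
)


def parse_bucket_error(line):
    """Return details of the bucket named in an error line, or None if no match."""
    err = ERROR_RE.search(line)
    if err is None:
        return None
    name = DIR_RE.search(err.group("path"))
    if name is None:
        return None
    return {
        "bucket": err.group("bucket"),
        "events": int(err.group("size")),
        "replicated": name.group("kind") == "rb",  # rb_ marks a replicated copy
        "oldest": datetime.fromtimestamp(int(name.group("oldest")), tz=timezone.utc),
        "newest": datetime.fromtimestamp(int(name.group("newest")), tz=timezone.utc),
    }
```

Running this over the lines of a log export and grouping the results by bucket gives a rough list of time ranges to compare against the times the indexers restarted. The time ranges reflect event timestamps, so treat any match as a hint, not proof.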


dm1
Contributor

@elliotproebstel we are facing the exact same issue. Our deployment is on AWS.

Can you please share what the cause of this issue was in your environment and how you fixed it?
