I just repaired corrupt buckets for a partner index on one of our enterprise indexers.
The issue only became apparent after the customer saw the warnings on their reports.
My question is: are there easy, proactive warnings that administrators can receive highlighting index bucket corruption, rather than leaving it to our customers to find the problems?
If you are using the Monitoring Console, that would be a good starting point: it has visibility into indexer clustering activity. The link below might get you started; these are all dashboards/searches, so maybe you can set up alerts on them. The cluster master's Settings -> Indexer clustering page might give you some insight too.
https://docs.splunk.com/Documentation/Splunk/6.6.2/Indexer/Viewindexerclusteringstatus
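If you want something lighter than the full clustering dashboards, one option is to save a keyword search as a scheduled alert under Settings > Searches, reports, and alerts, triggered when the result count is greater than zero. This is only a rough sketch, not an official health check; the keyword match and grouping fields are my assumptions:

index=_internal sourcetype=splunkd (log_level=WARN OR log_level=ERROR) (corrupt OR corruption)
| stats count BY host, component, message

The keyword match is crude and will likely need tuning against your splunkd.log messages, but it gives admins a heads-up before users notice broken reports.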
Hi
You can find corrupted buckets in a multisite cluster with the following search/alert:
index=_internal component=CMMaster state=Discard incoming_bucket_size=* earliest=-30d@d
| dedup bid
| table _time,bid,peer_name,existing_bucket_size,incoming_bucket_size
| sort bid,_time
This shows the bucket ID and the source peer.
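If you want to schedule it as an alert, a per-peer summary built on the same fields could look like this (a sketch; the 24-hour window and the summary field names are my assumptions):

index=_internal component=CMMaster state=Discard incoming_bucket_size=* earliest=-24h
| stats dc(bid) AS corrupted_buckets, values(bid) AS bucket_ids BY peer_name

Trigger when the number of results is greater than zero, and you get one row per peer that offered a bad bucket copy.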
r. Ismo
Even though this is an old case, I would like to add what one can do with current versions.
Just run this:
| dbinspect index=* OR index=_* corruptonly=true
| search state!=hot
Select a long enough time period to find all corrupted buckets.
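To make this proactive rather than ad hoc, a scheduled variant could summarise per indexer and index (a sketch; the output field names are the dbinspect fields as I recall them, so verify them in your environment):

| dbinspect index=* OR index=_* corruptonly=true
| search state!=hot
| stats count AS corrupt_buckets, values(path) AS bucket_paths BY splunk_server, index

Save it as an alert that fires whenever results are returned, and the admins hear about corruption before the customers do.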
r. Ismo
A peer of mine shared this search. Does it jibe with your environment? I want to see if we can add these things into the Monitoring Console as well, so I'm curious to hear how you make out.
index=_internal sourcetype=splunkd component=ProcessTracker (BucketBuilder OR JournalSlice) (NOT "rawdata was truncated")
| eval message=replace(message, "^\(child.*?\)\s+", "")
| bin _time span=1m
| stats c by _time, host, splunk_server, message
| fields - c
| rename splunk_server as Indexer, host as Host, message as Issue
Thank you Mr. Burch. I tried running this but didn't get any results.
This could either mean that we don't have any bucket issues, or your search isn't worth the paper it is written on -- not sure which.
I'm not sure where the truth lies yet, but I am guessing we must have some bucket issues somewhere given the amount of data we pump each day.
More testing required I think.
thank you!
We couldn't see any issues with the previous search either, but there are still a couple of corrupted buckets (e.g. a journal.gz that was only a couple of bytes).
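For that particular failure mode, one heuristic check is to look for warm/cold buckets whose raw data size looks implausibly small for the events they claim to hold. This is purely an assumption on my part (that a truncated journal.gz shows up as a near-zero rawSize), and the threshold is a guess:

| dbinspect index=* OR index=_*
| where state!="hot" AND eventCount > 0 AND rawSize < 1024
| table splunk_server, index, path, eventCount, rawSize

It will not catch every kind of corruption, but it can flag tiny-journal buckets that corruptonly=true misses.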
Would you provide more detail on how you identified that the buckets were corrupted? That might add color to an existing way to be notified.
There was an exclamation mark / warning on the dashboard with a cryptic message saying there was an error related to the indexer in question: "[indexer_] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to rawdata offset 0 ..."
This type of error scares the crap out of users, and they freak out to the admin...
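For exactly that case, a narrow keyword alert would let the admin see it before the users do (a sketch; it assumes the same JournalSliceDirectory error is also written to splunkd.log on the search peers):

index=_internal sourcetype=splunkd "JournalSliceDirectory" "Cannot seek to rawdata offset"
| stats count BY host, component

Even if the component name differs in your version, matching on the error strings from the dashboard message should be enough to trigger on.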