
Proactively monitor for bucket corruption

jamesoconnell
Path Finder

I just repaired corrupt buckets for a partner index on one of our enterprise indexers.
The issue only became apparent after the customer saw the warnings on their reports.

My question is: are there easy, proactive warnings that administrators can receive about index bucket corruption, rather than leaving it up to our customers to find the problems?


bheemireddi
Communicator

If you are using the Monitoring Console, that would be a good starting point. It gives you visibility into indexer clustering activity. The link below might get you started; these are all dashboards and searches, so you may be able to set up alerts on them. On the cluster master, Settings > Indexer clustering might give you some insights too.
https://docs.splunk.com/Documentation/Splunk/6.6.2/Indexer/Viewindexerclusteringstatus


isoutamo
SplunkTrust

Hi

We can find corrupted buckets in a multisite cluster with the following search, which can also be saved as an alert:

index=_internal component=CMMaster state=Discard incoming_bucket_size=* earliest=-30d@d 
| dedup bid 
| table _time,bid,peer_name,existing_bucket_size,incoming_bucket_size
| sort bid,_time

This shows the bucket ID and the source peer.

r. Ismo


isoutamo
SplunkTrust

Even though this is an old thread, I would like to add what one can do with current versions.

Just run this:

| dbinspect index=* OR index=_* corruptonly=true 
| search state!=hot

Select a long enough time range to find all corrupted buckets.
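
To get a proactive warning out of this, the search could be saved as a scheduled alert. Below is a minimal sketch of a savedsearches.conf stanza, not a tested configuration. The stanza name, the schedule, the 30-day window, and the email address are all placeholders I made up for illustration. The search string is the one above, unchanged.

```
# savedsearches.conf (sketch only; stanza name, schedule and recipient are hypothetical)
[Corrupt bucket check]
# Same search as above: list corrupted buckets, ignoring hot ones
search = | dbinspect index=* OR index=_* corruptonly=true | search state!=hot
# Look back far enough to cover the buckets you care about (assumed: 30 days)
dispatch.earliest_time = -30d@d
dispatch.latest_time = now
# Run once a day at 06:00 (assumed off-peak time)
enableSched = 1
cron_schedule = 0 6 * * *
# Trigger when the search returns at least one result
counttype = number of events
relation = greater than
quantity = 0
# Show in Triggered Alerts and send an email
alert.track = 1
action.email = 1
action.email.to = splunk-admins@example.com
```

The corruption check may take a while on large indexes, so it is probably worth scheduling it outside busy hours and adjusting the time window to your retention.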

r. Ismo 

sloshburch
Splunk Employee

A peer of mine shared this search. Does it jibe with your environment? I want to see if we can add these things into the MC as well, so I'm curious to hear how you make out.

index=_internal sourcetype=splunkd component=ProcessTracker (BucketBuilder OR JournalSlice) (NOT "rawdata was truncated")
|eval message=replace(message, "^\(child.*?\)\s+", "")
|bin _time span=1m
|stats c by _time, host, splunk_server, message
|fields - c
|rename splunk_server as Indexer, host as Host, message as Issue

jamesoconnell
Path Finder

Thank you Mr. Burch. I tried running this but didn't get any results.

This could either mean that we don't have any bucket issues, or your search isn't worth the paper it is written on -- not sure which.

I'm not sure where the truth lies yet, but I am guessing we must have some bucket issues somewhere given the amount of data we pump each day.

More testing required I think.

thank you!


isoutamo
SplunkTrust

We could not see any issues with the previous search either, but there are still a couple of corrupted buckets (e.g. journal.gz was only a couple of bytes).


sloshburch
Splunk Employee

Would you provide more detail on how you identified that the buckets were corrupted? That might shed light on an existing way to be notified.


jamesoconnell
Path Finder

There was an exclamation symbol / warning on the dashboard with a cryptic message saying there was an error related to the indexer in question: "[indexer_] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to rawdata offset 0 ..."
This type of error scares the crap out of users and they freak out to the admin...

