Splunk Search

After indexer cluster upgrade to Splunk 6.3, why are we getting search error "Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0"?

rozmar564
Explorer

We have Splunk Enterprise and our cluster consists of 3 search heads and 9 search peers. After upgrading to version 6.3, the following started to happen.

Although the cluster as a whole has enough space, certain peers fill up their disks from time to time and the splunkd process dies, forcing the cluster to re-organize its data. After bringing the dead peer back and waiting for the cluster to be 100% operational (i.e., to meet its search factor and replication factor), many of the searches produce the following errors:

3 errors occurred while the search was executing. Therefore, search results might be incomplete.
[spl003.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0
[spl008.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0
[spl009.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0

I have no clue how to fix this (I could not find any useful info about it on the internet), and the results are incomplete - our business cannot operate correctly, as we make decisions based on the analysis we run in Splunk.

Could somebody point me in the right direction?

twollenslegel_s
Splunk Employee

I know it won't help anymore, but for reference:

If you are having this issue, you may have had a crash or non-clean shutdown and need to repair buckets.

Please take a look at this wiki:
https://wiki.splunk.com/Community:PostCrashFsckRepair

"splunk fsck --all" should show you what buckets are bad, you can either remove them, or try to repair the bucket

Useful options are: --include-hots, --log-to--splunkd-log & --ignore-read-error

USAGE

Supported modes are: scan, repair, clear-bloomfilter, check-integrity, generate-hash-files

<bucketSelector> := --one-bucket|--all-buckets-one-index|--all-buckets-all-indexes
                    [--index-name=<name>] [--bucket-name=<name>] [--bucket-path=<path>]
                    [--include-hots]
                    [--local-id=<id>] [--origin-guid=<guid>]
                    [--min-ET=<time>] [--max-LT=<time>]

<miscFlags> := [--try-warm-then-cold] [--log-to--splunkd-log] [--debug] [--v]

fsck repair <bucketSelector> <miscFlags> [--bloomfilter-only]
            [--backfill-always|--backfill-never] [--bloomfilter-output-path=<path>]
            [--raw-size-only] [--metadata] [--ignore-read-error]

fsck scan <bucketSelector> <miscFlags> [--metadata] [--check-bloomfilter-presence-always]

fsck clear-bloomfilter <bucketSelector> <miscFlags>

fsck check-integrity <bucketSelector> <miscFlags>
fsck generate-hash-files <bucketSelector> <miscFlags>

fsck check-rawdata-format <bucketSelector> <miscFlags>

fsck minify-tsidx --one-bucket --bucket-path=<path> --dont-update-manifest|--home-path=<path>

Notes:
The mode verb 'make-searchable' is a synonym for 'repair'.
The mode 'check-integrity' will verify data integrity for buckets created with the integrity-check feature enabled.
The mode 'generate-hash-files' will create or update bucket-level hashes for buckets which were generated with the integrity-check feature enabled.
The mode 'check-rawdata-format' verifies that the journal format is intact for the selected index buckets (the journal is stored in a valid gzip container and has a valid journal structure).
Flag --log-to--splunkd-log is intended for calls from within splunkd.
If neither --backfill-always nor --backfill-never are given, backfill decisions will be made per indexes.conf 'maxBloomBackfillBucketAge' and 'createBloomfilter' parameters.
Values of 'homePath' and 'coldPath' will always be read from config; if config is not available, use --one-bucket and --bucket-path but not --index-name.
All constraints supplied are implicitly ANDed.
Flag --metadata is only applicable when migrating from a 4.2 release.
If giving --include-hots, please recall that hot buckets have no bloomfilters.
Not all argument combinations are valid.
If --help found in any argument position, prints this message & quits.
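
For reference, a minimal sketch of how these modes might be run on an affected peer - the index name "main" is just a placeholder, $SPLUNK_HOME is assumed to be your Splunk install directory, and (per the wiki above) the peer is stopped before attempting an offline repair:

$SPLUNK_HOME/bin/splunk stop
# report damaged buckets across all indexes
$SPLUNK_HOME/bin/splunk fsck scan --all-buckets-all-indexes
# check that the rawdata journals are intact (the structure the JournalSliceDirectory error complains about)
$SPLUNK_HOME/bin/splunk fsck check-rawdata-format --all-buckets-all-indexes
# attempt a repair on a single index, tolerating journal read errors
$SPLUNK_HOME/bin/splunk fsck repair --all-buckets-one-index --index-name=main --ignore-read-error
$SPLUNK_HOME/bin/splunk start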


rozmar564
Explorer

Update: the issue was never resolved; however, we don't experience it anymore. We did a DC move in the meantime and took the whole cluster down for a good few hours. After starting it back up, we ended up with a bunch of duplicate buckets that we were able to remove, and since then we don't see this issue. Unfortunately, time solved it, but no clue what the root cause was 😞


iKate
Builder

It's sad to hear this. We faced the same problem after 6.3 was released, got no response on the issue, and just moved back to the previous version, 6.2. Next week we'll try to upgrade again, this time to 6.3.3. I'm afraid the same errors will arise, but we need new apps that only work with 6.3.
Here was my question: https://answers.splunk.com/answers/310778/journalslicedirectory-cannot-seek-to-0-and-error20.html


sgundeti
Path Finder

Any updates on this issue, please?


rozmar564
Explorer

Update: I opened a support case with Splunk Enterprise Support 6 days ago - nobody has picked up the ticket yet... Not cool after paying so much $$$ 😞


asmunde1
New Member

We are looking into upgrading to 6.3 and would like to make sure we don't experience things like this. Please update the case when you have solved the issues. 🙂


pj_elia
Engager

Any luck? I'm having a similar issue on a search peer but only for a specific index and a specific date range that includes 1 particular day.


t9445
Path Finder

Any update at this stage? We're seeing this too, typically after a restart (v6.3.1).


rozmar564
Explorer

Not yet - we have 2 tickets open with Support, and I had to upload a diag to them a week ago. Since then, nothing. I will call our sales rep and ask if they can nudge Support - this is crazy.


jkat54
SplunkTrust

You didn't happen to run Splunk as root at some point, write a few buckets as root, and then switch back to a less privileged user, did you? I also found this user's solution, but his problem was a bit different:

https://answers.splunk.com/answers/174669/what-do-i-do-if-rebuilding-a-bucket-fails.html
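
For what it's worth, a quick way to test the root-ownership theory is to look for index files not owned by the Splunk service account. The path and the user name "splunk" below are assumptions - adjust them to your SPLUNK_DB location and service account:

# list any files under the index store that are not owned by the splunk user
find /opt/splunk/var/lib/splunk ! -user splunk -ls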


rozmar564
Explorer

We have 9 indexers; I went through all of them and found no mismatched file permissions. That is what Splunk support told me to check before they went silent.
