We have Splunk Enterprise and our cluster consists of 3 search heads and 9 search peers. After upgrading to version 6.3, the following started to happen.
Although the cluster as a whole has enough space, certain peers occasionally fill up their disks and the splunkd process dies, forcing the cluster to re-organize its data. After bringing the dead peer back up and waiting for the cluster to be 100% operational (meeting its search factor and replication factor), many searches produce the following errors:
3 errors occurred while the search was executing. Therefore, search results might be incomplete.
[spl003.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0
[spl008.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0
[spl009.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0
I have no clue how to fix this (I could not find any useful information about it online), and the results are incomplete. Our business cannot operate correctly, as we make decisions based on the analysis we run in Splunk.
Could somebody point me in the right direction?
I know it won't help anymore, but for reference:
If you are seeing this issue, you may have had a crash or an unclean shutdown, and you need to repair buckets.
Please take a look at this wiki:
"splunk fsck --all" should show you which buckets are bad; you can then either remove them or try to repair them.
Useful options are: --include-hots, --log-to--splunkd-log and --ignore-read-error
Supported modes are: scan, repair, clear-bloomfilter, check-integrity, generate-hash-files
:= [--try-warm-then-cold] [--log-to--splunkd-log] [--debug] [--v]
fsck repair [--bloomfilter-only]
[--raw-size-only] [--metadata] [--ignore-read-error]
fsck scan [--metadata] [--check-bloomfilter-presence-always]
fsck minify-tsidx --one-bucket --bucket-path= --dont-update-manifest|--home-path=
The mode verb 'make-searchable' is a synonym for 'repair'.
The mode 'check-integrity' will verify data integrity for buckets created with the integrity-check feature enabled.
The mode 'generate-hash-files' will create or update bucket-level hashes for buckets which were generated with the integrity-check feature enabled.
The mode 'check-rawdata-format' verifies that the journal format is intact for the selected index buckets (the journal is stored in a valid gzip container and has a valid journal structure).
Flag --log-to--splunkd-log is intended for calls from within splunkd.
If neither --backfill-always nor --backfill-never is given, backfill decisions will be made per index, per the indexes.conf 'maxBloomBackfillBucketAge' and 'createBloomfilter' parameters.
Values of 'homePath' and 'coldPath' will always be read from config; if config is not available, use --one-bucket and --bucket-path but not --index-name.
Flag --metadata is only applicable when migrating from 4.2 release.
If giving --include-hots, please recall that hot buckets have no bloomfilters.
Not all argument combinations are valid.
If --help is found in any argument position, this message is printed and the program quits.
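Based on the help text above, a typical workflow on a damaged peer might look like the sketch below. The install path and the bucket-selector flag are assumptions on my part; check "splunk fsck --help" on your own version before running anything.

```shell
# Sketch only: paths and the --all-buckets-all-indexes selector are assumptions;
# verify against "splunk fsck --help" for your Splunk version.

# fsck should be run while the indexer is stopped
/opt/splunk/bin/splunk stop

# Scan first: report damaged buckets without modifying anything
/opt/splunk/bin/splunk fsck scan --all-buckets-all-indexes

# Then attempt a repair, tolerating unreadable slices
/opt/splunk/bin/splunk fsck repair --all-buckets-all-indexes --ignore-read-error

/opt/splunk/bin/splunk start
```

Running scan before repair lets you see the scope of the damage first; on a clustered peer you may prefer to remove the bad buckets and let replication restore them instead.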
Update: the issue was never resolved; however, we no longer experience it. We did a DC move in the meantime and took the whole cluster down for a good few hours. After starting it back up, we ended up with a bunch of duplicate buckets, which we were able to remove, and we have not seen the issue since. Unfortunately, time solved it, and we still have no clue what the root cause was 😞
It's sad to hear this. We faced the same problem after 6.3 was released, got no response on the issue, and just moved back to the previous version, 6.2. Next week we'll try to upgrade again, this time to 6.3.3. I'm afraid the same errors will arise, but we need new apps that only work with 6.3.
Here was my question https://answers.splunk.com/answers/310778/journalslicedirectory-cannot-seek-to-0-and-error20.html
Not yet. We have 2 tickets open with Support, and I had to upload a diag to them a week ago; since then, nothing. I will call our sales rep and ask if they can nudge Support. This is crazy.
You didn't happen to execute splunk as root one time, write a few buckets as root, and then switch back to a less-privileged user, did you? I also found this guy's solution, but his problem was a bit different:
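If root-owned buckets are a suspect, one quick way to check is to list anything under the index path that is not owned by the splunk user. The path and user name below are assumptions; adjust both for your environment.

```shell
# Assumption: Splunk runs as user "splunk" and indexes live under
# /opt/splunk/var/lib/splunk -- adjust both for your install.
SPLUNK_USER=splunk
INDEX_PATH=/opt/splunk/var/lib/splunk

# List any files or directories not owned by the splunk user
find "$INDEX_PATH" -not -user "$SPLUNK_USER" -ls

# If anything shows up, fix ownership (as root) and restart the peer:
#   chown -R "$SPLUNK_USER:$SPLUNK_USER" "$INDEX_PATH"
```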