We have Splunk Enterprise and our cluster consists of 3 search heads and 9 search peers. After upgrading to version 6.3, the following started to happen.
Although the cluster as a whole has enough space, certain peers fill up their disks from time to time and the splunkd process dies, pushing the cluster into re-organizing the data. After bringing the dead peer back and waiting for the cluster to be 100% operational (i.e. to meet its search factor and replication factor), many of the searches produce the following errors:
3 errors occurred while the search was executing. Therefore, search results might be incomplete.
[spl003.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0
[spl008.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0
[spl009.ayisnap.com] Streamed search execute failed because: JournalSliceDirectory: Cannot seek to 0
I have no clue how to fix this (I could not find any useful information about it online), and the results are incomplete. Our business cannot operate correctly, as we make decisions based on the analysis we run in Splunk.
Could somebody point me in the right direction?
I know it won't help anymore, but for reference:
If you are having this issue, you may have had a crash or unclean shutdown and need to repair buckets.
Please take a look at this wiki:
https://wiki.splunk.com/Community:PostCrashFsckRepair
"splunk fsck --all" should show you what buckets are bad, you can either remove them, or try to repair the bucket
Useful options are: --include-hots, --log-to--splunkd-log & --ignore-read-error
USAGE

Supported modes are: scan, repair, clear-bloomfilter, check-integrity, generate-hash-files

  <bucketSelector> := --one-bucket|--all-buckets-one-index|--all-buckets-all-indexes
                      [--index-name=<name>]
                      [--include-hots]
                      [--local-id=<id>]
                      [--min-ET=<time>]
  <fsckOptions>    := [--try-warm-then-cold] [--log-to--splunkd-log] [--debug] [--v]

  fsck repair [--bloomfilter-only]
              [--backfill-always|--backfill-never] [--bloomfilter-output-path=<path>]
              [--raw-size-only] [--metadata] [--ignore-read-error]
  fsck scan [--metadata] [--check-bloomfilter-presence-always]
  fsck clear-bloomfilter
  fsck check-integrity
  fsck generate-hash-files
  fsck check-rawdata-format
  fsck minify-tsidx --one-bucket --bucket-path=<path> --dont-update-manifest|--home-path=<path>

Notes:
  The mode verb 'make-searchable' is synonym for 'repair'.
  The mode 'check-integrity' will verify data integrity for buckets created with the integrity-check feature enabled.
  The mode 'generate-hash-files' will create or update bucket-level hashes for buckets which were generated with the integrity-check feature enabled.
  The mode 'check-rawdata-format' verifies that the journal format is intact for the selected index buckets (the journal is stored in a valid gzip container and has valid journal structure).
  Flag --log-to--splunkd-log is intended for calls from within splunkd.
  If neither --backfill-always nor --backfill-never are given, backfill decisions will be made per indexes.conf 'maxBloomBackfillBucketAge' and 'createBloomfilter' parameters.
  Values of 'homePath' and 'coldPath' will always be read from config; if config is not available, use --one-bucket and --bucket-path but not --index-name.
  All
  Flag --metadata is only applicable when migrating from 4.2 release.
  If giving --include-hots, please recall that hot buckets have no bloomfilters.
  Not all argument combinations are valid.
  If --help found in any argument position, prints this message & quits.
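For the errors in this thread, a minimal sketch of what I would run (the index name "main" here is only a placeholder, and whether to stop splunkd first is best confirmed against the wiki linked above):

  # scan one index for corrupt buckets (add --include-hots to cover hot buckets too)
  splunk fsck scan --all-buckets-one-index --index-name=main

  # try to repair what the scan flagged; --ignore-read-error lets the repair
  # continue past damaged journal slices
  splunk fsck repair --all-buckets-one-index --index-name=main --ignore-read-error

  # or scan every bucket in every index
  splunk fsck scan --all-buckets-all-indexes

If a bucket cannot be repaired, removing it and letting the cluster re-replicate a good copy from another peer is the other option mentioned above.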
Update: the issue was never resolved; however, we don't experience it anymore. We did a DC move in the meantime and took the whole cluster down for a good few hours. After starting it back up we ended up with a bunch of duplicate buckets, which we were able to remove, and since then we haven't seen this issue. Unfortunately time solved it, but we have no clue what the root cause was 😞
It's sad to hear this. We faced the same problem after 6.3 was released, got no response on the issue, and just moved back to the previous version, 6.2. Next week we'll try to upgrade again, this time to 6.3.3. I'm afraid the same errors will arise, but we need new apps that only work with 6.3.
Here was my question: https://answers.splunk.com/answers/310778/journalslicedirectory-cannot-seek-to-0-and-error20.html
Any updates on this issue, please?
Update: I opened a support case with Splunk Enterprise Support 6 days ago - nobody has picked up the ticket yet... Not cool after paying so much $$$ 😞
We are looking into upgrading to 6.3 and would like to make sure we don't experience things like this. Please post an update here when you have solved the issue. 🙂
Any luck? I'm having a similar issue on a search peer but only for a specific index and a specific date range that includes 1 particular day.
Any update at this stage? We're seeing this too, typically after a restart (v6.3.1).
Not yet - we have 2 tickets open with Support, and I had to upload a diag to them a week ago. Since then, nothing. I will call our sales rep and ask if they can nudge Support - this is crazy.
You didn't happen to run Splunk as root at some point and write a few buckets as root, then switch back to a less privileged user, did you? I also found this guy's solution, but his problem was a bit different:
https://answers.splunk.com/answers/174669/what-do-i-do-if-rebuilding-a-bucket-fails.html
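Something like this is what I had in mind for the permissions check (just a sketch; the /opt/splunk path and the 'splunk' user are assumptions about a default install, adjust to your environment):

  # list anything under the index stores that is NOT owned by the splunk user
  find /opt/splunk/var/lib/splunk -not -user splunk -ls

  # if anything shows up, re-own it (as root, ideally with splunkd stopped) and restart
  chown -R splunk:splunk /opt/splunk/var/lib/splunk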
We have 9 indexers; I went through all of them and found no mismatched file permissions. That is what Splunk support told me to check before they went silent.