While running a query via EMR on a bucket archived to S3 with Hadoop Data Roll, I got the following error:
[hadoop] [ip-192-168-4-184] Streamed search execute failed because: Error reading compressed journal while streaming: gzip data truncated, provider=StdinGzDataProvider
Does this mean that one of the archived journal.gz files is corrupt? If so:
The "Streamed search execute failed because: Error reading compressed journal while streaming: gzip data truncated, provider=StdinGzDataProvider" error occurs because one or more of the archived journal.gz files are corrupted.
If Splunk crashes or shuts down uncleanly (power loss, hardware failure, OS failure, etc.), some buckets can be left in a bad state where not all data is searchable. If a bucket is corrupted locally on the indexer, the archived copy of that bucket will also be corrupted.
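To narrow down which archived journal.gz is affected, one option is to download a copy and read it end to end. The following is a minimal sketch, assuming the file is an ordinary gzip stream (which the "gzip data truncated" message suggests, but which is not confirmed here). The function name `check_gzip` is hypothetical and not part of Splunk; it only reports whether the compressed stream can be read through to the end, and says nothing about whether Splunk can use the contents.

```python
import gzip
import zlib


def check_gzip(path, chunk_size=1024 * 1024):
    """Read a gzip file to the end and report whether it decompressed cleanly.

    Returns (ok, bytes_read, error_message). Hypothetical helper, not a Splunk tool.
    """
    total = 0
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (EOFError, OSError, zlib.error) as exc:
        # EOFError: the stream ended before the end-of-stream marker (truncation).
        # OSError / zlib.error: bad header, CRC mismatch, or corrupt compressed data.
        return False, total, str(exc)
    return True, total, None
```

For example, `check_gzip("journal.gz")` on a local copy returns `(False, n, message)` when the file is truncated, where `n` is the number of decompressed bytes read before the failure.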
Local Splunk buckets can be fixed by following these instructions: http://docs.splunk.com/Documentation/Splunk/6.5.0/Indexer/Bucketissues
Currently there is no way to fix corrupted journal.gz files that have been archived. We are working on a fix that will read data from a corrupted journal up to the point where the corruption begins, and will log an error message in search.log indicating that the particular journal is corrupted. This fix will be available in a future release.
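The idea of reading a journal up to the corrupted part can be illustrated outside Splunk with plain gzip handling. This is only a sketch of the general technique, not Splunk's fix: it assumes the file is a standard (possibly multi-member) gzip stream, the name `salvage_gzip_prefix` is hypothetical, and the recovered bytes are in Splunk's internal journal format, so nothing here makes them searchable again.

```python
import zlib

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def salvage_gzip_prefix(src_path, dst_path, chunk_size=64 * 1024):
    """Decompress as much of a damaged gzip file as possible.

    Returns (bytes_written, complete). Hypothetical helper, not a Splunk tool.
    """
    written = 0
    decomp = zlib.decompressobj(GZIP_WBITS)
    mid_member = False  # True while a gzip member has been started but not finished
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            while chunk:
                try:
                    out = decomp.decompress(chunk)
                except zlib.error:
                    # Corrupt data: output from this failing call is lost,
                    # so recovery stops at the last good chunk boundary.
                    return written, False
                dst.write(out)
                written += len(out)
                if decomp.eof:
                    # One gzip member finished; any leftover bytes start the next one.
                    chunk = decomp.unused_data
                    decomp = zlib.decompressobj(GZIP_WBITS)
                    mid_member = False
                else:
                    chunk = b""
                    mid_member = True
    # If input ran out in the middle of a member, the file was truncated.
    return written, not mid_member
```

Reading in fixed-size chunks keeps memory use flat for large journals; a smaller `chunk_size` recovers data closer to the corruption point at the cost of more calls.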
I have been unable to locate any further updates on this topic.
We are running 7.2.1, and I would like to know whether there is still no way to fix a corrupt archived journal.gz file.
Has there been any progress?
I am having this same issue - v7.2.1. Has there been any progress on a fix for this?
Hi Gurlest, no update has been provided by Splunk or by any of the users on Splunk Answers.