"Error reading compressed journal while streaming: gzip data truncated". Are my Hadoop archived buckets corrupted, and how do I fix it?

Explorer

While running a query via EMR on a bucket archived to S3 with Hadoop Data Roll, I got the following error:

[hadoop] [ip-192-168-4-184] Streamed search execute failed because: Error reading compressed journal while streaming: gzip data truncated, provider=StdinGzDataProvider

Does this mean that one of the archived journal.gz files is corrupt? If so:

  • How can I figure out how it got corrupted?
  • How do I figure out which one and fix it? This is still in the test phase, so I still have all the archived buckets on my indexer. I'm trying to validate that the archival mechanism is safe and reliable.

Splunk Employee

The error "Streamed search execute failed because: Error reading compressed journal while streaming: gzip data truncated, provider=StdinGzDataProvider" occurs because one or more of the archived journal.gz files are corrupted.

If Splunk crashes or shuts down uncleanly (power loss, hardware failure, OS failure, etc.), some buckets can be left in a bad state in which not all data is searchable. If a bucket is corrupted locally on the indexer, the archived copy of that bucket will also be corrupted.

Local Splunk buckets can be fixed by following these instructions: http://docs.splunk.com/Documentation/Splunk/6.5.0/Indexer/Bucketissues

Currently there is no way to fix corrupted journal.gz files that have already been archived. We are working on a fix that will read data from a corrupted journal up to the point where the corruption begins. We will also log an error message in search.log indicating that the particular journal is corrupted. This fix will be available in a future release.
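
To narrow down which journal is affected, one option is to test every journal.gz for gzip integrity, either on the indexer (where the local copies still exist) or on a copy of the archive pulled down to a local filesystem. The following is only a rough sketch, not a Splunk tool: it assumes journal.gz can be read as an ordinary gzip stream by Python's gzip module, and the function names journal_is_readable and find_bad_journals are made up for this example. It detects truncated or damaged files but does not repair them.

```python
import gzip
import os
import zlib


def journal_is_readable(path, chunk_size=1024 * 1024):
    """Decompress the whole file and report whether it reads cleanly.

    Returns (True, "ok") or (False, "<exception type>: <message>").
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read to the end; truncation or corruption raises on the way.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error) as exc:
        # EOFError: stream ended early (truncated file).
        # OSError (incl. gzip.BadGzipFile): bad header or CRC mismatch.
        # zlib.error: damaged compressed data.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


def find_bad_journals(root):
    """Walk root and return [(path, detail)] for unreadable journal.gz files."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "journal.gz" in filenames:
            path = os.path.join(dirpath, "journal.gz")
            ok, detail = journal_is_readable(path)
            if not ok:
                bad.append((path, detail))
    return bad
```

Calling find_bad_journals on the index's db directory, or on a local copy of the archive, returns the path of each journal that fails to decompress together with the error that was raised. Running it on both the indexer and the archive copy would show whether the bucket was already damaged before it was archived.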

Path Finder

I am having this same issue on v7.2.1. Has there been any progress on a fix for this?

New Member

Hi Gurlest, no update has been provided by Splunk or by any of the users on Splunk Answers.

New Member

Hi,

I have been unable to find any further updates on this topic.
We are running 7.2.1, and I would like to know whether there is still no way to fix a corrupt archived journal.gz file.

Cheers
Paul

Path Finder

Has there been any progress?