Solved: "Error reading compressed journal while streaming:...

heroku_curzonj · ‎10-28-2016

While running a query via EMR on a bucket archived to S3 with Hadoop Data Roll, I got the following error:

[hadoop] [ip-192-168-4-184] Streamed search execute failed because: Error reading compressed journal while streaming: gzip data truncated, provider=StdinGzDataProvider

Does this mean that one of the archived journal.gz files is corrupt? If so:

How can I figure out how it got corrupted?
How do I figure out which one and fix it? This is still in test phase, so I have all the archived buckets on my indexer still. I'm trying to validate that the archival mechanism is safe and reliable.

kpawar_splunk · ‎10-28-2016

The error "Streamed search execute failed because: Error reading compressed journal while streaming: gzip data truncated, provider=StdinGzDataProvider" means that one or more of the archived journal.gz files are corrupted.

If Splunk suffers a crash or an unclean shutdown (power loss, hardware failure, OS failure, etc.), some buckets can be left in a bad state where not all data is searchable. If a bucket is corrupted locally on the indexer, the archived copy of that bucket will also be corrupted.
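Since the archived copy mirrors the local bucket, one way to narrow down which journal is affected is to test every journal.gz under a directory for a clean gzip end-of-stream. The following is a minimal sketch, not a Splunk tool: it assumes journal.gz is an ordinary gzip stream (which the "gzip data truncated" message suggests), and the function names `check_gzip` and `find_bad_journals` are made up for this example.

```python
import gzip
import zlib
from pathlib import Path


def check_gzip(path, chunk_size=1 << 20):
    """Return None if the file decompresses cleanly, else a short error string.

    Assumption: the file is a standard gzip stream.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read to the end and discard the data; we only care about errors.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error) as exc:
        # EOFError: stream ended before the gzip end-of-stream marker (truncated).
        # OSError / zlib.error: bad header, bad CRC, or damaged compressed data.
        return f"{type(exc).__name__}: {exc}"
    return None


def find_bad_journals(root):
    """Yield (path, error) for each journal.gz under root that fails the check."""
    for path in sorted(Path(root).rglob("journal.gz")):
        error = check_gzip(path)
        if error is not None:
            yield path, error
```

Pointing `find_bad_journals` at the directory that holds the local buckets (or at a downloaded copy of the archive) lists the journals that do not decompress to the end. A file that passes this check is only known to be a complete gzip stream; the check says nothing about whether Splunk can otherwise use the bucket.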

Local Splunk buckets can be fixed by following these instructions: http://docs.splunk.com/Documentation/Splunk/6.5.0/Indexer/Bucketissues

Currently there is no way to fix corrupted journal.gz files that have already been archived. We are working on a fix that will ensure we read data from a corrupted journal up to the point where the corruption begins. We will also log an error message in search.log indicating that the particular journal is corrupted. This fix will be available in a future release.
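To get a feel for what "reading up to the corrupted part" means at the gzip level, the sketch below streams a file through `zlib` and reports how many decompressed bytes come out before the stream breaks. This is only an illustration of the idea and not the fix described above: it assumes journal.gz is a standard gzip stream, the name `readable_prefix` is hypothetical, and the bytes it counts are journal data rather than searchable events, so it is useful for diagnosis only.

```python
import zlib


def readable_prefix(path, chunk_size=1 << 20):
    """Decompress a gzip file until it ends or breaks.

    Returns (decompressed_byte_count, complete), where complete is True only
    if the file ended exactly at the end of a gzip member.
    """
    # wbits=31 tells zlib to expect gzip framing (header and trailer).
    decomp = zlib.decompressobj(wbits=31)
    total_out = 0
    at_member_boundary = False
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            while chunk:
                at_member_boundary = False
                try:
                    total_out += len(decomp.decompress(chunk))
                except zlib.error:
                    # Damaged data: report what was readable up to this point.
                    return total_out, False
                if decomp.eof:
                    # One gzip member finished; any leftover bytes start a new one.
                    chunk = decomp.unused_data
                    decomp = zlib.decompressobj(wbits=31)
                    at_member_boundary = True
                else:
                    chunk = b""
    return total_out, at_member_boundary
```

Comparing the result for the local journal.gz with the result for its archived copy can show whether both stop at the same point, which would be consistent with the corruption having happened before archiving.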

gurlest · ‎05-10-2019

I am having this same issue - v7.2.1. Has there been any progress on a fix for this?

pbrinkman · ‎05-13-2019

Hi Gurlest, no update has been provided by Splunk or by any of the users on Splunk Answers.

pbrinkman · ‎12-13-2018

Hi,

I have been unable to locate any further updates on this topic.
We are running 7.2.1, and I would like to know whether there is still no way to fix a corrupt archived journal.gz file.

Cheers
Paul

jmantor · ‎12-26-2018

Has there been any progress?

