Splunk Add-on for AWS: Does SQS-based S3 input support Firehose concatenated GZIP files?

alphablue
New Member

Hi Splunk Community,

I’m looking for confirmation or guidance on a gzip handling issue with the Splunk Add-on for AWS when ingesting data from Kinesis Firehose → S3 → SQS-based S3 input.

Environment

  • Splunk Add-on for AWS version: 7.1.0 (testing upgrade to latest as well)

  • Deployment: Heavy Forwarder

  • Ingestion method: SQS-based S3 input (SDC framework)

  • Source: API Gateway → Kinesis Firehose → S3

  • Firehose settings:

    • Compression: GZIP

    • Buffer size: 64 MB

  • Data format: Concatenated / “smushed” JSON (e.g. {"a":1}{"b":2})
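
For readers unfamiliar with the "smushed" format, here is a minimal sketch of how such a payload can be split back into individual JSON objects once it has been decompressed. It uses only the Python standard library; the function name split_concatenated_json is hypothetical and is not part of the add-on.

```python
import json


def split_concatenated_json(text):
    """Split back-to-back JSON values such as '{"a":1}{"b":2}' into a list."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects


records = split_concatenated_json('{"a":1}{"b":2}')
```

With the sample from the bullet above, records holds two dictionaries, one per original record.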

Observed Issue

  • Files delivered by Firehose are valid concatenated GZIP files (multiple gzip members in a single .gz object).

  • The AWS TA fails to correctly decompress these files:

    • Events appear as binary garbage containing \x1f\x8b (the gzip magic number)

    • Or ingestion stops after the first gzip member

  • If the same file is downloaded from S3 and re-uploaded manually via the AWS Console, Splunk ingests it correctly (single gzip member).
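
The "stops after the first gzip member" symptom can be reproduced outside Splunk. The sketch below assumes only the Python standard library: a single zlib decompressor stops at the end of the first member and leaves the rest in unused_data, so reading everything means looping over the members. The helper name decompress_members is hypothetical, and this is not a copy of the add-on's code.

```python
import gzip
import zlib


def decompress_members(blob):
    """Return the decompressed payload of every gzip member in blob."""
    members = []
    while blob:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
        payload = decomp.decompress(blob) + decomp.flush()
        members.append(payload)
        # Bytes after the end of this member belong to the next one.
        blob = decomp.unused_data
    return members


# Simulate a Firehose-style object: two gzip members back to back.
sample = gzip.compress(b'{"a":1}') + gzip.compress(b'{"b":2}')
parts = decompress_members(sample)
```

Running the same helper on an object downloaded from S3 shows how many members it contains. A count above one indicates a multi-member object of the kind described above.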

Investigation So Far

  • This ingestion path uses splunksdc (splunksdc/aws/s3/archive.py), not Splunkd’s ArchiveProcessor.

  • props.conf settings such as the following do not help, as decompression happens upstream in the add-on code:

    • NO_BINARY_CHECK = true

    • unarchive_cmd = gzip -cd -

  • The gzip handling in archive.py appears to use Python’s gzip.GzipFile().read(), which does not fully support concatenated gzip members.

  • The _internal logs show no ArchiveProcessor activity, which is consistent with the SDC path being used.

Questions

  1. Is concatenated / multi-member GZIP from Kinesis Firehose officially supported by the Splunk Add-on for AWS SQS-based S3 input?

  2. Has this behaviour been fixed in newer versions (8.x)?
    The release notes don’t explicitly mention concatenated gzip support.

  3. Is there a recommended configuration or supported workaround, or is a custom patch / upstream decompression (e.g. Lambda) the only option?

  4. Is Splunk planning to align SDC gzip handling with standard multi-member gzip behaviour?
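
Regarding question 3, the upstream option could look roughly like the sketch below. It rewrites a multi-member object as a single gzip member, which matches the shape of the re-uploaded file that ingested correctly. The function name to_single_member is hypothetical, the S3 and Lambda wiring is deliberately left out, and this is an assumption about a possible workaround rather than anything confirmed as supported by Splunk.

```python
import gzip
import zlib


def to_single_member(blob):
    """Rewrite a (possibly multi-member) gzip blob as one gzip member."""
    # gzip.decompress reads all concatenated members in Python 3.
    raw = gzip.decompress(blob)
    return gzip.compress(raw)


multi = gzip.compress(b'{"a":1}') + gzip.compress(b'{"b":2}')
rewritten = to_single_member(multi)

# A single decompressor pass now yields the whole payload with nothing left over.
check = zlib.decompressobj(zlib.MAX_WBITS | 16)
payload = check.decompress(rewritten)
```

Note that this holds the whole decompressed object in memory, so with a 64 MB Firehose buffer a streaming variant may be needed.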


livehybrid
SplunkTrust

Hi @alphablue 

Can you confirm what you specified as the s3_file_decoder for the SQS-based S3 input? Is it set to CustomLogs?
