Hi Splunk Community,

I'm looking for confirmation or guidance on a gzip handling issue with the Splunk Add-on for AWS when ingesting data from Kinesis Firehose → S3 → SQS-based S3 input.

Environment

- Splunk Add-on for AWS version: 7.1.0 (also testing an upgrade to the latest release)
- Deployment: Heavy Forwarder
- Ingestion method: SQS-based S3 input (SDC framework)
- Source: API Gateway → Kinesis Firehose → S3
- Firehose settings:
  - Compression: GZIP
  - Buffer size: 64 MB
  - Data format: concatenated / "smushed" JSON (e.g. {"a":1}{"b":2})

Observed Issue

Files delivered by Firehose are valid concatenated GZIP files (multiple gzip members in a single .gz object). The AWS TA fails to decompress these files correctly:

- events appear as binary garbage (starting with the gzip magic bytes \x1f\x8b), or
- ingestion stops after the first gzip member.

If the same file is downloaded from S3 and re-uploaded manually via the AWS Console, Splunk ingests it correctly (single gzip member).

Investigation So Far

- This ingestion path uses splunksdc (splunksdc/aws/s3/archive.py), not Splunkd's ArchiveProcessor.
- props.conf settings such as the following do not help, because decompression happens upstream in the add-on code:

```
NO_BINARY_CHECK = true
unarchive_cmd = gzip -cd -
```

- The gzip handling in archive.py appears to use Python's gzip.GzipFile().read(), which does not fully support concatenated gzip members in this code path.
- _internal logs show no ArchiveProcessor activity, confirming the SDC path.
- A quick way to confirm the multi-member structure independently of the add-on is sketched at the end of this post.

Questions

1. Is concatenated / multi-member GZIP from Kinesis Firehose officially supported by the Splunk Add-on for AWS SQS-based S3 input?
2. Has this behaviour been fixed in newer versions (8.x)? The release notes don't explicitly mention concatenated gzip support.
3. Is there a recommended configuration or supported workaround, or is a custom patch / upstream decompression (e.g. a Lambda; a rough sketch is also at the end of this post) the only option?
4. Is Splunk planning to align SDC gzip handling with standard multi-member gzip behaviour?
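Diagnostic sketch. The snippet below is a minimal, standalone check (not add-on code) that walks gzip members one at a time with zlib, so you can confirm whether a Firehose-delivered object really contains multiple members. The file name is a placeholder, and the loop will raise zlib.error on any trailing non-gzip padding, which is acceptable for a diagnostic.

```python
import zlib

def iter_gzip_members(raw: bytes):
    # Decompress one gzip member per iteration; zlib.decompressobj
    # stops at that member's trailer and exposes the remaining bytes
    # (the next member, if any) via unused_data.
    while raw:
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # gzip framing
        yield d.decompress(raw) + d.flush()
        raw = d.unused_data

with open("firehose-delivered-object.gz", "rb") as f:  # placeholder name
    members = list(iter_gzip_members(f.read()))

print(f"{len(members)} gzip member(s), "
      f"{sum(len(m) for m in members)} decompressed bytes in total")
```

A count greater than 1 on a file the TA mangles, versus exactly 1 on a console re-upload that ingests fine, would line up with the behaviour described above.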
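Workaround sketch (re question 3). This is roughly what I mean by upstream decompression: a hypothetical S3-triggered Lambda that rewrites each delivered object as a single-member gzip under a separate prefix, with the SQS-based S3 input pointed at that prefix instead. The bucket/prefix wiring is illustrative only, and a production version would need to stream rather than buffer whole 64 MB objects in memory.

```python
import gzip
import urllib.parse
import zlib

import boto3

s3 = boto3.client("s3")

NORMALIZED_PREFIX = "normalized/"  # hypothetical prefix watched by Splunk


def decompress_all_members(raw: bytes) -> bytes:
    # Same member-by-member walk as the diagnostic above.
    out = []
    while raw:
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        out.append(d.decompress(raw) + d.flush())
        raw = d.unused_data
    return b"".join(out)


def handler(event, context):
    # Standard S3 event-notification payload.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(NORMALIZED_PREFIX):
            continue  # don't reprocess our own output
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(
            Bucket=bucket,
            Key=NORMALIZED_PREFIX + key,
            Body=gzip.compress(decompress_all_members(raw)),  # one member
            ContentEncoding="gzip",
        )
```

Patching archive.py directly would be simpler, but it would be overwritten on every TA upgrade, which is why I'd rather hear about a supported path.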