Getting Data In

Indexing content that may contain in-line gzip

johnamcafee
New Member

We need to index content that may contain in-line gzip (or otherwise compressed) data. We do not need to search on the compressed data, but we do need to be able to read it back out of Splunk and have it still be valid for decompression and display.

I've done some searching through the documentation and knowledge base but have not found any pages that address the topic of gzip content mingled into text log content.

In our case, the file Splunk is forwarding has a message delimiter that we use for our linebreaker, then one line of data that we parse with a REPORT regex, then the content of the message that we are handling. That content, which includes line breaks, usually has some plain-text headers, some other text, then a body that might be JSON, XML, or gzip or otherwise compressed data.

We control the writing and use of the content, so for example it would be possible for us to BASE64-encode any binary content before we write it to the log file, then have our application decode it just prior to use - making the log content plain text the rest of the way through.

We would appreciate your advice/recommendations on how best to accomplish this.


gkanapathy
Splunk Employee

That should be okay. You can put arbitrary text content into Splunk, though as you suggested, you should base64-encode any binary content first. If it's in an extractable field in structured or semi-structured content (JSON, XML), then it would be fine. You'll have to make a few config tweaks in Splunk to ensure clean event breaking and to set an appropriate maximum event size, but that's straightforward.
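For the base64 step, here is a minimal sketch of what the application side might look like, assuming the writer and reader are in Python. The helper names `encode_for_log` and `decode_from_log` are hypothetical, not part of Splunk or your application. The point is only that the encoded form is a single line of plain ASCII, so it cannot collide with your message delimiter or line breaks, and it round-trips back to the original bytes.

```python
import base64
import gzip


def encode_for_log(payload: bytes) -> str:
    """Hypothetical helper: turn binary content into one line of ASCII text."""
    # b64encode never emits newlines, so the result stays on a single log line.
    return base64.b64encode(payload).decode("ascii")


def decode_from_log(field: str) -> bytes:
    """Hypothetical helper: recover the original bytes from the logged text."""
    return base64.b64decode(field)


# Round trip: compress, encode for the log, decode after reading it back.
original = gzip.compress(b'{"status": "ok"}')
logged = encode_for_log(original)
restored = decode_from_log(logged)

assert restored == original
assert gzip.decompress(restored) == b'{"status": "ok"}'
```

Keep in mind that base64 makes the content roughly a third larger than the raw bytes, which matters when sizing the maximum event length.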

However, because you're not going to be searching on that data, there is no reason for Splunk to index it, and since I am guessing it's of substantial size, avoiding that would be very advantageous for disk space and search speed. How would you need to search on the content? Would it be just by timestamp, source, host, and sourcetype? Or would you need to be able to search on the non-gzip text of the event? If the former, you can set SEGMENTATION = none for the sourcetype in props.conf. Also, is the gzip content interleaved with the searchable free text, or is it all at the end?
