Getting Data In

Indexing content that may contain in-line gzip

johnamcafee
New Member

We need to index content that may contain in-line gzip (or otherwise compressed) data. We do not need to search on the compressed data, but we do need to be able to read it back out of Splunk and have it still be valid for decompression and display.

I've done some searching through the documentation and knowledge base but have not found any pages that address the topic of gzip content mingled into text log content.

In our case, the file Splunk is forwarding has a message delimiter that we use for our linebreaker, then one line of data that we parse with a REPORT regex, then the content of the message that we are handling. That content, which includes line breaks, usually has some plain-text headers, some other text, and then a body that might be JSON, XML, or gzip or otherwise compressed data.

We control the writing and use of the content, so for example it would be possible for us to Base64-encode any binary content before we write it to the log file, then have our application decode it just prior to use - making the log content plain text the rest of the way through.
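A minimal sketch of that Base64 round trip, assuming the application is written in Python. The function names `encode_body_for_log` and `decode_body_from_log` are hypothetical and not part of Splunk or of our application; only the standard-library `base64` and `gzip` modules are used.

```python
import base64
import gzip


def encode_body_for_log(body: bytes) -> str:
    """Return a single-line, ASCII-only form of a binary body for the log file."""
    return base64.b64encode(body).decode("ascii")


def decode_body_from_log(text: str) -> bytes:
    """Recover the original bytes from the Base64 text read back out of Splunk."""
    # strip() drops any surrounding whitespace picked up during retrieval
    return base64.b64decode(text.strip(), validate=True)


# Example: a gzip-compressed JSON body survives the round trip unchanged.
original = gzip.compress(b'{"status": "ok"}')
logged = encode_body_for_log(original)
restored = decode_body_from_log(logged)
```

Base64 output is roughly a third larger than the input, so this trades some log volume for content that is plain text end to end.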

We would appreciate your advice and recommendations on how best to accomplish this.


gkanapathy
Splunk Employee

That should be okay. You can put arbitrary text content into Splunk, though as you suggested, you should Base64-encode it. If it's in an extractable field in structured or semi-structured content (JSON, XML), then it would be fine. You'll have to make a few config tweaks in Splunk to ensure clean event breaking and to set an appropriate maximum event size, but that's straightforward.

However, because you're not going to be searching on that data, there is no reason for Splunk to index it, and since I am guessing it's of substantial size, avoiding that would be very advantageous for disk space and search speed. How would you need to search on the content? Would it be just by timestamp, source, host, and sourcetype? Or would you need to be able to search on the non-gzip text of the event? If the former, you can set SEGMENTATION = none for the sourcetype in props.conf. Also, is the gzip content interleaved with the searchable free text, or is it all at the end?
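As a rough illustration of the tweaks mentioned above, a props.conf stanza might look something like the sketch below. The sourcetype name `my_message_log`, the delimiter text in `LINE_BREAKER`, and the `TRUNCATE` value are assumptions made up for illustration; substitute your own sourcetype, message delimiter, and a size limit that fits your largest message.

```
# props.conf (sketch; stanza name, delimiter, and size are hypothetical)
[my_message_log]
# Break events only at the delimiter instead of merging individual lines
SHOULD_LINEMERGE = false
# The first capture group marks the event boundary and is discarded;
# the delimiter text after it stays at the start of the next event
LINE_BREAKER = ([\r\n]+)-----MESSAGE-----
# Raise the per-event size limit (in bytes) so large bodies are not cut off
TRUNCATE = 1000000
# Skip segmentation if events are only located by time, source, host, sourcetype
SEGMENTATION = none
```

Leave out the SEGMENTATION line if you still need to search on the plain-text part of each event.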
