
How to decompress a single field (compressed JSON file) given the data has already been indexed in Splunk?

New Member

We have a compressed (via python zlib) JSON file that is "chunked" prior to being indexed by Splunk.

The multiple events in Splunk (once indexed) can be pieced together (via Splunk's transaction command) yielding one event, containing multiple fields, one of which contains the compressed JSON file.

How do we decompress this one field in Splunk given the data has already been indexed?

(Decompressing earlier in the process, like during indexing, doesn't seem reasonable because data arrives in pieces due to various size limitations.)


Splunk Employee

While Splunk uses zlib for compression internally, that is not something made available via commands out of the box.

That said, it does make sense to decompress the data before indexing (as a pre-process) since on the whole it will ALL be compressed again through the indexing process, using the same methodology that you use.

All indexed data is stored as compressed data (and usually sits on disk taking up 30%-70% less room than the raw data).

The other option is for you and yours to create a custom search command that takes a field as input, runs it through zlib decompression in a Python script, and feeds the output back to Splunk where you can use it. You can read about that here.
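As a rough idea of what the Python side of such a command might look like, here is a minimal sketch of just the decompression step. It assumes the compressed bytes were base64-encoded before being indexed (raw zlib output is binary and would not survive as a text field); the field name `payload` and the function names are made up for illustration, and the wiring into Splunk's custom command interface is left out.

```python
import base64
import json
import zlib


def decompress_field(encoded_value):
    """Decode a base64 string, zlib-decompress it, and return the JSON text.

    Assumption: the field holds base64 text wrapping zlib-compressed UTF-8 JSON.
    """
    compressed = base64.b64decode(encoded_value)
    return zlib.decompress(compressed).decode("utf-8")


def process_event(event, field="payload"):
    """Return a copy of the event dict with a decompressed companion field.

    'payload' is a hypothetical field name; use whatever your transaction
    produces. Failures are recorded on the event instead of raising, so one
    bad event does not stop the whole search.
    """
    result = dict(event)
    try:
        result[field + "_json"] = decompress_field(event[field])
    except (KeyError, ValueError, zlib.error) as exc:
        result[field + "_error"] = str(exc)
    return result


if __name__ == "__main__":
    # Round-trip demo with a made-up document.
    original = json.dumps({"host": "web01", "status": "ok"})
    encoded = base64.b64encode(zlib.compress(original.encode("utf-8"))).decode("ascii")
    print(process_event({"payload": encoded})["payload_json"])
```

If your chunks were encoded some other way (hex, for example), only the first line of `decompress_field` would need to change.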

You have not mentioned any specifics regarding why your data "arrives in pieces due to various size limitations", so it's difficult to say whether these suggestions are viable for you.

The least complicated solution would be to create a scripted input (in Python, if you like) that decompresses the data as it feeds it to the indexer, which will in turn compress it and make it available to you for searching.
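A scripted input along those lines could be sketched as below. Splunk indexes whatever a scripted input writes to stdout, so the script only has to reassemble the pieces, decompress, and print. How the chunks reach the script is not something I know from your description, so the `(sequence_number, bytes)` pairs and the function names here are assumptions for illustration only.

```python
import sys
import zlib


def reassemble(chunks):
    """Join (sequence_number, bytes) pairs in order and decompress the result.

    Assumption: each chunk carries a sequence number and a slice of one
    zlib-compressed UTF-8 JSON document.
    """
    ordered = sorted(chunks, key=lambda pair: pair[0])
    compressed = b"".join(data for _, data in ordered)
    return zlib.decompress(compressed).decode("utf-8")


def emit(chunks, out=sys.stdout):
    """Write the decompressed JSON as one line, the form a scripted input prints."""
    out.write(reassemble(chunks) + "\n")
    out.flush()


if __name__ == "__main__":
    # Demo with a made-up document split into two out-of-order pieces.
    payload = zlib.compress(b'{"host": "web01", "status": "ok"}')
    emit([(1, payload[10:]), (0, payload[:10])])
```

Sorting on the sequence number means the pieces can arrive in any order, as long as all of them are present before decompression.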

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!