Does anyone know how to change data compression settings in Splunk?

dmacgillivray
Communicator

I can search for compression settings information all day long, and currently we only compress at 34% overall (per Firebrigade). That seems like a small number. When I search on individual indexes, I do see that compression is higher for some than for others.

I am using this query to get an overall understanding of the percentage of compressed values over my indexes.

index=summary orig_index=*
| rename orig_index AS index
| dedup host, path
| search state="warm"
| chart sum(rawSize) AS rawBytes, sum(sizeOnDiskMB) AS diskTotalinMB by index
| eval rawTotalinMB=round(rawBytes / 1024 / 1024, 0)
| eval comp_percent=tostring(round(diskTotalinMB / rawTotalinMB * 100, 2)) + "%"

I am hoping someone can answer the million-dollar question: how do I change this setting in a way that reduces my storage costs but does not hamper indexing or search speed?

Thanks,
Daniel

mhassan
Path Finder

I believe Splunk uses the standard zlib library. There is a Python script you can play with, but I strongly recommend against it. My guess is that they set the compression level to strike a balance between speed and space.

root:/var/root # locate zlib|grep splunk
/opt/splunk/6.3/lib/python2.7/encodings/zlib_codec.py
/opt/splunk/6.3/lib/python2.7/lib-dynload/zlib.so
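
For illustration, here is a minimal sketch (plain Python, not Splunk's internal code; the sample data is made up) of how zlib's compression level trades speed for space on repetitive, log-like input:

import time
import zlib

# Synthetic, repetitive syslog-style sample; real ratios depend on data entropy.
sample = (b"Jan 01 00:00:00 host01 sshd[1234]: Accepted publickey for user "
          b"from 10.0.0.1 port 22 ssh2\n") * 50000

for level in (1, 6, 9):  # 6 is zlib's default; 9 compresses hardest but slowest
    start = time.perf_counter()
    compressed = zlib.compress(sample, level)
    elapsed = time.perf_counter() - start
    print(f"level={level}: {len(compressed) * 100.0 / len(sample):.1f}% "
          f"of original size in {elapsed:.3f}s")

On data this repetitive, the higher levels buy very little extra compression but cost noticeably more CPU, which is presumably why a middle-of-the-road default makes sense at indexing time.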

Ayn
Legend

Well, I think you kind of answered this yourself - the current setting is already at the "sweet spot" that balances efficiency and performance. 34% actually sounds pretty good to me. Remember that the figure you're getting is not just the compressed raw data, but also the corresponding metadata that Splunk needs in order to make the data searchable. The compression of the raw data itself is standard gzip, and typical figures for compressed vs. uncompressed raw data are around 10% (YMMV depending on data entropy). The metadata actually takes up more storage than the compressed raw data does. There are ways of limiting what metadata Splunk stores, but all of them will greatly impact your search performance in the end. My advice would be to just leave the settings as they are.
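
If you want to check the ~10% figure against your own data, here is a minimal sketch, assuming a local sample log file (the filename is just a placeholder). It measures the raw data only and deliberately ignores the index/metadata files that make up the rest of a bucket:

import gzip
import os

path = "sample.log"  # hypothetical file; point this at one of your own logs

raw_size = os.path.getsize(path)
with open(path, "rb") as f:
    gz_size = len(gzip.compress(f.read()))

print(f"raw: {raw_size} bytes, gzipped: {gz_size} bytes "
      f"({gz_size * 100.0 / raw_size:.1f}% of original)")

Whatever ratio this prints is roughly the floor for your data; the gap between it and the 34% you measured is the metadata overhead described above.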

dmacgillivray
Communicator

Thanks Ayn, something tells me you have done this before. So much of this comes down to gzip's compression scheme. I guess nothing much is going to change that, or the back-end logic Splunk is using here. OK, time to think more about hardware now 🙂
