Getting Data In

Does anyone know how to change data compression settings in Splunk?

dmacgillivray
Communicator

I have searched at length for information on compression settings without finding much. Currently we only compress at 34% overall (Firebrigade), which seems like a small number. When I search individual indexes, I do see that compression is higher for some than for others.

I am using this query to get an overall understanding of the percentage of compressed values over my indexes.

index=summary orig_index=* | rename orig_index AS index | dedup host, path | search state="warm" | chart   sum(rawSize) AS rawBytes, sum(sizeOnDiskMB) AS diskTotalinMB by index | eval rawTotalinMB=round(rawBytes / 1024 / 1024, 0) | eval comp_percent=tostring(round(diskTotalinMB / rawTotalinMB * 100, 2)) + "%"
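For anyone who wants to check the arithmetic outside of Splunk, here is a minimal Python sketch of the two eval steps in the query above. The function name and the sample numbers are hypothetical, not taken from a real index. Note that the figure is on-disk size as a percentage of raw size, so a lower number means better compression.

```python
def compression_percent(raw_bytes: int, disk_mb: float) -> float:
    """On-disk size as a percentage of raw size, mirroring the eval steps."""
    # Same as: eval rawTotalinMB=round(rawBytes / 1024 / 1024, 0)
    # (Python and SPL may round exact .5 cases differently.)
    raw_mb = round(raw_bytes / 1024 / 1024)
    if raw_mb == 0:
        raise ValueError("raw size rounds to 0 MB; percentage is undefined")
    # Same as: round(diskTotalinMB / rawTotalinMB * 100, 2)
    return round(disk_mb / raw_mb * 100, 2)


# Hypothetical index: 100 GiB of raw events occupying 34,816 MB on disk.
example = compression_percent(100 * 1024**3, 34816)
print(f"{example}%")
```

Very small indexes can skew the result, because the raw size is rounded to whole megabytes before the division.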

I am hoping someone can answer the million-dollar question: how do I change this setting in a way that benefits my data costs but does not hamper indexing speed or searching?

Thanks,
Daniel

1 Solution

Ayn
Legend

Well, I think you kind of answered this yourself: the current setting is already at the "sweet spot" that balances efficiency and performance. 34% actually sounds pretty good to me. Remember that the figure you're getting covers not just the compressed raw data, but also the corresponding metadata that Splunk needs in order to make the data searchable. The compression of the raw data itself is standard gzip, and typical figures for compressed vs. uncompressed raw data are around 10% (YMMV depending on data entropy). The metadata actually takes up more storage. There are ways of limiting what metadata Splunk stores, but all of them will in the end greatly impact your search performance. My advice would be to just leave the settings as they are.
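To illustrate the "depending on data entropy" point, here is a rough Python sketch using the standard `zlib` module (the same DEFLATE algorithm that gzip uses). The log lines are synthetic and far more repetitive than real events, and this is not Splunk's own code path, so treat the numbers as a demonstration of the principle only.

```python
import os
import zlib


def ratio_percent(data: bytes, level: int = 6) -> float:
    """Compressed size as a percentage of the original size."""
    return len(zlib.compress(data, level)) / len(data) * 100


# Synthetic, highly repetitive log lines (low entropy).
log_like = b"".join(
    b"2015-11-02 12:00:%02d INFO component=indexer status=ok bytes=%d\n"
    % (i % 60, 1000 + i % 7)
    for i in range(20000)
)

# Random bytes (high entropy) of the same length barely compress at all.
random_like = os.urandom(len(log_like))

print(f"log-like: {ratio_percent(log_like):.1f}% of original size")
print(f"random:   {ratio_percent(random_like):.1f}% of original size")
```

Real machine data falls somewhere between these two extremes, which is why per-index figures differ.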

mhassan
Path Finder

I believe Splunk uses the standard zlib library. There is a Python script you can play with, but I strongly recommend against it. My guess is that they set the compression ratio to strike a balance between speed and space.

root:/var/root # locate zlib|grep splunk
/opt/splunk/6.3/lib/python2.7/encodings/zlib_codec.py
/opt/splunk/6.3/lib/python2.7/lib-dynload/zlib.so
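The speed vs. space trade-off can be seen with plain Python `zlib`, without touching anything under the Splunk install. This is only a sketch on synthetic, seeded data. The field names and values are made up, and it does not show which level Splunk itself uses.

```python
import random
import time
import zlib

rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
hosts = [f"web{n:02d}" for n in range(20)]
lines = [
    f"2015-11-02T12:{rng.randrange(60):02d}:{rng.randrange(60):02d} "
    f"host={rng.choice(hosts)} status={rng.choice([200, 200, 200, 404, 500])} "
    f"bytes={rng.randrange(100, 50000)}\n"
    for _ in range(50000)
]
data = "".join(lines).encode("ascii")

# Compare the fastest level (1), the zlib default (6), and the smallest (9).
results = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    results[level] = (len(compressed), elapsed)
    print(
        f"level {level}: {len(compressed) / len(data) * 100:.1f}% of original, "
        f"{elapsed * 1000:.1f} ms"
    )
```

Higher levels generally produce smaller output at the cost of more CPU time, which is the balance described above.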

dmacgillivray
Communicator

Thanks Ayn, something tells me you have done this before. So much of this comes down to the GZIP compression scheme. I guess nothing much is going to change that, or the back-end logic Splunk is using here. OK, time to think more about hardware now 🙂
