Digging through the new stuff in 6.3 in preparation for some upgrades, I see LZ4 is now available as a bucket rawdata journal compression option in indexes.conf. Awesome! I'm excited. Splunk bucket data seems like it should be a great fit for LZ4's strengths.
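For anyone else looking for it, the setting is `journalCompression` in indexes.conf. A minimal sketch of what switching an index to LZ4 might look like (the stanza name here is just an example, not from any real deployment):

```ini
# indexes.conf -- example index stanza; "my_index" is illustrative
[my_index]
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb

# Compression for the rawdata journal; gzip is the historical default,
# lz4 trades some storage efficiency for faster compress/decompress.
journalCompression = lz4
```

Note this applies to newly written buckets; existing buckets keep whatever compression they were written with.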
But LZ4 should also incur a measurable storage penalty relative to gzip, and published algorithm benchmarks tend to cover either specific interesting data cases or a broad mix of data types. Splunk's intake is pretty narrow by comparison, so I'm curious whether anyone has any real-world numbers to throw down yet, since switching to LZ4 would change the calculations for capacity planning.
The only item I've seen on this comes from the Architecting Splunk for Epic Performance conference talk by the Blizzard Splunk team:

    Bonus finding: LZ4 does not yield any substantial gains in performance that would be worth the tradeoff in extra storage vs. GZIP
I'm planning to use LZ4 for my current engagement, although the compression will also have the benefit of riding on top of Pure all-flash storage, so we should gain the benefit of their dedup. Pure has advised us to use LZ4 to get better dedup rates. I'll post some additional details once we get data ingestion rolling and have real-world numbers I can share. I won't be able to compare it to gzip, though, as we're not planning to test that, nor do we have previous metrics to look at.
@2manyhobbies - We're about to stand up a new installation with Pure as hot/warm storage and are planning to use LZ4 given their recommendation. How has this been working out for you?