There are a lot of tradeoffs to consider here. Splunk Analytics for Hadoop works with many popular compression codecs through commons-compress-1.10.jar (Splunk 6.6.2).
Here is the list of supported formats:
Compression: bzip2, gzip, pack200, lzma, xz, Snappy, traditional Unix Compress, DEFLATE, LZ4, Brotli
Archive: ar, cpio, jar, tar, zip, dump, 7z, arj
Commons Compress 1.10 On Maven
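With that said, I'd suggest you consult your Hadoop vendor or experiment to see which codec gives you the best balance of performance and compression ratio. One recommendation I am comfortable giving is to pick a compression setup that is splittable in Hadoop: bzip2 is splittable natively, LZO becomes splittable once the files are indexed, and Snappy is splittable when it is used inside a container format such as SequenceFile or Avro. I've seen performance issues with non-splittable codecs like gzip, because a single large gzip file has to be read by one task from start to finish.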
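If you want to run the experiment yourself, here is a minimal sketch in Python that compares compression ratio and timing for three of the codecs above using only the standard library. The function name `benchmark` and the sample log line are my own placeholders, not anything from Splunk or Hadoop. It only measures raw codec speed in a single process, so treat the numbers as a rough first pass; it says nothing about splittability or how a real Hadoop job will behave.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Return {codec: (ratio, compress_seconds, decompress_seconds)}."""
    # Standard-library codecs only; lzma.compress writes the .xz format by default.
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        middle = time.perf_counter()
        restored = decompress(packed)
        end = time.perf_counter()
        # Sanity check: the round trip must give back the original bytes.
        assert restored == data
        ratio = len(data) / len(packed)
        results[name] = (ratio, middle - start, end - middle)
    return results


if __name__ == "__main__":
    # Hypothetical sample data; replace with a slice of your own events.
    line = b"2017-08-01 12:00:00 INFO host=web01 status=200 bytes=5120\n"
    sample = line * 50_000
    for name, (ratio, c_secs, d_secs) in benchmark(sample).items():
        print(f"{name:6s} ratio={ratio:8.1f} "
              f"compress={c_secs:.3f}s decompress={d_secs:.3f}s")
```

Run it against a representative chunk of your own data rather than the repeated sample line, since highly repetitive input makes every codec look better than it will in practice.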
Here's a link from Cloudera on data compression performance for your reference:
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/admin_data_compression_performance.html