Hi,
I've heard comments against configuring Splunk to read gzipped files, including horror stories of it not always noticing that a file was actually a .gz and indexing the compressed raw data instead. I'm looking to piggyback on an existing process that drops a pile of gzipped logs onto a server that already has a universal forwarder installed, and I don't want to delve into custom scripts that first decompress the files to a temp location unless there are genuine, known concerns about Splunk reliably indexing gzipped files.
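For context, the sort of monitor stanza I'd expect to add to inputs.conf on the forwarder is just the usual one pointed at the drop directory (the path, sourcetype, and index below are placeholders for illustration, not our real values):

[monitor:///data/drop/logs/*.gz]
sourcetype = batch_app_logs
index = main
disabled = false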
Splunk can read zip/gzip files. Do understand that what Splunk does on the back end is:
1) Unarchives
2) Reads the Files
3) Indexes
4) Deletes the unarchived pieces
Additionally, the unzip process is not multithreaded, so you can see a fair amount of latency and CPU time when this is done, especially if you are monitoring a large number of zipped files. You also have to be careful about free disk space.
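If free space is the worry, one quick sanity check before pointing Splunk at the directory is to list the uncompressed sizes (a shell sketch; the path is just a placeholder):

gzip -l /data/drop/logs/*.gz

gzip -l reports the uncompressed size of each file, so you can total those up and compare against the free space on the volume Splunk will unarchive into.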
That all sounds reasonable, as long as it's reliable. These are daily batch files, so a manageable delay isn't really a problem, and it's done overnight when things are relatively sleepy. Where would the files be decompressed to by default?
Ultimately this is a temporary hack before we get a real-time stream of the equivalent data, so this looks good all round to me. Thanks.
To my understanding, there is no need to decompress gzip files before indexing them.