I am trying to index gzipped files that do not have the .gz extension on a Windows universal forwarder.
First I got the following messages in splunkd.log:
11-18-2019 15:06:33.698 +0100 INFO TailReader - Ignoring file 'D:\path\to\log\messages_xyz' due to: binary
11-18-2019 15:06:33.698 +0100 WARN FileClassifierManager - The file 'D:\path\to\log\messages_xyz' is invalid. Reason: binary.
Looking at how Splunk handles gzipped files in the props.conf under system/default, I put the following props.conf together:
[mysourcetype]
invalid_cause = archive
NO_BINARY_CHECK = true
is_valid = False

[source::D:\path\to\log\*]
#Default
#unarchive_cmd = _auto
#On linux
#unarchive_cmd = gzip -cd -
#On windows
unarchive_cmd = splunk-compresstool -g
Running splunk-compresstool by hand seems to work:
.\splunk-compresstool.exe -g 'xyz'
2019-03-27 16:01:34.000 device kern.info kernel: udevd version 124 started
2019-03-27 16:01:34.000 device kern.info kernel: net eth0: eth0: allmulti set
2019-03-27 16:01:34.000 device kern.info kernel: net eth0: eth0: allmulti set
2019-03-27 16:06:44.000 device kern.warn kernel: JFFS2 warning: (793) jffs2_sum_write_data: Not enough space for summary, padsize = -376
This is what I see in splunkd.log:
11-18-2019 16:55:47.351 +0100 INFO ArchiveProcessor - Handling file=xyz
11-18-2019 16:55:47.351 +0100 INFO ArchiveProcessor - reading path=xyz (seek=0 len=211534)
11-18-2019 16:55:47.402 +0100 INFO ArchiveProcessor - Finished processing file 'xyz', removing from stats
And this is what I see in metrics.log:
11-18-2019 17:03:47.471 +0100 INFO Metrics - group=per_source_thruput, ingest_pipe=0, series="xyz", kbps=0, eps=0.03224797474898443, kb=0, ev=1, avg_age=0, max_age=0
Although metrics.log says that ev=1, I do not see any events in the index (and there should be more than one event per file).
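For anyone comparing these counters across several sources, here is a minimal Python sketch that splits a per_source_thruput line into its key=value fields. It only parses the text of the line quoted above; the function name parse_metrics_line is made up for illustration and is not part of Splunk.

```python
import re


def parse_metrics_line(line: str) -> dict:
    """Collect the key=value pairs of a metrics.log line into a dict.

    Values are kept as strings; quoted values lose their quotes.
    """
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', line)
    return {key: value.strip('"') for key, value in pairs}


# The line from metrics.log quoted above.
line = (
    '11-18-2019 17:03:47.471 +0100 INFO Metrics - group=per_source_thruput, '
    'ingest_pipe=0, series="xyz", kbps=0, eps=0.03224797474898443, kb=0, ev=1, '
    'avg_age=0, max_age=0'
)

fields = parse_metrics_line(line)
print(fields["series"], fields["ev"], fields["kb"])
```

Note that kb=0 appears next to ev=1 here, which may be worth checking against the number of lines you expect from the decompressed file.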
Is there a possibility to see what the ArchiveProcessor is doing?
Shouldn't Splunk just recognize filetypes without depending on the extension?
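To illustrate what recognizing a file by content rather than by extension could look like, here is a small Python sketch. It checks the two-byte gzip magic number (0x1f 0x8b) and counts the lines in the decompressed data. This is only a way to inspect the files outside of Splunk, not a description of how Splunk classifies files; the function names are made up for illustration.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path: str) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def count_lines(path: str) -> int:
    """Decompress a gzip file and count its lines (a rough event count)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)
```

Calling looks_like_gzip and count_lines on one of the extensionless files would show whether it really is gzip and roughly how many events to expect from it.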
This is the props.conf that worked for me:
[source::D:\\path\\to\log\\*]
#Default
#unarchive_cmd = _auto
#On linux
#unarchive_cmd = gzip -cd -
#On windows
unarchive_cmd = splunk-compresstool -g
invalid_cause = archive
NO_BINARY_CHECK = true
is_valid = False
Thanks so much for your post. I am surprised that the following did not work:
unarchive_cmd = _auto
"_auto" is not the default value, so I expected that setting it to "_auto" would make Splunk extract the archived file automatically, unless a file extension is required. In my case Splunk is ingesting gzip files without an extension, but after ingestion the data is not in text format. When I tested a file with the .gz extension, Splunk recognized it and decompressed it properly. That suggests that Splunk requires an archived file to have an extension.