Index gzipped files without .gz extension

chris
Motivator

Hi,

I am trying to index gzipped files that do not have the .gz extension on a Windows universal forwarder.

First I got the following messages in splunkd.log:

11-18-2019 15:06:33.698 +0100 INFO  TailReader - Ignoring file 'D:\path\to\log\messages_xyz' due to: binary
11-18-2019 15:06:33.698 +0100 WARN FileClassifierManager - The file 'D:\path\to\log\messages_xyz' is invalid. Reason: binary.

Looking at how Splunk handles gzipped files in the props.conf under system/default, I put together the following props.conf:

[mysourcetype]
invalid_cause = archive
NO_BINARY_CHECK = true
is_valid = False

[source::D:\path\to\log\*]
#Default
#unarchive_cmd = _auto
#On linux
#unarchive_cmd = gzip -cd -
#On windows
unarchive_cmd = splunk-compresstool -g

Running splunk-compresstool by hand seems to work:
.\splunk-compresstool.exe -g 'xyz'

2019-03-27 16:01:34.000 device kern.info kernel: udevd version 124 started
2019-03-27 16:01:34.000 device kern.info kernel: net eth0: eth0: allmulti set
2019-03-27 16:01:34.000 device kern.info kernel: net eth0: eth0: allmulti set
2019-03-27 16:06:44.000 device kern.warn kernel: JFFS2 warning: (793) jffs2_sum_write_data: Not enough space for summary, padsize = -376

This is what I see in splunkd.log:

11-18-2019 16:55:47.351 +0100 INFO  ArchiveProcessor - Handling file=xyz
11-18-2019 16:55:47.351 +0100 INFO  ArchiveProcessor - reading path=xyz (seek=0 len=211534)
11-18-2019 16:55:47.402 +0100 INFO  ArchiveProcessor - Finished processing file 'xyz', removing from stats

And this is what I see in metrics.log:

11-18-2019 17:03:47.471 +0100 INFO  Metrics - group=per_source_thruput, ingest_pipe=0, series="xyz", kbps=0, eps=0.03224797474898443, kb=0, ev=1, avg_age=0, max_age=0

Although metrics.log reports ev=1, I do not see any events in the index (and there should be more than one event per file).

Is there a way to see what the ArchiveProcessor is doing?

Shouldn't Splunk just recognize filetypes without depending on the extension?
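For reference, gzip content can be identified without looking at the file name, because every gzip stream begins with the two magic bytes 0x1f 0x8b. The following is a minimal Python sketch for checking a file outside of Splunk. It is not how Splunk's file classifier works, and the function names looks_like_gzip and preview_lines are hypothetical helpers made up for this illustration.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def preview_lines(path, limit=5):
    """Decompress the file and return up to `limit` text lines."""
    lines = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            lines.append(line.rstrip("\n"))
            if len(lines) >= limit:
                break
    return lines
```

Calling looks_like_gzip on an extensionless file such as messages_xyz confirms whether it is really gzip data before any props.conf tuning, and preview_lines shows what the decompressed events should look like.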

Regards, Chris

chris
Motivator

This is the props.conf that worked for me:

 [source::D:\\path\\to\\log\\*]
 #Default
 #unarchive_cmd = _auto
 #On linux
 #unarchive_cmd = gzip -cd -
 #On windows
 unarchive_cmd = splunk-compresstool -g
 invalid_cause = archive
 NO_BINARY_CHECK = true
 is_valid = False

anwarmian
Communicator

Thanks so much for your post. I am surprised that the following did not work. "_auto" is not the default value, so I would expect that setting unarchive_cmd to "_auto" makes Splunk extract the archive automatically, unless a file extension is required. In my case Splunk ingests gzip files without an extension, but the ingested content is not readable text. When I tested a file with a .gz extension, Splunk recognized it and decompressed it properly. That suggests that Splunk requires an archive file to have an extension.

unarchive_cmd = _auto

 


chris
Motivator

Turns out that I forgot to escape the \ in the Windows path in props.conf.
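To make the fix concrete: in the working stanza every backslash of the Windows path is doubled. Below is a minimal Python sketch that builds such a stanza header from a plain path. The helper name source_stanza is hypothetical, and the sketch only doubles backslashes as reported in this thread; it does not handle any other special characters.

```python
def source_stanza(windows_path):
    """Build a props.conf [source::...] header with every backslash doubled."""
    escaped = windows_path.replace("\\", "\\\\")
    return "[source::" + escaped + "]"


# Raw string, so the path is written exactly as it appears on disk.
print(source_stanza(r"D:\path\to\log\*"))
# prints: [source::D:\\path\\to\\log\\*]
```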
