I've been attempting to get the unarchive_cmd to work with no success. I decided to abandon my ultimate goal for the purpose of troubleshooting and what I'm finding is that I can't get even the most simple use-case to work.
My simplified use case is simple adding support for an "imaginary" gzip-2 format. (Which is really just gzip, with ".gz2" as the extension instead of ".gz"). So I copied Splunk's default props.conf settings for handling "gzip" files and just basically add a "2" to the end of everything. My configuratoin looks like so:
props.conf:
[preprocess-gzip2]
invalid_cause = archive
is_valid = False
LEARN_MODEL = false
[source::....gz2(.\d+)?]
unarchive_cmd = gzip -cd -
sourcetype = preprocess-gzip2
NO_BINARY_CHECK = true
inputs.conf:
[monitor:///tmp/indexme]
I've tried adding unarchive_cmd to the [preprocess-gzip2] with no luck. I also added priority = 90 to [source::....gz2(.\d+)?]. Again no luck. I also tried adding sourcetype=preprocess-gzip2 to the monitor stanza -- no change.
The behavior I'm seeing is this:
I cranked up a few log sources for debugging purposes. Perhaps someone can shed some light on what I'm doing wrong. Again, my long-term goal is to implement some pre-processing command other than gzip -cd -
but I'm just trying to boil this down to the most simple case to get things started.
Splunkd log - First attempt - Loading a .gz2 file [FAILS]
07-04-2014 15:34:21.631 +0000 INFO TailingProcessor - Archive file='/tmp/indexme/test1-custom-handler-gz2.gz2' updated less than 10000ms ago, will not read it until it stops changing.
07-04-2014 15:34:31.632 +0000 INFO TailingProcessor - Archive file='/tmp/indexme/test1-custom-handler-gz2.gz2' has stopped changing, will read it now.
07-04-2014 15:34:31.632 +0000 INFO ArchiveProcessor - handling file=/tmp/indexme/test1-custom-handler-gz2.gz2
07-04-2014 15:34:31.633 +0000 INFO ArchiveProcessor - reading path=/tmp/indexme/test1-custom-handler-gz2.gz2 (seek=0 len=1337)
07-04-2014 15:34:31.633 +0000 DEBUG ArchiveContext - /tmp/indexme/test1-custom-handler-gz2.gz2 is NOT an archive file.
07-04-2014 15:34:31.719 +0000 DEBUG ArchiveProcessor - Attempting to load indexed extractions config from conf=/tmp/indexme/test1-custom-handler-gz2.gz2||preprocess-gzip2| ...
07-04-2014 15:34:31.720 +0000 DEBUG ArchiveContext - /tmp/indexme/test1-custom-handler-gz2.gz2 is NOT an archive file.
07-04-2014 15:34:31.729 +0000 DEBUG ArchiveProcessor - Basing small file initCrc on path=/tmp/indexme/test1-custom-handler-gz2.
07-04-2014 15:34:31.729 +0000 DEBUG ArchiveProcessor - This archive stream's initcrc=0x2e6fa8b616493a17.
07-04-2014 15:34:31.729 +0000 DEBUG ArchiveProcessor - Found no initcrc match for this stream, will re-read entire file.
07-04-2014 15:34:31.729 +0000 DEBUG ArchiveContext - Early termination because write has false notification
07-04-2014 15:34:31.729 +0000 DEBUG ArchiveContext - /tmp/indexme/test1-custom-handler-gz2.gz2 is NOT an archive file.
07-04-2014 15:34:31.747 +0000 DEBUG ArchiveProcessor - fname=/tmp/indexme/test1-custom-handler-gz2.gz2 fishstate=key=0xb5635587184bcff4 sptr=1337 scrc=0x65e7d9f9bdd1b805 fnamecrc=0xba14100c169eff7e modtime=0
07-04-2014 15:34:31.747 +0000 INFO ArchiveProcessor - Finished processing file '/tmp/indexme/test1-custom-handler-gz2.gz2', removing from stats
Splunkd log - Second attempt - Loading a typical .gz file [WORKS]
07-04-2014 15:36:18.722 +0000 INFO TailingProcessor - Archive file='/tmp/indexme/test2-standard-gz-file.gz' updated less than 10000ms ago, will not read it until it stops changing.
07-04-2014 15:36:28.724 +0000 INFO TailingProcessor - Archive file='/tmp/indexme/test2-standard-gz-file.gz' has stopped changing, will read it now.
07-04-2014 15:36:28.725 +0000 INFO ArchiveProcessor - handling file=/tmp/indexme/test2-standard-gz-file.gz
07-04-2014 15:36:28.725 +0000 INFO ArchiveProcessor - reading path=/tmp/indexme/test2-standard-gz-file.gz (seek=0 len=1344)
07-04-2014 15:36:28.725 +0000 DEBUG ArchiveContext - /tmp/indexme/test2-standard-gz-file.gz is NOT an archive file.
07-04-2014 15:36:28.858 +0000 DEBUG ArchiveProcessor - Attempting to load indexed extractions config from conf=/tmp/indexme/test2-standard-gz-file.gz||test2-standard-gz-file-too_small| ...
07-04-2014 15:36:28.858 +0000 DEBUG ArchiveContext - /tmp/indexme/test2-standard-gz-file.gz is NOT an archive file.
07-04-2014 15:36:28.866 +0000 DEBUG ArchiveProcessor - Basing small file initCrc on path=/tmp/indexme/test2-standard-gz-file.
07-04-2014 15:36:28.866 +0000 DEBUG ArchiveProcessor - This archive stream's initcrc=0xcea312efbb705310.
07-04-2014 15:36:28.866 +0000 DEBUG ArchiveProcessor - Found no initcrc match for this stream, will re-read entire file.
07-04-2014 15:36:28.866 +0000 DEBUG ArchiveContext - Early termination because write has false notification
07-04-2014 15:36:28.867 +0000 DEBUG ArchiveContext - /tmp/indexme/test2-standard-gz-file.gz is NOT an archive file.
07-04-2014 15:36:28.878 +0000 INFO databasePartitionPolicy - idx=main Creating hot bucket=hot_v1_3, given event timestamped=1404488177
07-04-2014 15:36:28.879 +0000 INFO DatabaseDirectoryManager - Writing a bucket manifest in hotWarmPath='/opt/splunk/var/lib/splunk/defaultdb/db'. Reason='Adding bucket, bid=main~3~BCB6BEA7-CE1C-486C-BA49-07E40BAB59B1'
07-04-2014 15:36:28.880 +0000 DEBUG ArchiveProcessor - fname=/tmp/indexme/test2-standard-gz-file.gz fishstate=key=0x72a9c05c9c36e767 sptr=1344 scrc=0xb4910955c707a2a2 fnamecrc=0x84c51a934e8fdaab modtime=0
07-04-2014 15:36:28.880 +0000 INFO ArchiveProcessor - Finished processing file '/tmp/indexme/test2-standard-gz-file.gz', removing from stats
Final thoughts
It appears as though the ArchiveContext <FILE> is NOT an archive file
is bogus message since it appears for all archive files; the ones that work and the ones that don't.
The content of the two test files were created using a simple shell command for testing only.
( echo "`date` This is a test event ... ; dd if=/dev/urandom of=/dev/stdout count=1 | xxd ) | gzip -c - > /tmp/indexme/<SOME_FILE_NAME_ENDING_IN_GZ_OR_GZ2>
I'm testing this on Splunk 6.1.1 (splunk-6.1.1-207789.x86_64 rpm) running on CentOS 6.4 using an AWS t1.micro instance.
Update:
I changed my extension from .gz2
to .xy
(randomly chosen) just to make sure I wasn't some hitting the wrong [source:*]
stanza. No change.
I added the following to my local props.conf just to prove to myself that Splunk is actually calling the unarchive_cmd
[source::....(?>!tar.)gz(.\d+)?] unarchive_cmd = gzip -cd - | sed -re 's/secret/HIDDEN/g'And this does work. I also tried this with an invalid command, and a typo in the sed command and all of those issues were logged in splunkd.log.
07-04-2014 16:26:15.102 +0000 ERROR ArchiveContext - From archive='/tmp/indexme/test5-standard-gz-file-with-wacky-archivecommandoverride.gz': /bin/sh: NOT_A_VALID_COMMAND: command not found 07-04-2014 16:27:11.112 +0000 ERROR ArchiveContext - From archive='/tmp/indexme/test5-standard-gz-file-with-wacky-archivecommandoverride.gz': sed: -e expression #1, char 9: unknown command: `*'
I feel like this should work without me having arbitrarily add a .gz to the end of my file names.
I was able to make it work with these files:
I am converting binary data to ascii data with a perl script as the unarchive_cmd
inputs.conf
[monitor:///var/log/huawei/*.cdr]
disabled = false
followTail = 0
host = huawei_switch
index = huawei
sourcetype = huawei_cdr
props.conf
[source::/var/log/huawei/*.cdr]
invalid_cause = archive
unarchive_cmd = /usr/local/bin/huawei2text.pl
sourcetype = huawei_cdr
NO_BINARY_CHECK=true
Caveat 1:
The props.conf file would not work if I used sourcetype [huawei_cdr] instead of the [source::/var/log/huawei/*.cdr] listed above.
Caveat 2:
It would not work with a "known" file type such as .dat or .txt
I found this out by running splunk in debug mode and it set the priority for it's understanding of a .dat file higher than what I was telling it in my props and inputs files. I changed the extention to a .cdr file and it was able to execute the unarchive_cmd and index the binary files properly.
Hope this helps!