How to invoke unarchive_cmd?

hulahoop · 10-08-2010

I'm trying to set a custom archive processor. Is this still supported in Splunk 4.1?

The documentation is contradictory. From props.conf.spec, the 2 parameters which both need to be set are invalid_cause and unarchive_cmd. The descriptions say invalid_cause can only be set for a sourcetype stanza, whereas unarchive_cmd can only be set for a source stanza. Is that even possible?

invalid_cause = <string>
* Can only be set for a [<sourcetype>] stanza.
* Splunk does not index any data with invalid_cause set.
* Set <string> to "archive" to send the file to the archive processor (specified in unarchive_cmd).
* Set to any other string to throw an error in the splunkd.log if running Splunklogger in debug mode.
* Defaults to empty.

is_valid = true | false
* Automatically set by invalid_cause.
* DO NOT SET THIS.
* Defaults to true.

unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive". This field is only valid on [source::stanzas].
* <string> specifies the shell command to run to extract an archived source.
* Must be a shell command that takes input on stdin and produces output on stdout.
* Use _auto for Splunk's automatic handling of archive files (tar, tar.gz, tgz, tbz, tbz2, zip)
* Defaults to empty.

I can't get the archive processor to activate. Has anyone done this successfully?

ustun · 08-16-2011

This seems to be an old post, but for anyone still looking for an answer: my goal was to read some binary logs using the archive processor. This configuration worked:

props.conf:

[source::/path/to/log/directories/...log]
invalid_cause = archive
unarchive_cmd = executable_to_read_binary
sourcetype = binary_log
NO_BINARY_CHECK = true

[default]
maxDist = 500

inputs.conf:

[monitor:///path/to/log/directories]
sourcetype = binary_log

I'm not sure whether setting sourcetype is mandatory to get this working. I was able to use invalid_cause under a source:: stanza. Actually, this is the only way it works for me.
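
For anyone wondering what a program like executable_to_read_binary might look like: the spec quoted in the question only requires a command that takes input on stdin and produces output on stdout. Below is a minimal sketch of such a filter in Python. The record layout (a 4-byte timestamp, a 2-byte length, then the message bytes) is invented purely for illustration and is not the format from this thread; replace decode() with whatever your binary log actually needs.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for executable_to_read_binary.

Assumed record layout (invented for illustration, not from the thread):
a 4-byte big-endian Unix timestamp, a 2-byte big-endian message length,
then that many bytes of UTF-8 message text.
"""
import struct
import sys
from datetime import datetime, timezone

HEADER = struct.Struct(">IH")  # timestamp (uint32), message length (uint16)


def decode(stream):
    """Yield one text line per complete binary record read from stream."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of input, or a truncated trailing header
        ts, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated record: stop rather than emit garbage
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        text = payload.decode("utf-8", errors="replace")
        yield f"{when.isoformat()} {text}"


def main():
    # Binary in on stdin, one text event per line out on stdout.
    for line in decode(sys.stdin.buffer):
        sys.stdout.write(line + "\n")


if __name__ == "__main__":
    main()
```

The decoding lives in its own function so it can be exercised with an in-memory stream before Splunk is involved at all.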

Lowell · 10-08-2010

I looked through system/default/props.conf and it appears that you simply have to have your source-based stanza point to a custom/bogus sourcetype, which is where you set invalid_cause = archive.

I think an example may make more sense than the paragraph above.

[source::....(tbz|tbz2)(.\d+)?]
unarchive_cmd = _auto
sourcetype = preprocess-bzip
NO_BINARY_CHECK = true

[source::....bz2?(.\d+)?]
unarchive_cmd = bzip2 -cd -
sourcetype = preprocess-bzip
NO_BINARY_CHECK = true

[preprocess-bzip]
invalid_cause = archive
is_valid = False
LEARN_MODEL = false

What I don't get is this: why are all the different "preprocess-*" sourcetypes needed? Why not create a single [preprocess-archive] sourcetype (or something like that) and point all the [source::...*] stanzas to it? All of the preprocess-* sourcetypes are identical in the system default file. I don't think you ever see these sourcetypes within Splunk, do you?
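
Since unarchive_cmd has to behave as a stdin-to-stdout filter, it can help to check a candidate command outside Splunk before adding it to props.conf. The sketch below is one way to do that in Python. The helper name run_filter is made up for this example, and it runs the command directly without a shell, which is only an approximation: the thread does not say exactly how Splunk launches the command.

```python
import shlex
import subprocess


def run_filter(command, data):
    """Feed data to command on stdin and return what it writes to stdout."""
    result = subprocess.run(
        shlex.split(command),    # "bzip2 -cd -" becomes ["bzip2", "-cd", "-"]
        input=data,              # bytes delivered on the command's stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For the bzip2 stanza above, and assuming bzip2 is installed, run_filter("bzip2 -cd -", bz2.compress(sample)) should hand back the original sample bytes (bz2 is Python's standard-library module). If the call raises or returns nothing, the command is probably not usable as an unarchive_cmd as written.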
