Hi, I've been combing through the documentation and the answers site, but so far I've been unable to come up with a solution to the problem in the title. I'd appreciate any advice!
I'm working with archived data from remote systems that includes the output of the unix/linux-style "iptables -L" command. We want to search the data for ACCEPTs, source addresses, and so on.
Individual lines in the data don't carry date/time info or "chain" names, so I wrote a Python script that reads stdin and outputs lines with a date/time stamp and a series of name=value pairs. I hoped to get this working from props.conf with a stanza that looks roughly like this:
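Since the script itself isn't shown in the post, here's a minimal sketch of what such a stdin-to-stdout filter might look like. The field names (target, prot, src, dst) are my assumptions based on the columns "iptables -L -n" typically prints, not the poster's actual code:

```python
#!/usr/bin/env python
"""Hypothetical sketch of a filter like interpret-iptables-eventlog.py:
read iptables -L output on stdin, emit timestamped name=value events."""
import sys
import time


def to_kv(line):
    """Turn one iptables rule line into a timestamped name=value event."""
    parts = line.split()
    # Skip blank lines, "Chain ..." banners, and the column-header row.
    if len(parts) < 5 or parts[0] in ("Chain", "target"):
        return None
    target, prot, _opt, src, dst = parts[:5]
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s target=%s prot=%s src=%s dst=%s" % (stamp, target, prot, src, dst)


def main():
    for line in sys.stdin:
        event = to_kv(line)
        if event:
            sys.stdout.write(event + "\n")


if __name__ == "__main__":
    main()
```

Anything of this shape works as an `unarchive_cmd`-style filter: plain text in on stdin, one event per line out on stdout.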
[source::.../iptables-log*]
sourcetype = iptables-trafficlog

[iptables-trafficlog]
invalid_cause = archive
unarchive_cmd = python interpret-iptables-eventlog.py
That didn't seem to work 😞 My hypothesis right now is that input processing isn't finding either the Python interpreter or my script. My questions: (1) Is what I'm attempting supposed to work? (2) Where do I deploy my script, and how do I specify its invocation within props.conf? (3) Is there a much simpler or more obvious solution that I've overlooked?
thanks so much for your time and attention! --A Newbie
Since posting this query I've had the chance to try a number of variations on the setup: "wrapping" the python command in an executable script, prepending full path specs to ensure the files can be found, and so on. The result is no joy: it appears that the "unarchive_cmd" I specified is never invoked, which suggests I may be taking the wrong approach. I "know" that my monitored data files contain (within some layers of zip/Z/tgz and so on) the iptables-log* contents, because this configuration:
[source::.../iptables-log*]
sourcetype = iptables-trafficlog
results in many records
So sorry about the terrible formatting of the code in that comment :-(. It's just two lines: one naming the source (file name) and one assigning the sourcetype.
When that's in my props.conf, I am able to search for the relevant sourcetype, but what comes back is one very big event containing the entire (hundred-plus-line) listing from iptables -L. Not what I was hoping for. Any suggestions on an approach are welcome!
Hello everybody! Actually, I'm not clear whether anyone but me has looked at this question. Is anybody out there?
Intuition suggests that invoking a little bespoke preprocessing on data at input time would be a very common need for managers of real system deployments, so it's hard for me to believe there isn't some sort of "standard" answer to the problem I've posed. But about two weeks after posting I've seen no response at all. Is my situation so unusual?
I don't see much in the way of debugging output here. How do you know it isn't working? What warnings or errors are you seeing? Did you enable the script in your management console? Did you put it in etc/apps/search/bin?
Thanks for your attention John.
I know it isn't working because (1) the data don't get indexed, and (2) I added a line to my script that writes a diagnostic message to a fully-qualified path whenever the script is invoked, and that message never appears.
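For anyone wanting to try the same debugging trick, a minimal sketch of that kind of invocation marker might look like this (the path and function name here are made up for illustration):

```python
# Hypothetical sketch of the diagnostic described above: append a timestamped
# line to an absolute path the moment the script starts, so an empty or
# missing file proves the script was never run at all.
import time

DEBUG_LOG = "/tmp/interpret-iptables-eventlog.debug"  # assumed path


def mark_invocation(path=DEBUG_LOG):
    with open(path, "a") as fh:
        fh.write("invoked at %s\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
```

Calling `mark_invocation()` as the very first statement of the script keeps the check independent of anything that might fail later (argument parsing, stdin reads, etc.).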
I'm not getting warnings or errors; indexing works as expected, but the data in sources identified as iptables-log* in props.conf are ignored. So that suggests the "invalid_cause" spec is working. Previously the data in question had been indexed into one large multi-line event (unusable).
I didn't enable the script in the management console. Where would that be done?
I fully specified the path name. Does it need to be in etc/apps/search/bin to be invoked?
I struggled with this way too long also.
I have a custom access log format that is gzipped. It needs to be gunzipped and then piped through a custom converter to get to NCSA format (access_combined). No matter what I did, my log format would get unarchived but never passed through my converter (even though Splunk seemed to be honoring my source:: spec).
I had to do this in local/props.conf:
[source::/path/to/my/special/logs/.../*]
unarchive_cmd = gunzip | my_custom_converter
unarchive_sourcetype = access_combined
NO_BINARY_CHECK = true
priority = 10
The key here seems to be the priority keyword. I believe it was necessary to override the default gzip unarchiver, which seemed to take precedence over whatever custom sourcetype I defined.
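For illustration only, since my_custom_converter isn't shown: a converter of that shape reads lines on stdin and writes NCSA combined-style lines on stdout. Everything below is invented — the input is assumed to be a hypothetical tab-separated format of ip, timestamp, method, path, status, and bytes:

```python
#!/usr/bin/env python
# Purely illustrative stand-in for my_custom_converter (the real converter
# and the real custom log format aren't shown in the answer). Maps an
# assumed tab-separated record to an NCSA combined-style access log line.
import sys


def to_ncsa(line):
    """Convert one tab-separated record into an NCSA combined-style line."""
    ip, ts, method, path, status, size = line.rstrip("\n").split("\t")
    return '%s - - [%s] "%s %s HTTP/1.0" %s %s "-" "-"' % (
        ip, ts, method, path, status, size)


def main():
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(to_ncsa(line) + "\n")


if __name__ == "__main__":
    main()
```

Because the unarchive_cmd value is run as a shell pipeline, gunzip decompresses the stream and the converter only ever sees plain text on stdin.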
I'm sure there is a way to see how this is getting parsed and processed, but it's not really obvious. Full disclosure: I am a complete Splunk newbie.
I believe they did have .gz extensions.