Hi,
I'm currently working on an application that handles files in a very specific format that Splunk cannot manage directly; the data has to be converted through a third-party script (currently a Perl script).
I would like to adapt the current configuration so that Splunk handles these files (based on a pattern) and calls the third-party script, which gets the file name as an argument.
To sum up: I already have a functional third-party script that does the job, but I have yet to find the best way to hook it into Splunk as required.
Thanks in advance for any help
Did you look into using "unarchive_cmd" for this? It sounds like it could solve your situation, even though you're not strictly "unarchiving" anything, but the principle should still be the same - Splunk detects a change, invokes the script, then ingests the data that the script outputs.
http://docs.splunk.com/Documentation/Splunk/latest/admin/Propsconf
unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* This field is only valid on [source::<source>] stanzas.
* <string> specifies the shell command to run to extract an archived source.
* Must be a shell command that takes input on stdin and produces output on stdout.
* Use _auto for Splunk's automatic handling of archive files (tar, tar.gz, tgz, tbz, tbz2, zip)
* Defaults to empty.
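To illustrate the principle (a minimal sketch, not the actual Perl script from this thread), an unarchive_cmd program only needs to read the raw file on stdin and write indexable text on stdout:

```python
import sys

def convert(raw):
    """Turn the raw file bytes into newline-delimited events.

    Placeholder logic: a real converter would decode the proprietary
    format here instead of just cleaning up text lines.
    """
    lines = raw.decode("utf-8", errors="replace").splitlines()
    return "".join(line.strip() + "\n" for line in lines if line.strip())

if __name__ == "__main__":
    # Splunk pipes the matched file's bytes to stdin; whatever the
    # script prints on stdout is what gets indexed.
    sys.stdout.write(convert(sys.stdin.buffer.read()))
```

Any language works for this, as long as the script behaves as a stdin-to-stdout filter.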
For those who would be interested in such a case, here is how I got it to work as I needed.
A few links that helped me implement a 3rd-party script with the unarchive_cmd setting:
http://answers.splunk.com/answers/7729/how-to-invoke-unarchive_cmd
http://blogs.splunk.com/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles/
http://answers.splunk.com/answers/10501/python-script-as-unarchive_cmd-in-propsconf
Also, I had to adapt my 3rd-party script so it reads data from stdin instead of taking the filename as an argument (eg. cat
Depending on your case and script, you may want your script to stream the converted data directly to Splunk for indexing (the simplest), or you may need it to generate CSV file(s) that Splunk then indexes (my case).
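For the CSV variant, the converter's output just has to be a standard CSV with a header row so a csv-type sourcetype can pick up the fields. A minimal sketch (the parse step is a hypothetical placeholder, not the real format logic):

```python
import csv
import io
import sys

def records_to_csv(records):
    """Serialize a list of dicts as a standard CSV with a header row,
    suitable for a sourcetype using INDEXED_EXTRACTIONS=csv."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

if __name__ == "__main__":
    # Hypothetical parse step: a real script would decode the
    # proprietary format read from stdin here.
    raw = sys.stdin.buffer.read()
    parsed = [{"offset": i, "value": b} for i, b in enumerate(raw[:3])]
    sys.stdout.write(records_to_csv(parsed))
```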
The configuration that worked as I needed:
props.conf
You need to declare a source stanza associated with your 3rd-party script:
[source::/pathtorawfiles/*.
invalid_cause = archive
unarchive_cmd =
sourcetype = mysourcetype
NO_BINARY_CHECK = true
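For reference, a complete stanza of this shape looks like the following (the .bin extension and the script path here are hypothetical placeholders, not the actual values from my setup):

```
# props.conf -- hypothetical example; adjust the pattern and script path
[source::/pathtorawfiles/*.bin]
invalid_cause = archive
unarchive_cmd = /opt/scripts/convert.pl
sourcetype = mysourcetype
NO_BINARY_CHECK = true
```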
In my case, my script generates several CSV files (standard CSV files with a header) that Splunk will index, so I declared a second stanza. (You don't need this if your script outputs the data directly.)
[mydatasourcetype]
FIELD_DELIMITER=,
FIELD_QUOTE="
HEADER_FIELD_LINE_NUMBER=1
NO_BINARY_CHECK=1
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
pulldown_type=true
inputs.conf
I declare a monitor associated with the raw data that needs to be converted through my 3rd-party script:
[monitor:///pathtorawfiles/*.
disabled = false
index = myindex
sourcetype = mysourcetype
As my script generates CSV files, I just want Splunk to index and then delete them automatically:
[batch://
disabled = false
move_policy = sinkhole
recursive = false
index = myindex
sourcetype = mydatasourcetype
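Again for reference, the two inputs together look like this (all paths and extensions are hypothetical placeholders):

```
# inputs.conf -- hypothetical example; adjust paths and extensions
[monitor:///pathtorawfiles/*.bin]
disabled = false
index = myindex
sourcetype = mysourcetype

[batch:///pathtocsvfiles/*.csv]
disabled = false
move_policy = sinkhole
recursive = false
index = myindex
sourcetype = mydatasourcetype
```

The move_policy = sinkhole setting is what makes Splunk delete each CSV file after indexing it.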
And that's it, works like a charm 🙂
This of course has to be adapted to your requirements.
Ayn,
Thank you very much for your clever suggestion, this indeed did the job as I needed 🙂
Nice idea, I'll check and let you know, thanks.