Getting Data In

Custom script input - How to let Splunk handle files through a custom script that will stream converted data to be indexed

SplunkTrust

Hi,

I'm currently working on an application that handles files in a very specific format that Splunk cannot manage directly; the data has to be converted through a third-party script (currently a Perl script).

I would like to adapt the current configuration so that Splunk handles the files (based on a pattern) and calls the third-party script, which gets the file name as an argument.

To sum up, my goal is:

  • Splunk watches for any new or updated file (as for any standard files input)
  • when a new file is available or a file's CRC differs, Splunk calls the third-party script with the file name as argument
  • The third party script streams the converted data that Splunk will index

I already have a functional third-party script that does the job, but I have not yet found the best way to proceed as required.

Thanks in advance for any help


Re: Custom script input - How to let Splunk handle files through a custom script that will stream converted data to be indexed

Legend

Did you look into using "unarchive_cmd" for this? It sounds like it could solve your situation, even though you're not strictly "unarchiving" anything, but the principle should still be the same - Splunk detects a change, invokes the script, then ingests the data that the script outputs.

http://docs.splunk.com/Documentation/Splunk/latest/admin/Propsconf

unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* This field is only valid on [source::<source>] stanzas.
* <string> specifies the shell command to run to extract an archived source.
* Must be a shell command that takes input on stdin and produces output on stdout.
* Use _auto for Splunk's automatic handling of archive files (tar, tar.gz, tgz, tbz, tbz2, zip)
* Defaults to empty.
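In other words, the script configured as unarchive_cmd is just a plain stdin-to-stdout filter: Splunk pipes the matched file to the script's stdin and indexes whatever the script writes to stdout. A minimal sketch in Python (the convert() logic here is a hypothetical placeholder, not the poster's actual format conversion):

```python
#!/usr/bin/env python
import sys

def convert(raw):
    """Placeholder conversion: decode the raw bytes and prefix each
    non-empty line so it becomes plain text Splunk can index.
    Replace this with your real format-specific decoding logic."""
    lines = raw.decode("utf-8", errors="replace").splitlines()
    return "\n".join("converted: " + line for line in lines if line)

if __name__ == "__main__":
    # Splunk streams the matched file on stdin; indexable events go to stdout.
    sys.stdout.write(convert(sys.stdin.buffer.read()))
    sys.stdout.write("\n")
```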


SplunkTrust

Nice idea, I'll check and let you know. Thanks!


SplunkTrust

Ayn,

Thank you very much for your clever suggestion, this indeed did the job as I needed 🙂


SplunkTrust

For those who would be interested in such a case, here is how I got it to work as I needed.

A few links that helped to implement a third-party script with the unarchive_cmd setting:

http://answers.splunk.com/answers/7729/how-to-invoke-unarchive_cmd
http://blogs.splunk.com/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles/
http://answers.splunk.com/answers/10501/python-script-as-unarchive_cmd-in-propsconf

Also, I had to adapt my third-party script to read data from stdin instead of taking the filename as an argument (e.g. cat file | myscript).
Depending on your case and script, you may want your script to stream the converted data for Splunk to index directly (the simplest), or you may need your script to generate CSV file(s) that Splunk will then index (my case).

The configuration that worked for me:

props.conf

  1. You need to declare a source stanza associated to your 3rd party script:

    [source::/pathtorawfiles/*.]
    invalid_cause = archive
    unarchive_cmd =
    sourcetype = mysourcetype
    NO_BINARY_CHECK = true

  2. In my case, my script generates several CSV files (standard CSV files with a header) that Splunk will index, so I declared a second stanza (you don't need this if your script outputs the data directly):

    [mydatasourcetype]
    FIELD_DELIMITER = ,
    FIELD_QUOTE = "
    HEADER_FIELD_LINE_NUMBER = 1
    NO_BINARY_CHECK = 1
    INDEXED_EXTRACTIONS = csv
    KV_MODE = none
    SHOULD_LINEMERGE = false
    pulldown_type = true

inputs.conf

  1. I declared a monitor associated with the raw data that needs to be converted through my third-party script:

    [monitor:///pathtorawfiles/*.]
    disabled = false
    index = myindex
    sourcetype = mysourcetype

  2. As my script generates CSV files, I just want to index them and delete them automatically:

    [batch:///*.csv]
    disabled = false
    move_policy = sinkhole
    recursive = false
    index = myindex
    sourcetype = mydatasourcetype

And that's it, works like a charm 🙂

This of course has to be adapted to your requirements.
