Hello all,
Upfront: first-time Splunk user here, so please be patient with me 🙂
I have a scenario I would like to describe, and I would appreciate comments on how it can be achieved with Splunk.
Scenario:
- I have a Perl script which generates data from a target API (stdout or file) on a daily basis - the script must be executed with a timestamp parameter and only retrieves data with a timestamp newer than the supplied value
- now I want to forward this CSV-formatted output into Splunk
- the data integrity shall be handled by relying on the information stored in Splunk (the highest timestamp value already indexed)
Generally, I'm not sure how to ensure that Splunk does not create duplicates for this data.
Current approach/idea:
1) a daily routine in Splunk is triggered (I assume that would be the job of the forwarder)
2) this input routine checks for the highest timestamp value currently stored in the Splunk index, passes it to the Perl script as its parameter and executes the script
3) Splunk takes the output from the Perl script (stdout or file) and feeds it into the index
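To make steps 1) to 3) more concrete, here is a rough Python sketch of the wrapper I have in mind. It is only an illustration under several assumptions: instead of querying the Splunk index it keeps the highest timestamp in a local checkpoint file (a stand-in, since I don't know yet how to read the value from the index), the names `last_timestamp.txt` and `fetch_api_data.pl` are hypothetical, the CSV is assumed to have a header with a `timestamp` column, and the timestamps are assumed to sort correctly as strings (e.g. ISO 8601).

```python
import csv
import io
import subprocess
import sys
from pathlib import Path

CHECKPOINT = Path("last_timestamp.txt")    # hypothetical checkpoint file
FETCH_CMD = ["perl", "fetch_api_data.pl"]  # hypothetical name of the Perl script
FIRST_RUN = "1970-01-01T00:00:00"          # assumed start value for the first run


def read_checkpoint(path, default=FIRST_RUN):
    """Return the last stored timestamp, or the default on the first run."""
    return path.read_text().strip() if path.exists() else default


def newest_timestamp(csv_text, column="timestamp"):
    """Return the largest value of the timestamp column, or None if there are no rows."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [row[column] for row in rows if row.get(column)]
    return max(values, default=None)  # string comparison, see assumption above


def fetch_with_perl(since):
    """Run the Perl script with the timestamp parameter and return its stdout."""
    result = subprocess.run(FETCH_CMD + [since], capture_output=True,
                            text=True, check=True)
    return result.stdout


def run_once(fetch=fetch_with_perl, checkpoint=CHECKPOINT, out=sys.stdout):
    """Fetch data newer than the checkpoint, print it, then advance the checkpoint."""
    since = read_checkpoint(checkpoint)
    csv_text = fetch(since)
    out.write(csv_text)                 # stdout is what Splunk would index
    latest = newest_timestamp(csv_text)
    if latest is not None:              # keep the old checkpoint if nothing new came back
        checkpoint.write_text(latest)
    return latest
```

If this wrapper ended with a call to `run_once()`, my understanding is that it could be registered as the script:// input in place of the Perl script, so that whatever it prints is what gets indexed. The checkpoint is only advanced after the output has been written, so a failed run would be retried from the old timestamp.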
Does the approach sound reasonable? I'm uncertain how to achieve the logic described in 2) - I was thinking about running the script as a script:// input, but I don't know how to pass it the timestamp value stored in the index. As an alternative, I was thinking about dumping the whole data set from the API each time and afterwards filtering out the data which was already indexed. What can I do to implement a check for duplicated data?
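For the "dump everything and filter afterwards" alternative, this is roughly what I imagine on the script side, before the data reaches Splunk. It is a sketch only and rests on assumptions of mine: every CSV line is treated as one record, a record counts as a duplicate only if the whole line is identical, and the set of hashes would have to be persisted somewhere between runs (not shown). The function names are made up for illustration.

```python
import hashlib


def row_key(line):
    """Hash one CSV line so that only a short fingerprint needs to be remembered."""
    return hashlib.sha256(line.strip().encode("utf-8")).hexdigest()


def filter_new(lines, seen):
    """Return the lines not forwarded before and record their hashes in `seen`."""
    fresh = []
    for line in lines:
        if not line.strip():
            continue                  # skip blank lines
        key = row_key(line)
        if key not in seen:
            seen.add(key)
            fresh.append(line)
    return fresh


# Usage: `seen` would be loaded from disk at the start of each daily run.
seen = set()
first = filter_new(["2024-01-01,a\n", "2024-01-02,b\n", "2024-01-01,a\n"], seen)
second = filter_new(["2024-01-02,b\n", "2024-01-03,c\n"], seen)
```

Here `first` keeps the two distinct lines and `second` keeps only the `2024-01-03` line. One side effect is that a CSV header line would be passed through only on the very first run.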
Any recommendation or pointing in the right direction would be appreciated.
Cheers!