Getting Data In

Best way to invoke python script that generates .csv files for lookup

beaumaris
Communicator

I have a python script that retrieves data from an external source and stores it in several .csv files. I have added the necessary information to transforms.conf and savedsearches.conf to use the lookup function in the search to find the data mappings. The .csv files are stored in the apps//lookups directory. This is working as expected. I plan to run the python program once per hour to refresh the data in the .csv files but I'm looking for the recommended way to do this.

Questions: - What is the best way to run the script on a schedule? - Is there a specific entry I should make in savedsearches.conf? Should the script be placed in the apps//bin directory? - Is it advisable to use inputs.conf, send the tables to stdout and have splunk index them directly? (I really only want one copy of the data, it is not time-based) - When performing the lookups, does splunk cache the .csv data? - If the .csv file is updated on the fly, does splunk know to refresh it's internal representation? - Is there a reduction in efficiency if the lookup tables grow very large? I expect 10K-20K rows.

Tags (2)

hazekamp
Builder

beaumaris,

I would recommend setting up a scripted inputs for this in inputs.conf like so:

## inputs.conf
[script://$SPLUNK_HOME/etc/apps/<your_app_here>/bin/<your_script>.py]
disabled = false
## once per week on wednesday; using cron such that search doesn't execute @ start time
interval = 0 0 * * 3

For schedules, you can use an interval specified as # secs between executions, or a chron schedule. I think the approach you are using to generate a .csv and use as a lookup w/in Splunk is the correct one. I don't believe Splunk cache's the .csv data, so contents will be read from disk per invocation. Updates to the .csv should take immediate affect in Splunk. 10k-20k rows should not be a problem. There are considerations for distributed environments as the list will by default be replicated down to the indexers. If the list is interacted w/ via "| lookup" instead of props.conf you can add the csv to distsearch.conf replication blacklist and use "| lookup local=true" which will make the lookup local to your search server.

See also:

http://www.splunk.com/base/Documentation/latest/Admin/Inputsconf

http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf

Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!