Best way to invoke python script that generates .csv files for lookup

beaumaris · 03-31-2011

I have a Python script that retrieves data from an external source and stores it in several .csv files. I have added the necessary entries to transforms.conf and savedsearches.conf so that searches can use the lookup function to find the data mappings. The .csv files are stored in the apps/<app_name>/lookups directory, and this is working as expected. I plan to run the Python program once per hour to refresh the data in the .csv files, and I'm looking for the recommended way to do this.

Questions:
- What is the best way to run the script on a schedule?
- Is there a specific entry I should make in savedsearches.conf? Should the script be placed in the apps/<app_name>/bin directory?
- Is it advisable to use inputs.conf, send the tables to stdout, and have Splunk index them directly? (I really only want one copy of the data; it is not time-based.)
- When performing the lookups, does Splunk cache the .csv data?
- If the .csv file is updated on the fly, does Splunk know to refresh its internal representation?
- Is there a reduction in efficiency if the lookup tables grow very large? I expect 10K-20K rows.

hazekamp · 03-31-2011

beaumaris,

I would recommend setting up a scripted input for this in inputs.conf, like so:

## inputs.conf
[script://$SPLUNK_HOME/etc/apps/<your_app_here>/bin/<your_script>.py]
disabled = false
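## once per week, at midnight on Wednesday; a cron schedule is used so that the script doesn't execute at start time
interval = 0 0 * * 3
## for the hourly refresh described in the question, the equivalent cron schedule would be: 0 * * * *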
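
As a rough illustration of what the script behind that stanza might look like (a sketch assuming Python 3, not the original poster's script): it resolves the app's lookups directory relative to its own location in bin, writes a .csv with a header row, and prints nothing to stdout, since stdout is what a scripted input hands to Splunk for indexing. `fetch_rows`, the `host`/`owner` fields, and the `host_owner.csv` file name are hypothetical placeholders.

```python
import csv
import os
import sys


def lookups_dir(script_path):
    """Return <app>/lookups for a script that lives in <app>/bin."""
    app_root = os.path.dirname(os.path.dirname(os.path.abspath(script_path)))
    return os.path.join(app_root, "lookups")


def write_lookup(path, fieldnames, rows):
    """Write rows (a list of dicts) to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def fetch_rows():
    """Hypothetical stand-in for the call to the external data source."""
    return [
        {"host": "web01", "owner": "ops"},
        {"host": "db01", "owner": "dba"},
    ]


def main():
    target = os.path.join(lookups_dir(__file__), "host_owner.csv")
    write_lookup(target, ["host", "owner"], fetch_rows())
    # Status goes to stderr on purpose: nothing is written to stdout,
    # so the scripted input has no events to index.
    sys.stderr.write("wrote %s\n" % target)
```

In the real script, `main()` would be called under the usual `if __name__ == "__main__":` guard. Keeping stdout empty is one way to get the "only one copy of the data" behavior asked about above, with the scripted input acting purely as a scheduler.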

For schedules, you can specify the interval either as a number of seconds between executions or as a cron schedule.

I think the approach you are using, generating a .csv and using it as a lookup within Splunk, is the correct one. I don't believe Splunk caches the .csv data, so the contents will be read from disk on each invocation, and updates to the .csv should take immediate effect in Splunk. 10K-20K rows should not be a problem.

There are considerations for distributed environments, as the list will by default be replicated down to the indexers. If the list is used via "| lookup" in the search rather than configured in props.conf, you can add the .csv to the replication blacklist in distsearch.conf and use "| lookup local=true", which will make the lookup local to your search head.
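
If the .csv really is read from disk on each lookup, a search could in principle run while the script is halfway through rewriting the file. One way to guard against that, sketched here under the assumption of Python 3 and a temp file on the same filesystem as the lookup, is to write to a temporary file and then swap it into place. The function name `replace_lookup_atomically` is made up for this example.

```python
import csv
import os
import tempfile


def replace_lookup_atomically(path, fieldnames, rows):
    """Write the CSV to a temp file next to `path`, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable only by the current
    # user, so the script should run as the same user that runs Splunk.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        # os.replace swaps the file in one step when source and target
        # are on the same filesystem, so readers see old or new, not a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the temp file and keep the old lookup.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach a failed refresh (for example, the external source being unreachable partway through) leaves the previous .csv in place rather than a truncated one.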

