Best way to invoke python script that generates .csv files for lookup

beaumaris
Communicator

I have a Python script that retrieves data from an external source and stores it in several .csv files. I have added the necessary entries to transforms.conf and savedsearches.conf so that searches can use the lookup function to find the data mappings. The .csv files are stored in the apps//lookups directory. This is working as expected. I plan to run the Python script once per hour to refresh the data in the .csv files, but I'm looking for the recommended way to do this.

Questions:
- What is the best way to run the script on a schedule?
- Is there a specific entry I should make in savedsearches.conf? Should the script be placed in the apps//bin directory?
- Is it advisable to use inputs.conf, send the tables to stdout and have Splunk index them directly? (I really only want one copy of the data; it is not time-based.)
- When performing the lookups, does Splunk cache the .csv data?
- If the .csv file is updated on the fly, does Splunk know to refresh its internal representation?
- Is there a reduction in efficiency if the lookup tables grow very large? I expect 10K-20K rows.

hazekamp
Builder

beaumaris,

I would recommend setting up a scripted input for this in inputs.conf, like so:

## inputs.conf
[script://$SPLUNK_HOME/etc/apps/<your_app_here>/bin/<your_script>.py]
disabled = false
## once per week on Wednesday at 00:00; using cron so that the script doesn't execute at start time
interval = 0 0 * * 3

For schedules, you can specify the interval either as a number of seconds between executions or as a cron schedule. For the hourly refresh you describe, interval = 3600 or the cron schedule 0 * * * * would do it.

I think the approach you are using, generating a .csv and using it as a lookup within Splunk, is the correct one. I don't believe Splunk caches the .csv data, so the contents will be read from disk per invocation, and updates to the .csv should take immediate effect in Splunk. 10k-20k rows should not be a problem.

There are considerations for distributed environments, as the list will by default be replicated down to the indexers. If the list is used via "| lookup" instead of being configured in props.conf, you can add the csv to the replication blacklist in distsearch.conf and use "| lookup local=true", which will make the lookup local to your search server.
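If the .csv really is read from disk each time a lookup runs, a search could catch the file while the script is partway through rewriting it. One way to guard against that is to write to a temporary file in the same directory and then rename it over the old one. The following is only a minimal sketch under that assumption: the function names, the example file name and the <app>/bin plus <app>/lookups layout are hypothetical, not taken from your script.

```python
import csv
import os
import tempfile


def lookup_path_for(script_file, csv_name):
    """Return <app>/lookups/<csv_name>, assuming the script lives in <app>/bin (hypothetical layout)."""
    app_dir = os.path.dirname(os.path.dirname(os.path.abspath(script_file)))
    return os.path.join(app_dir, "lookups", csv_name)


def write_lookup_atomically(path, header, rows):
    """Write header and rows to a temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        # os.replace swaps the file in a single step, so a reader
        # opens either the complete old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file and re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In the script itself you would call something like write_lookup_atomically(lookup_path_for(__file__, "example_mapping.csv"), ["ip", "host"], rows) once per .csv file. Since a scripted input indexes whatever the script writes to stdout, keeping the script silent on stdout means the only copy of the data is the lookup file.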

See also:

http://www.splunk.com/base/Documentation/latest/Admin/Inputsconf

http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf
