I'd like to inputcsv a file using a wildcard for part of the file name. How can I accomplish this?
More in-depth description of the problem:
I have weekly VMware CSVs being externally deposited in var/run/splunk/vcenter_reports, e.g.:
9-14-18 100057_000759 VM Inventory - Custom Attributes ABCVC.csv
9-07-18 100723_001163 VM Inventory - Custom Attributes ABCVC.csv
We just want to ingest these CSVs and reference the latest one, which we've done before for CSVs with static formatting after the date:
inputcsv [| makeresults | eval filename=strftime(now()-<modifier>,"vcenter_reports\\%Y-%m-%d_rest_of_static_filename.csv") | return $filename]
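For anyone less familiar with the subsearch above, this Python sketch mirrors what it computes: the current time minus a modifier, formatted into a fixed path. The original leaves `<modifier>` open, so the seven days in the usage note is only an assumed value, and `rest_of_static_filename` is the placeholder from the search itself.

```python
from datetime import datetime, timedelta


def static_report_name(now: datetime, modifier: timedelta) -> str:
    # Same idea as strftime(now()-<modifier>, "vcenter_reports\\%Y-%m-%d_...csv"):
    # every character of the name is derivable from the date alone.
    return (now - modifier).strftime(
        "vcenter_reports\\%Y-%m-%d_rest_of_static_filename.csv"
    )
```

For example, `static_report_name(datetime(2018, 9, 21), timedelta(days=7))` gives `vcenter_reports\2018-09-14_rest_of_static_filename.csv`. This only works because nothing in the name depends on anything except the date; a token like 100057_000759 cannot be produced this way.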
But we can't figure out where the dynamic 100057_000759 part of the filename comes from, so we don't know how to inputcsv the file in Splunk. There are probably several ways to do this; which would work? The easiest would be to inputcsv the file using a regex, if Splunk supports that. Alternatively, we could just open the most recent file in the directory, if Splunk can do that. I could also schedule a weekly report that runs a Python script to rename the file, but that seems excessive.
Any suggestions would be appreciated.
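To make the regex idea concrete outside Splunk: picking the newest report can rely on the M-DD-YY date at the start of each name instead of the unpredictable middle token. This is a minimal sketch that assumes every file follows the `<date> <token> VM Inventory - Custom Attributes ABCVC.csv` pattern of the two examples above; the regex and the function name `newest_report` are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Assumed naming pattern, inferred from the two example file names
REPORT_RE = re.compile(
    r"^(\d{1,2}-\d{1,2}-\d{2}) \d+_\d+ VM Inventory - Custom Attributes ABCVC\.csv$"
)


def newest_report(names: Iterable[str]) -> Optional[str]:
    """Return the matching file name with the latest date prefix, or None."""
    dated = []
    for name in names:
        match = REPORT_RE.match(name)
        if match:
            # %m and %d also accept non-zero-padded values such as "9"
            dated.append((datetime.strptime(match.group(1), "%m-%d-%y"), name))
    return max(dated)[1] if dated else None
```

Parsing the date matters: a plain string sort would rank "9-07-18 ..." after "12-28-18 ..." because "9" sorts after "1".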
How about a shell script that automatically generates a lookup table containing the CSV file names?
The script below (adjust the paths to your actual full paths) lists the CSV paths newest first, so the first row of the lookup is the newest CSV:
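#!/bin/sh
# Header row for the lookup
echo filename > SPLUNKPATH/etc/apps/my_app/lookups/vcenter_reports.csv
# Append every report path, newest first (ls -t sorts by modification time)
ls -t SPLUNKPATH/var/run/splunk/vcenter_reports/*.csv >> SPLUNKPATH/etc/apps/my_app/lookups/vcenter_reports.csv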
This could be executed regularly via cron, your own custom Splunk command, or something like the command modular input: https://splunkbase.splunk.com/app/1553/#/details
Then use your existing approach with an inputlookup:
inputcsv [| inputlookup vcenter_reports.csv | return $filename]
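If a shell script is awkward to deploy (on a Windows host, for instance), the same lookup can be built with Python's standard library. This sketch makes the same assumptions as the script above: `SPLUNKPATH` and `my_app` are placeholders, and it sorts by modification time like `ls -t`. The function name `write_report_lookup` is hypothetical. Writing through the `csv` module means a file name that contains a comma is quoted correctly, which raw `echo`/`ls` output does not do.

```python
import csv
import glob
import os


def write_report_lookup(report_dir: str, lookup_path: str) -> list:
    """Write a one-column lookup (header "filename"), newest CSV first, like ls -t."""
    paths = glob.glob(os.path.join(report_dir, "*.csv"))
    # Newest modification time first, matching `ls -t`
    paths.sort(key=os.path.getmtime, reverse=True)
    with open(lookup_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename"])
        for path in paths:
            writer.writerow([path])
    return paths
```

Usage would look like `write_report_lookup("SPLUNKPATH/var/run/splunk/vcenter_reports", "SPLUNKPATH/etc/apps/my_app/lookups/vcenter_reports.csv")`, scheduled the same way as the shell version.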
Use the following search as a saved search to find lookup table files using wildcard characters:
| rest /services/data/lookup-table-files
| fields title
| where title like "%$param$%"
and supply the parameter when you call it:
| savedsearch searchName param="alpha beta gamma"
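The `like` comparison uses SQL-style wildcards: `%` matches any run of characters and `_` matches exactly one. To show what `"%$param$%"` ends up matching once the token is filled in, here is a small Python emulation of that rule. It illustrates the semantics only and is not Splunk's implementation; the helper names `like` and `substitute` are hypothetical, and this sketch compares case-sensitively.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Emulate SQL-style LIKE: % = any run of characters, _ = exactly one character."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")
        elif char == "_":
            parts.append(".")
        else:
            parts.append(re.escape(char))  # everything else is literal
    # fullmatch anchors both ends, as LIKE does; DOTALL lets % span newlines
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def substitute(template: str, **tokens: str) -> str:
    """Replace $name$ tokens, the way savedsearch arguments fill them in."""
    for name, value in tokens.items():
        template = template.replace(f"${name}$", value)
    return template
```

With `param="vcenter"`, the condition becomes `like(title, "%vcenter%")`, which is true for a title such as `vcenter_reports.csv` and false for `hosts.csv`.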
Similar to @Colin Humphreys' approach, I created a Python script called "recentfile.py" that takes a directory and returns the name of the most recent file. All you have to do is create the file and then edit commands.conf.
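# recentfile.py: takes a directory path (relative to var/run/splunk) and returns the most recent filename
import glob
import os
import sys

import splunk.Intersplunk

# Splunk install directory; falls back to the default Windows path used originally
SPLUNK_HOME = os.environ.get("SPLUNK_HOME", "C:\\Program Files\\Splunk")


def main(results, settings):
    # Every entry in the requested subdirectory of var/run/splunk
    pattern = os.path.join(SPLUNK_HOME, "var", "run", "splunk", sys.argv[1], "*")
    list_of_files = glob.glob(pattern)
    if not list_of_files:
        return []  # empty or missing directory: return no results instead of crashing
    # Newest by ctime (creation time on Windows, metadata change time on Unix)
    latest_file = max(list_of_files, key=os.path.getctime)
    return [{'filename': latest_file}]


results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()
results = main(results, settings)
splunk.Intersplunk.outputResults(results)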
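To try the selection logic outside Splunk, where `splunk.Intersplunk` is not importable, the core can be pulled into a plain function. This sketch keeps the script's use of `os.path.getctime` (creation time on Windows, last metadata change on Unix) and raises a clear error for an empty directory rather than letting `max()` fail on an empty list. The name `most_recent_file` is hypothetical.

```python
import glob
import os


def most_recent_file(directory: str) -> str:
    """Return the path of the entry in `directory` with the newest ctime."""
    candidates = glob.glob(os.path.join(directory, "*"))
    if not candidates:
        raise FileNotFoundError(f"no files found in {directory!r}")
    # getctime: creation time on Windows, last metadata change on Unix
    return max(candidates, key=os.path.getctime)
```

Keeping this logic in its own function also makes it easy to swap `getctime` for `getmtime` if "most recent" should mean "last written" on every platform.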
Is your problem that you don't know how to search only the latest version of the files you have ingested? Or are you having trouble specifying the source properly because the names are always different?
It is possible to change the source for the file to eliminate the parts that differ with each new filename. I don't know whether that is one of your issues, but you do state in your posting "and therefore don't know how to access the file in Splunk."
Some additional clarification will certainly help. 🙂
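As an illustration of what such a source rewrite would produce (not of how Splunk is configured to do it), the regex below drops the leading date and the numeric token so that every weekly file maps to the same name. The pattern is an assumption based on the two example names in the question, and `stable_name` is a hypothetical helper.

```python
import re

# Assumed layout of each weekly file name: "<M-DD-YY> <digits>_<digits> <stable part>"
VARIABLE_PREFIX = re.compile(r"^\d{1,2}-\d{1,2}-\d{2} \d+_\d+ ")


def stable_name(file_name: str) -> str:
    """Drop the leading date and run token; names that don't match are returned unchanged."""
    return VARIABLE_PREFIX.sub("", file_name, count=1)
```

Both example files reduce to `VM Inventory - Custom Attributes ABCVC.csv`, so once the source is collapsed like this, individual weeks have to be told apart by time rather than by source.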
Ah, my problem was that I was trying to use inputcsv ad hoc in my dashboard to grab the files. Since you can't use wildcards or a regex in inputcsv (as far as I know), I couldn't retrieve files whose names I needed to partially wildcard.
Instead, I should have set up a new monitored input and then selected the correct source using wildcards. I'll post my code as an answer in a moment, but will probably not accept my own answer, as there may be people out there who want to use only inputcsv with wildcards.
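Once the files are indexed through a monitored input, a search-time wildcard on `source` does the job `inputcsv` could not, for example a hypothetical `source="*VM Inventory - Custom Attributes ABCVC.csv"`. As a rough analogy only (Splunk's matching is its own implementation), Python's `fnmatchcase` shows which source paths such a pattern would cover; the pattern constant and `matching_sources` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical search-time pattern for the weekly vCenter exports
SOURCE_PATTERN = "*VM Inventory - Custom Attributes ABCVC.csv"


def matching_sources(sources: List[str], pattern: str = SOURCE_PATTERN) -> List[str]:
    """Return the sources a glob-style wildcard would select (case-sensitive here)."""
    return [source for source in sources if fnmatchcase(source, pattern)]
```

The leading `*` absorbs both the directory and the changing date and token, which is exactly the part of the name that could not be predicted.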
@nick405060 - Go ahead and post your answer and accept it. It is the right solution for this need. I don't know of any REST call that lets you check a directory ad hoc...