Inputcsv using a regex for the filename?

Motivator

I'd like to inputcsv a file using a wildcard for part of the file name. How can I accomplish this?

More in-depth description of the problem:

I have weekly VMware CSVs being externally deposited in var/run/splunk/vcenter_reports, e.g.:

9-14-18 100057_000759 VM Inventory - Custom Attributes ABCVC.csv
9-07-18 100723_001163 VM Inventory - Custom Attributes ABCVC.csv

We just want to ingest these CSVs and reference the latest one, which we've done before for CSVs with static formatting after the date:

inputcsv [| makeresults | eval filename=strftime(now()-<modifier>,"vcenter_reports\\%Y-%m-%d_rest_of_static_filename.csv") | return $filename]

But, we aren't able to figure out where the dynamic 100057_000759 part of the filename comes from, and therefore don't know how to inputcsv the file in Splunk. There are probably a bunch of different ways to do this - how? The easiest would be to inputcsv the file using a regex if that's possible in Splunk. Or, we could just open the most recent file in the directory, if that's possible with Splunk. I could alternately schedule a report to run once a week that uses a Python script to rename the file, but that seems excessive.

Any suggestions would be appreciated.


Communicator

How about a shell process that autogenerates a lookup table with the CSV file names in it?

The script below (adjust SPLUNKPATH to your actual full paths) lists the CSV file paths, newest first:

#!/bin/sh
echo filename > SPLUNKPATH/etc/apps/my_app/lookups/vcenter_reports.csv
ls -t SPLUNKPATH/var/run/splunk/vcenter_reports/*.csv >> SPLUNKPATH/etc/apps/my_app/lookups/vcenter_reports.csv

This could be executed regularly via cron, your own custom Splunk command, or something like the command modular input: https://splunkbase.splunk.com/app/1553/#/details

Then use your existing approach with an inputlookup. Because return emits only the first row by default, the subsearch yields the newest file:

inputcsv [| inputlookup  vcenter_reports.csv | return $filename]
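
If you'd rather not depend on a shell (on a Windows host, say), the same idea can be roughly sketched in Python. This is only an illustration, not a tested Splunk integration: the function name write_latest_lookup is made up, and it assumes you only need the newest file in the lookup rather than the full listing.

```python
import csv
from pathlib import Path


def write_latest_lookup(report_dir, lookup_path):
    """Write a one-column lookup holding the newest CSV found in report_dir.

    Hypothetical helper: both paths are supplied by the caller.
    Returns the newest Path, or None if the directory has no CSVs.
    """
    candidates = sorted(
        Path(report_dir).glob("*.csv"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first, like `ls -t`
    )
    with open(lookup_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename"])  # header that `return $filename` reads
        for path in candidates[:1]:  # keep only the newest file, if any
            writer.writerow([str(path)])
    return candidates[0] if candidates else None
```

Run from a scheduled job, something like write_latest_lookup("SPLUNKPATH/var/run/splunk/vcenter_reports", "SPLUNKPATH/etc/apps/my_app/lookups/vcenter_reports.csv") would refresh the lookup before the search uses it.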

Motivator

Similar to @Colin Humphreys's approach, I created a Python script called "recentfile.py" that takes a directory and returns the path of the most recent file. All you have to do is create the file and then edit commands.conf.

# Takes a directory path (relative to var/run/splunk) and returns the path of the most recent file

import glob
import os
import sys

import splunk.Intersplunk

def main(results, settings):
    list_of_files = glob.glob("C:\\Program Files\\Splunk\\var\\run\\splunk\\"+sys.argv[1]+"\\*")
    latest_file = max(list_of_files, key=os.path.getctime)
    results = []
    result = {}
    result['filename'] = latest_file
    results.append(result)
    return results

results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()
results = main(results, settings)
splunk.Intersplunk.outputResults(results)
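
One caveat with file timestamps: os.path.getctime is the creation time on Windows but the last metadata change on Unix, so it can mislead if the files are copied or touched after delivery. A possible alternative is to pick the latest report from the date at the front of the filename. The sketch below assumes that the leading token (e.g. 9-14-18) is month-day-year; the helper names report_date and latest_by_name are made up for illustration, and files sharing the same date are not ordered any further.

```python
from datetime import datetime


def report_date(filename):
    """Parse the leading date token, ASSUMED to be month-day-year (%m-%d-%y)."""
    date_token = filename.split(" ", 1)[0]  # text before the first space
    return datetime.strptime(date_token, "%m-%d-%y")


def latest_by_name(filenames):
    """Return the filename whose leading date is the most recent."""
    return max(filenames, key=report_date)
```

To combine this with the script above, you would pass it the base names (os.path.basename) of the files that glob returns.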

SplunkTrust

Is your problem that you don't know how to search only in the latest version of the files that you have ingested? Or are you having issues specifying the source properly because they are always different?

It is possible to change the source for the file to eliminate the parts that differ with each new filename. I don't know if that is one of your issues, but you do state in your post "and therefore don't know how to access the file in Splunk."

Some additional clarification will certainly help. 🙂

Motivator

Ah, my problem was that I was trying to use inputcsv ad hoc in my dashboard to grab the files. Since you can't use wildcards or a regex in inputcsv (as far as I know), I couldn't load files whose names I only partly know in advance.

Instead, my approach should have been to set up a new monitored input and then grab the correct source using wildcards. I'll post my code as an answer in a moment, but I probably won't accept my own answer, as there may be people out there who want to use only inputcsv with wildcards.

SplunkTrust

@nick405060 - Go ahead and post your answer and accept it. This solution is the right one for the need. I don't know of any REST call that allows you to check a directory in an ad hoc manner...