Inputcsv using a regex for the filename?

nick405060
Motivator

I'd like to inputcsv a file using a wildcard for part of the file name. How can I accomplish this?

More in-depth description of the problem:

I have weekly VMware CSVs being externally deposited in var/run/splunk/vcenter_reports, e.g.:

9-14-18 100057_000759 VM Inventory - Custom Attributes ABCVC.csv
9-07-18 100723_001163 VM Inventory - Custom Attributes ABCVC.csv

We just want to ingest these CSVs and reference the latest one, which we've done before for CSVs with static formatting after the date:

inputcsv [| makeresults | eval filename=strftime(now()-<modifier>,"vcenter_reports\\%Y-%m-%d_rest_of_static_filename.csv") | return $filename]

But we can't figure out where the dynamic 100057_000759 part of the filename comes from, so we don't know how to inputcsv the file in Splunk. There are probably several ways to do this; which would you recommend? The easiest would be to inputcsv the file using a regex, if that's possible in Splunk. Or we could just open the most recent file in the directory, if Splunk can do that. Alternatively, I could schedule a weekly report that uses a Python script to rename the file, but that seems excessive.

Any suggestions would be appreciated.

datasearchninja
Communicator

How about a shell process that autogenerates a lookup table containing the CSV file names?

The script below (adjusted for your actual full paths) writes the file paths of the CSVs to a lookup, newest first:

#!/bin/sh
echo filename > SPLUNKPATH/etc/apps/my_app/lookups/vcenter_reports.csv
ls -t SPLUNKPATH/var/run/splunk/vcenter_reports/*.csv >> SPLUNKPATH/etc/apps/my_app/lookups/vcenter_reports.csv

This could be executed regularly via cron, your own custom Splunk command, or something like the command modular input: https://splunkbase.splunk.com/app/1553/#/details

Then use your existing approach with an inputlookup (by default, return passes along only the first row, which here is the newest file):

inputcsv [| inputlookup vcenter_reports.csv | return $filename]
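
If a shell isn't available (the paths later in this thread suggest a Windows install), roughly the same thing can be sketched in Python. This is only an illustration under assumptions: the function name write_report_lookup and its two arguments are hypothetical, and it sorts by modification time to mirror ls -t.

```python
import csv
import glob
import os


def write_report_lookup(report_dir, lookup_path):
    """Write a one-column lookup (header: filename) of CSV paths, newest first.

    Hypothetical helper: report_dir is the directory holding the reports,
    lookup_path is the lookup file to (over)write.
    """
    # Sort by modification time, newest first, like `ls -t`.
    paths = sorted(
        glob.glob(os.path.join(report_dir, "*.csv")),
        key=os.path.getmtime,
        reverse=True,
    )
    with open(lookup_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename"])  # header row expected by the search above
        writer.writerows([path] for path in paths)
    return paths
```

You would call it with the same two locations as the shell script, i.e. the vcenter_reports directory and the vcenter_reports.csv lookup under your app's lookups folder, and schedule it the same way.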

grv97
New Member

Use the following search as a saved search to find lookup files whose names match a wildcard pattern:

| rest /services/data/lookup-table-files
| fields title
| where title like "%$param$%"

Then provide the input as follows:

| savedsearch searchName param="alpha beta gamma"

nick405060
Motivator

Similar to @Colin Humphreys' approach, I created a Python script called "recentfile.py" that takes a directory and returns the name of the most recent file in it. All you have to do is create the file and then register it in commands.conf.

# Takes a directory path (relative to var/run/splunk) and returns the most recent filename

import glob
import os
import sys

import splunk.Intersplunk

# Base directory for this (Windows) install; adjust for your environment.
BASE_DIR = "C:\\Program Files\\Splunk\\var\\run\\splunk"

def main(results, settings):
    # sys.argv[1] is the subdirectory passed as the command's first argument
    list_of_files = glob.glob(os.path.join(BASE_DIR, sys.argv[1], "*"))
    # On Windows, getctime is the file's creation time
    latest_file = max(list_of_files, key=os.path.getctime)
    # Return a single result row with one field, "filename"
    return [{'filename': latest_file}]

results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()
results = main(results, settings)
splunk.Intersplunk.outputResults(results)
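
If file timestamps can't be trusted (say, the reports get copied around after they're generated), one alternative is to pick the file by the date at the start of its name instead. The snippet below is just a sketch: the REPORT_NAME pattern is an assumption inferred from the two sample filenames in the question (M-DD-YY, then digits_digits, then the rest of the name), and latest_report is a hypothetical helper, not part of the script above.

```python
import os
import re
from datetime import datetime

# Assumed naming scheme, inferred from the sample filenames in the question:
# "<M-DD-YY> <digits>_<digits> <anything>.csv"
REPORT_NAME = re.compile(r"^(\d{1,2}-\d{1,2}-\d{2}) \d+_\d+ .*\.csv$")


def latest_report(filenames):
    """Return the name whose leading date is most recent, or None if none match."""
    dated = []
    for name in filenames:
        match = REPORT_NAME.match(os.path.basename(name))
        if match:
            # Parse the date prefix so that comparisons work across year boundaries.
            parsed = datetime.strptime(match.group(1), "%m-%d-%y")
            dated.append((parsed, name))
    return max(dated)[1] if dated else None
```

In recentfile.py this could replace the max(..., key=os.path.getctime) line, with list_of_files passed in as filenames.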

cpetterborg
SplunkTrust

Is your problem that you don't know how to search only in the latest version of the files that you have ingested? Or are you having issues specifying the source properly because they are always different?

It is possible to change the source for the file to eliminate the parts that differ with each new filename. I don't know if that is one of your issues, but you do state in your posting "and therefore don't know how to access the file in Splunk."

Some additional clarification will certainly help. 🙂

nick405060
Motivator

Ah, my problem was that I was trying to use inputcsv ad hoc in my dashboard to grab the files. Since you can't use wildcards or a regex in inputcsv (as far as I know), I couldn't get at files whose names I needed to partially wildcard.

Instead, my approach should have been to set up a new monitored input and then grab the correct source using wildcards. I'll post my code as an answer in a moment, but will probably not accept my own answer, as there may be people out there who want to use only inputcsv with wildcards.

DalJeanis
SplunkTrust

@nick405060 - Go ahead and post your answer and accept it. This solution is the right one for the need. I don't know of any REST call that allows you to check a directory in an ad hoc manner...
