Select data from the most recent file logged this month over multiple hosts

I have a clustered application running in active/passive configuration. We run a report at the beginning of every month that gathers a whole bunch of stats from the file system and other places. The report is scripted and runs out of cron. The first thing the script does is check if the current node is the active node and, if so, go and gather and write the stats to a directory.

So essentially, for this month, the reports could end up with the path:

node01.example.org:/var/reports/20170301.csv
or
node02.example.org:/var/reports/20170301.csv

depending on which node was the active node.

I want to craft a Splunk search that gets the events from either of those files, but only from the most recent file for the current month, e.g. March 2017. Previous months should be excluded. It's also worth noting that the report might have a 2nd or 3rd run if something screwed up the initial reporting run, so a rerun report might be named something like:

node02.example.org:/var/reports/20170302.csv
node02.example.org:/var/reports/20170303.csv

Any ideas what type of search query I could write to do this?

Thanks in advance.

SplunkTrust

This subsearch uses your naming convention to find the latest file name, whichever node wrote it.

[| metasearch index=foo source=node*.example.org:/var/reports/*.csv | table source 
| rex field=source "(?<nodename>[^\/]*)(?<reptname>.*)" | sort 1 -reptname | table source | format]

It should return a snippet of the form ...

(source="node01.example.org:/var/reports/20170311.csv")

... that you can use in your base search like this ...

index=foo field=whatever  [| metasearch index=foo source=node*.example.org:/var/reports/*.csv | table source 
| rex field=source "(?<nodename>[^\/]*)(?<reptname>.*)" | sort 1 -reptname | table source | format]
| then whatever you want to do with the data from that source...
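
If you also need to exclude earlier months, one option is to restrict the subsearch to the current month and sort on the date in the file name. This is only a sketch: it assumes index=foo as above, that file names are always eight digits (YYYYMMDD), and that the events were indexed with timestamps in the current month. The field name reptdate is made up for illustration.

```
index=foo earliest=@mon latest=now
    [| metasearch index=foo source="node*.example.org:/var/reports/*.csv" earliest=@mon latest=now
     | dedup source
     | rex field=source "/(?<reptdate>\d{8})\.csv$"
     | sort 1 -reptdate
     | table source
     | format]
```

Sorting on the file-name date rather than on event time means a rerun such as 20170303.csv wins over 20170301.csv even if the event timestamps inside the files are similar.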

SplunkTrust

Try it out:
earliest=@mon latest=now index=<yourIndex> source=node*.example.org:/var/reports/*.csv | stats latest(source)
Run it in verbose mode so you can see the events.
You can probably also remove the earliest and latest terms, since stats latest(source) returns the source of the most recent event.
Hope it helps.

New Member

This is not working when I use it with multiple indexes:

(index=abc sourcetype=SUMMARY_CDR_VW Server=CLGRAB1201T ) OR
[ metasearch index=csvlookups source="F:\SplunkMonitor\csvlookups\Core_Network*.csv" earliest=@mon latest=now()
| stats latest(source) as source],

Is there any other way to achieve this?

SplunkTrust

Hello there,
You can use earliest and latest like this, for example: earliest=@mon latest=now index=<yourIndex> source=node*.example.org:/var/reports/*.csv
or choose "Month to date" in the time picker.

Getting close, but this still returns multiple reports that have been generated this month.

e.g.

20170301.csv
20170306.csv

Is there a way to extend the query to only get the latest report this month?

Influencer

What is the search you are trying? Using dedup might help.

SplunkTrust

Just append | stats latest(source) to the search.
Run it in verbose mode so you can see the events.
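
To get only the latest report's events rather than just its name, the stats approach can be wrapped in a subsearch so the outer search is filtered to that one file. This is a sketch: <yourIndex> stands in for the real index, and it assumes the event timestamps reflect when each report was written, because latest() picks by event time and not by file name. The time range is repeated inside the subsearch because, as far as I know, inline earliest/latest terms on the outer search are not passed down to it.

```
index=<yourIndex> earliest=@mon latest=now
    [| metasearch index=<yourIndex> source="node*.example.org:/var/reports/*.csv" earliest=@mon latest=now
     | stats latest(source) as source]
```

The subsearch returns a single source="..." term, so the outer search should only see events from the newest file written this month, whichever node produced it.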
