Hi all, I need to create an alert to check a folder has 10 files that are created daily.
The tricky bit is the folder name is based on the date.
the complete path is
\\TABASIPP\Prod_Data\appr\data\<today's date>
and in there, we need to check that 10 files are created by 4pm.
the 10 files are
20220530Report1.csv
20220530Report2.csv
Fails_30May2022_checked.csv
20220530_total_submissions.csv
20220530_loss_report.csv
30May2022_EOD.csv
etc...
note that the file names all have the current date, as well as the folder has the date in it as well.
Need to check that the files are created by 4pm, and if not, send us an email alert.
Is this possible with splunk and how would you do this?
thanks for any help in advannce.
Hi @esukkar,
please try something like this:
index=your_index source="\\TABASIPP\Prod_Data\appr\data\*\*.csv" earliest=-24h latest=now
| rex field=source "\\\\TABASIPP\\Prod_Data\\appr\\data\\(?<date>\d{8})\\(?<filename>\w+\.\w+)"
| search date=strftime(now(),"%Y%m%d")
| stats count values(filename) AS filename
| mvexpand filename
| eval
check=case(
substr(filename,1,8)="SUM_VAL_"),substr(filename,1,8),
substr(filename,9,10)="_EOD_Final"),substr(filename,9,10),
substr(filename,1,9)="MER_VALUE_",substr(filename,1,9),
substr(filename,1,10)="CAL_VALUE_",substr(filename,1,10),
substr(filename,9,11)="_CM_REP_VAL",substr(filename,9,11),
substr(filename,9,14)="_IE_EOD_REPORT",substr(filename,9,14),
substr(filename,1,11)="EQ_CAL_REP_",substr(filename,1,11),
substr(filename,10,15)="_CAL_VAL_REPORT",substr(filename,10,15),
substr(filename,10,15)="_EOD_CAL_REPORT",substr(filename,10,15),
substr(filename,9,12)="_PNL_REPORT2",substr(filename,9,12))
| append [ | makeresults | eval check="SUM_VAL_" | fields check ]
| append [ | makeresults | eval check="_EOD_Final" | fields check ]
| append [ | makeresults | eval check="MER_VALUE_" | fields check ]
| append [ | makeresults | eval check="CAL_VALUE_" | fields check ]
| append [ | makeresults | eval check="_CM_REP_VAL" | fields check ]
| append [ | makeresults | eval check="_IE_EOD_REPORT" | fields check ]
| append [ | makeresults | eval check="EQ_CAL_REP_" | fields check ]
| append [ | makeresults | eval check="_CAL_VAL_REPORT" | fields check ]
| append [ | makeresults | eval check="_EOD_CAL_REPORT" | fields check ]
| append [ | makeresults | eval check="_PNL_REPORT2" | fields check ]
| append [ | makeresults | eval check="SUM_VAL_" | fields check ]
| stats count BY check
| eval status=if(count=1,"Nor Present","Present")
| table check status
In this way you have the list of files and the status, if you want an alert, you can add the condition in the last row:
| search status="Not present"
Obviously, I displayed an approach, please check the substrings extractions.
If you like, you can also use regexes instead substring to extract the check strings.
Ciao.
Giuseppe
Hi @esukkar,
I need to know some additional information:
if the second part of the name (the part without date) is always the same and supposing that today's date is a folder with format "yyyy-mm-dd", you could run something like this.
index=your_index source="\\TABASIPP\Prod_Data\appr\data\*\*.csv" earliest=-24h latest=now
| rex field=source "\\\\TABASIPP\\Prod_Data\\appr\\data\\(?<date>\d\d\d\d-\d\d-\d\d)"
| search date=strftime(now(),"%Y-%m-%d")
| stats count values(source) AS source
| where count!=10
| table source
you can check the regex at https://regex101.com/r/56XWd5/1
in this way, if you have results there's an error and the search lists the present files so you can make a visual check for the missing ones, if instead you don't have results, the 10 files are present.
if you want, it's also possible to check the list and identify the missing ones but it's more complicated because the file names have different formats so you have to create a regex for each format.
Ciao.
Giuseppe
Hi gcusello,
thanks for your reply.
you're right, <todays date> is a directory, and changes every day. So the folder I need to check for todays files is 20220602. Tomorrow, it will be 20220603 - so format is YYYYMMDD.
The tricky bit is that the files dont all have the same format. Some files have the date at the start of the file with format YYYYMMDD, while others have the date in the middle of the filename with format either xxxxxxxx-DD-MMM-YYYYY-xxxxxxxxx.csv while a couple dont have date in them at all
there are 10 files in all, so can I just write a rule that lists all 10 of them.
thanks and regards
Hi @esukkar,
ok, changing a few the regex my search should be correct to fire the alert when there are less than 10 files, then you have to manually identify the missing ones.
index=your_index source="\\TABASIPP\Prod_Data\appr\data\*\*.csv" earliest=-24h latest=now
| rex field=source "\\\\TABASIPP\\Prod_Data\\appr\\data\\(?<date>\d{8})"
| search date=strftime(now(),"%Y%m%d")
| stats count values(source) AS source
| where count!=10
| table source
If the ten files have always the same format, you can define a rule for the them and it's also possible to create some regexes to check also the names.
Ciao.
Giuseppe
Hi @gcusello
Are you able to modify the search to specifically locate each of the 10 files. It's because there are more than 10 files in the directory.. more like 50 by 4pm, it's just that if any of these 10 are missing, it's a showstopper. The other files not so important.
So really need something like check for file1 & file2 & file3 & .........& File10, if any file is missing and it's 4pm, send alert..
Thanks and regards
Hi @esukkar,
yes it's possible, but you should share a sample of the ten files format.
In few words, you have to create ten regex to extract the fixed part of the ten files and then check if they are present or not.
Ciao.
Giuseppe
yesterdays directory is
\\TABASIPP\Prod_Data\appr\data\processed\2022-06-01\
The ten files are
SUM_VAL_2002_06_01_9344887.csv
20220601_EOD_Final.csv
MER_VALUE_2022_06_01_9344701.csv
CAL_VALUE_2022_06_01_9344789.csv
20220602_CM_REP_VAL.csv
20220602_IE_EOD_REPORT.csv
EQ_CAL_REP_2022_06_02_9433323.csv
2022_06_02_CAL_VAL_REPORT.csv
2022_06_02_EOD_CAL_REPORT.csv
20220602_PNL_REPORT2.csv
Note that the ones with 7 digit number at end of the filename after the date is a random number you can ignore - I think it's a timestamp. But need to ensure these 10 files are received by 4pm, else send an splunk email alert.
thanks so much
Hi @esukkar,
please try something like this:
index=your_index source="\\TABASIPP\Prod_Data\appr\data\*\*.csv" earliest=-24h latest=now
| rex field=source "\\\\TABASIPP\\Prod_Data\\appr\\data\\(?<date>\d{8})\\(?<filename>\w+\.\w+)"
| search date=strftime(now(),"%Y%m%d")
| stats count values(filename) AS filename
| mvexpand filename
| eval
check=case(
substr(filename,1,8)="SUM_VAL_"),substr(filename,1,8),
substr(filename,9,10)="_EOD_Final"),substr(filename,9,10),
substr(filename,1,9)="MER_VALUE_",substr(filename,1,9),
substr(filename,1,10)="CAL_VALUE_",substr(filename,1,10),
substr(filename,9,11)="_CM_REP_VAL",substr(filename,9,11),
substr(filename,9,14)="_IE_EOD_REPORT",substr(filename,9,14),
substr(filename,1,11)="EQ_CAL_REP_",substr(filename,1,11),
substr(filename,10,15)="_CAL_VAL_REPORT",substr(filename,10,15),
substr(filename,10,15)="_EOD_CAL_REPORT",substr(filename,10,15),
substr(filename,9,12)="_PNL_REPORT2",substr(filename,9,12))
| append [ | makeresults | eval check="SUM_VAL_" | fields check ]
| append [ | makeresults | eval check="_EOD_Final" | fields check ]
| append [ | makeresults | eval check="MER_VALUE_" | fields check ]
| append [ | makeresults | eval check="CAL_VALUE_" | fields check ]
| append [ | makeresults | eval check="_CM_REP_VAL" | fields check ]
| append [ | makeresults | eval check="_IE_EOD_REPORT" | fields check ]
| append [ | makeresults | eval check="EQ_CAL_REP_" | fields check ]
| append [ | makeresults | eval check="_CAL_VAL_REPORT" | fields check ]
| append [ | makeresults | eval check="_EOD_CAL_REPORT" | fields check ]
| append [ | makeresults | eval check="_PNL_REPORT2" | fields check ]
| append [ | makeresults | eval check="SUM_VAL_" | fields check ]
| stats count BY check
| eval status=if(count=1,"Nor Present","Present")
| table check status
In this way you have the list of files and the status, if you want an alert, you can add the condition in the last row:
| search status="Not present"
Obviously, I displayed an approach, please check the substrings extractions.
If you like, you can also use regexes instead substring to extract the check strings.
Ciao.
Giuseppe
Hi @esukkar,
good for you, see next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated 😉