Getting Data In

Comparing number of sources vs number of sources on previous day

mic1024
Path Finder

I'm monitoring a directory and need to check, at a certain time of day (let's say 7am), whether the number of files is the same as the number of files ON THE PREVIOUS working day (my working days are Tue to Sat).
If I'm missing files (compared to the previous working day), I'd like an alert for it (and ideally a list of the missing files).

So far I went with the following approach:

sourcetype = logs earliest=-0d@d latest=now | multikv | eval ReportKey="today"| append [ search sourcetype = logs earliest=-d@d latest=-1d@d | multikv | eval ReportKey="previous"] | chart eval(dc(source)) AS NumberOfFiles by ReportKey

That gives me a table (chart) similar to the one below:

ReportKey NumberOfFiles
previous 51
today 50

So in this scenario I'm missing 1 file, however:

I'm not sure how to alert on it (advanced custom condition?).
It also doesn't list the missing file(s).
I also run into problems on certain days (around weekends), as files are generated Tue to Sat only (so the above time range [earliest=-d@d latest=-1d@d] won't work on Mon and Tue).

I have a feeling my approach isn't the way to go. Any suggestions?

Thanks,


somesoni2
Revered Legend

If I understand your requirements correctly:
1. From Wednesday to Saturday, compare the count of sources with the previous day, and if any sources are missing today (their names are needed as well), raise an alert.
2. On Tuesday, compare the count of sources with Saturday and alert in the same way.
3. On Sunday and Monday, no comparison and no alert.
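
The three rules above boil down to a small "which day do I compare against" lookup. Here is a minimal Python sketch of that lookup, only to make the logic explicit; the function name `previous_working_day` is made up for illustration and is not part of Splunk or of the search below.

```python
from datetime import date, timedelta
from typing import Optional


def previous_working_day(today: date) -> Optional[date]:
    """Return the day to compare against, or None when no comparison should run.

    Assumes a Tuesday-to-Saturday working week, as described in the question.
    """
    weekday = today.weekday()  # Monday=0 ... Sunday=6
    if weekday in (0, 6):
        # Rule 3: Sunday and Monday, no comparison and no alert
        return None
    if weekday == 1:
        # Rule 2: Tuesday compares with Saturday, three days earlier
        return today - timedelta(days=3)
    # Rule 1: Wednesday to Saturday compare with the day before
    return today - timedelta(days=1)
```

For example, `previous_working_day(date(2021, 6, 15))` (a Tuesday) gives Saturday 12 June 2021, which is why a search window of three days back is enough to cover every case.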

My simple solution to this would be:
1. Create a saved search that runs daily (at your specified time) with the following search:

sourcetype=logs earliest=-3d@d latest=now | stats count by source,date_wday | fields - count
| eval today=lower(strftime(now(),"%A")) | eval yest=lower(strftime(relative_time(now(), "-1d@d"),"%A"))
| eval prevWDay=case(today="tuesday","saturday",today="sunday" OR today="monday","ShouldNotRun",1=1,yest) 
| where date_wday=prevWDay | table source,prevWDay | join type=outer source 
[search sourcetype=logs earliest=@d latest=now |eval filter=case(date_wday="sunday" OR date_wday="monday","Y",1=1,"N")
| where filter="N"| stats count by source,date_wday | fields - count | rename date_wday as today] 
| eval today=coalesce(today,"Missing Source") | where today="Missing Source"

About this query: the first part gets the list of sources for the previous working day (if today is Tuesday it gets Saturday, nothing for Monday and Sunday, and for other days it gets the previous day).
The second part gets the data for today if it's not Sunday or Monday. The previous day's results are outer joined with today's to find sources that were present on the previous day but not today (marked as "Missing Source").
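
Put differently, the outer join plus the final `where` amounts to a set difference: sources seen on the previous working day minus sources seen today. A minimal Python sketch of just that step, with a made-up function name and made-up file names, assuming each day's sources are already available as plain lists:

```python
from typing import Iterable, List


def missing_sources(previous: Iterable[str], today: Iterable[str]) -> List[str]:
    """Return the sources present on the previous working day but absent today."""
    # Set difference keeps only names that have no match in today's list,
    # which is what the rows marked "Missing Source" represent.
    return sorted(set(previous) - set(today))


# Hypothetical file names, for illustration only
previous_day = ["/data/a.log", "/data/b.log", "/data/c.log"]
current_day = ["/data/a.log", "/data/c.log"]
missing = missing_sources(previous_day, current_day)
should_alert = len(missing) > 0  # alert when any row is returned
```

Note that a new source that appears only today does not show up here, which matches the one-sided join in the search.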

An alert should be configured on this search to trigger whenever it returns any rows. Hope this helps.


mic1024
Path Finder

Hi,
the requirements you listed are correct.
I like how you used the word 'simple' and then 'just' wrote an 11-line search query 😉
I'll test it today and will come back with my findings/comments.
thanks!

lguinn2
Legend

I like @jtrucks's solution of running multiple searches; that's a creative idea and pretty simple.

But this search may do what you want as well:

sourcetype=logs
| bucket _time span=d
| eval DOW = strftime(_time,"%w")
| where DOW > 1
| stats dc(source) as fileCount by _time
| delta fileCount as Diff
| where Diff != 0

Alert on number of events > 0
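
To spell out what the pipeline above does, here is a rough Python equivalent, assuming the events are available as (day, source) pairs. The function name and sample data are made up for illustration. It keeps Tuesday to Saturday only (`%w` is 0 for Sunday and 1 for Monday), counts distinct sources per day, and reports each day whose count differs from the previous remaining day.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, List, Tuple


def file_count_changes(events: Iterable[Tuple[date, str]]) -> List[Tuple[date, int]]:
    """Return (day, diff) for every kept day whose distinct-source count changed."""
    sources_by_day = defaultdict(set)
    for day, source in events:
        # Same filter as "where DOW > 1": drop Sunday (0) and Monday (1)
        if int(day.strftime("%w")) > 1:
            sources_by_day[day].add(source)

    changes = []
    previous_count = None
    for day in sorted(sources_by_day):
        count = len(sources_by_day[day])  # distinct sources, like dc(source)
        # The first day has nothing to compare against, so it never reports
        if previous_count is not None and count != previous_count:
            changes.append((day, count - previous_count))
        previous_count = count
    return changes


# Hypothetical sample: Saturday has 3 files, Tuesday has 2
sample = [
    (date(2021, 6, 12), "a.log"), (date(2021, 6, 12), "b.log"), (date(2021, 6, 12), "c.log"),
    (date(2021, 6, 13), "stray.log"),  # Sunday, ignored
    (date(2021, 6, 15), "a.log"), (date(2021, 6, 15), "b.log"),
]
changes = file_count_changes(sample)
```

Because Sunday and Monday are dropped before the comparison, Tuesday is automatically compared with Saturday. This approach compares counts only, so it does not name the missing files, and a file replaced by a different one would not register as a change.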

mic1024
Path Finder

Hi,
It seems that the search does what it's supposed to be doing - which is amazing.

Thanks very much!
mic.


jtrucks
Splunk Employee

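Your subsearch should use earliest=-1d@d latest=@d to have the relative date order correct and cover the whole of the previous day (-d@d is the same as -1d@d, so it cannot serve as the other end of the range).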

You could check date_wday to see whether it is Sunday or Tuesday and not pull data for those days in this manner. Perhaps schedule one search to run Monday and Wednesday through Saturday, and another search to run on Sunday and Tuesday. The Sunday and Tuesday searches could compare against the last day that could be similar, that is, last Monday against this Sunday and last Saturday against this Tuesday. The normal search then runs on Monday and Wednesday through Saturday, where it compares similar days against one another.

Also, you can simplify your search above by using the following. It doesn't label the rows "today" and "previous", but it is more explicit because it shows the exact day name:

sourcetype=logs  earliest=-1d@d latest=now |stats dc(source) AS NumberOfFiles by date_wday

That does not yet solve the problem of reporting the difference, but it's faster and gets partway there.

mic1024
Path Finder

So it seems that I don't know how to do it. If someone wants to have a shot, be my guest! 😉


mic1024
Path Finder

Hi MuS, I have not. Back to the drawing board then! (I'm not familiar with streamstats, so I've got to look up the docs ;).

I'll post my findings here (if any).


MuS
SplunkTrust

Have you tried using streamstats instead of the subsearch? Also, you can set up the saved searches with cron-style schedules, meaning you run a different search on Mon than on Sat.
