Alerting

Can I set a blackout period for a scheduled search, during which it should not generate alerts?

hexx
Splunk Employee

I have a scheduled search configured to run every 5 minutes to detect server shutdowns or reboots which may have occurred over the past 5 minutes, and send an email alert if any such events are found. In savedsearches.conf :

[Server shutdowns and reboots detected in the past 5 minutes]
search = source=/var/log/messages ("shutdown" OR "reboot") | stats count by host
dispatch.earliest_time = -6m@m
dispatch.latest_time = -1m@m
enableSched = 1
cron_schedule = */5 * * * *
# Send an email alert if the search returns any events :
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = admin@example.org

The problem is that there is a scheduled reboot of some servers every day between 12:15 AM and 12:45 AM, which generates a false positive email alert.

I would like this scheduled search not to generate any alerts if it detects events in that time frame.

1 Solution

hexx
Splunk Employee

There is currently no way to configure a scheduled search not to run or generate alerts during a specific time frame.

If you're lucky, the blackout period you want to introduce is easy enough to bake into the cron format of the "cron_schedule".
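For instance, if the blackout happened to cover a whole clock hour, say midnight to 1 AM (a hypothetical window, not the one in this question), the schedule itself could skip that hour. A minimal sketch of what that might look like in savedsearches.conf:

```ini
# Hypothetical schedule: run every 5 minutes, but only during hours 1-23,
# so nothing is dispatched between 00:00 and 00:59.
cron_schedule = */5 1-23 * * *
```

Keep in mind that the first run after the gap still looks back over its dispatch time range: with dispatch.earliest_time = -6m@m, the 1:00 AM run would cover 12:54 AM to 12:59 AM.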

This is however not the case with the example discussed here. In our case, we will have to play with the date_hour and date_minute internal fields to exclude events that would normally match the search.

We want only events that match the following criteria :

  • Can be found in source /var/log/messages : source=/var/log/messages
  • Contain the string "shutdown" or the string "reboot" : ("shutdown" OR "reboot")
  • Did not happen at 12 AM (date_hour=0) between the 15th and 45th minute of that hour, inclusive (date_minute>14 and date_minute<46) : NOT (date_hour=0 AND date_minute>14 AND date_minute<46)

The following search string will achieve that goal :

source=/var/log/messages ("shutdown" OR "reboot") NOT (date_hour=0 AND date_minute>14 AND date_minute<46) | stats count by host


woodcock
Esteemed Legend

OK, one more time.

If this is your requirement:

The alert condition is, it has to run every 15 mins and throw alert if the count of events is 0
BUT the query should not trigger alert between 3:30 AM - 8:30 AM GMT

Then try this:

It is IMPOSSIBLE to have the search NOT RUN the way that you describe. What IS possible is to have it CRASH (and not complete). THAT IS WHAT MY SOLUTION DOES. Just set up the condition to trigger for Number of Results Greater Than 0 and schedule it to run every 15 minutes. My trick operates from INSIDE the search and will cause the search to CRASH (and therefore be IMPOSSIBLE to alert) during the blackout period. Just try it:

| noop | stats count AS blackoutPeriod | addinfo | eval now=tonumber(strftime(now(),"%H%M")) | eval blackoutPeriod = if(((now>=330) AND (now<=830)),"YES","NO") | eval earliestMaybe=if((blackoutPeriod=="NO"), info_min_time, now()) | map search="search earliest=$earliestMaybe$ latest=$info_max_time$ source=/var/log/messages (\"shutdown\" OR \"reboot\") | stats count by host"

sushantnarula
Observer

Hi @woodcock 

What if I want the blackout period only on Sundays between 2 AM and 6 AM?
Do you have a solution for that?

I want the alert NOT to be triggered only between 2 AM and 6 AM every Sunday.

Thanks,
Sushant


IamRoni
Explorer

Thank You woodcock, It worked


woodcock
Esteemed Legend

Try this:

| noop | stats count AS blackoutPeriod | addinfo | eval timeBegin=tonumber(strftime(info_min_time,"%H%M")) | eval timeEnd=tonumber(strftime(info_max_time,"%H%M")) | eval blackoutPeriod = if((((timeBegin>=15) AND (timeBegin<=45)) OR ((timeEnd>=15) AND (timeEnd<=45))),"YES","NO") | eval earliestMaybe=if((blackoutPeriod=="NO"), info_min_time, now()) | map search="search earliest=$earliestMaybe$ latest=$info_max_time$ source=/var/log/messages (\"shutdown\" OR \"reboot\") | stats count by host"

The search is deliberately verbose, for clarity and educational purposes.
If either end of the search time range (info_min_time to info_max_time) falls in the blackout period (00:15 <= info_*_time <= 00:45), then the search will fail without producing any results. If both ends are outside of the blackout period, then it will run normally.
You may have to remove some of the equals-signs (I may have too many) to fit the very edge of the blackout window better.

So is the answer by @hexx good or not (it is showing "Accepted")?


IamRoni
Explorer

Appreciate your help @woodcock
But I am not sure where I am getting stuck.
The requirement is that the query should not trigger an alert between 3:30 AM and 8:30 AM,
and the alert condition is that it has to run every 15 minutes and throw an alert if the count of events is 0.

@hexx's answer holds good if the alert condition is any value other than 0 (for the count of events). By embedding the NOT clause we restrict the search, which gives a count of 0 during the blackout, and that will still trigger a false alert in this case.

I guess I have to accept that either I am not capable of setting this up or it can't be done, which would be very unfortunate 😞
PLEASE HELP!


IamRoni
Explorer

@woodcock @hexx Any help would be appreciated


woodcock
Esteemed Legend

You have not given any feedback as to what is not working in my solution. By my understanding, it should do exactly what you need. If it does not, then comment with EXACTLY what is not working.


IamRoni
Explorer

Hi Woodcock,

I just added your string to my regular search string.
I see that between the 15th and 45th minute of every hour the alert is getting triggered even though we have a count.
(The alert is supposed to get triggered only when the count of events is 0.)

I can go over the requirement once again if you want.


woodcock
Esteemed Legend

The OP says that dead-zone/quiet-time should be every day between 12:15 AM and 12:45 AM.
I do not understand the phrase every hour between 15 to 45th min.
The part of the search that counts events is your code (I copied it from you).
I made no comment on, nor modifications to, the alerting logic (the other parts of your Alert Configuration). If that part is not working correctly, you need to correct that. The configuration (assuming that your core search works like it appears to) should be just what you have in your OP.

Now, having said all that, I suspect that your confusion is due to examining the search jobs. You say that the alert is getting triggered, which IS NOT POSSIBLE and which I do not believe is true (excepting TZ issues). YES, the search (job) will still be dispatched on schedule during the dead-zone period. HOWEVER the search will not run CORRECTLY during that period. During the blackout, the search job is incapable of generating any results. The changes that I made will cause the search to crash (fail to generate any results at all) due to an "Earliest time cannot be greater than latest time" error.


IamRoni
Explorer

Hi Woodcock,

I got your point, but my requirement is different from the OP's. You might have missed it.
Let me put it forward once again:

"The requirement is that the query should not trigger alert between 3:30 AM - 8:30 AM GMT
& the alert condition is, it has to run every 15 mins and throw alert if the count of events is 0"

Can you please assist me on this?

Regards
Sayanto


IamRoni
Explorer

@woodcock any luck on this?


IamRoni
Explorer

@hexx @woodcock

I stumbled upon this answer as my requirement is similar.
But I need my query to alert me when the result is 0, except during the blackout window.

(index=app OR index=silver) environment=prdv source="/cust/app/app-ce-esb/server/app-ce-esb/log/server.log" ShipmentRequest WarehouseId="00150" NOT ("Outbound DCS") NOT (date_hour=3 AND date_minute>=29 AND date_minute<=60) NOT (date_hour=4) NOT (date_hour=5) NOT (date_hour=6) NOT (date_hour=7) NOT (date_hour=8 AND date_minute>=0 AND date_minute<=30)

Since the query is modified to exclude events in the declared time frames, the result will be 0 for those time frames. And since my alert condition is to trigger when the count is 0, the alert will still get triggered.

Can you please suggest a workaround against that?


sideview
SplunkTrust

Since you have the alert trigger specifically when there are zero events, you have to modify things here.
Change the alert so that instead of alerting when the search gives zero rows, it alerts "when a custom condition is met".

I would actually modify this search from

source=/var/log/messages ("shutdown" OR "reboot") NOT (date_hour=0 AND date_minute>14 AND date_minute<46) | stats count by host

to
source=/var/log/messages ("shutdown" OR "reboot") | eval is_expected_downtime=if(date_hour=0 AND date_minute>14 AND date_minute<46,"yes","no") | stats values(is_expected_downtime) as is_expected_downtime count by host

and then set your custom condition to

| stats sum(count) as total values(is_expected_downtime) as is_expected_downtime | search total=0 is_expected_downtime="no"

If the alert fires in a condition where all the data is occurring during expected downtime, then the condition will not match. Outside of expected downtime, it will match whenever there are no raw events.
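One thing to watch, as far as I can tell: when the search window contains no events at all, stats ... by host returns no rows, so there is nothing left to carry the is_expected_downtime flag. A minimal sketch of an alternative, assuming it is acceptable to decide the blackout from the time the search runs (now()) rather than from event timestamps, and using the 3:30 AM to 8:30 AM window requested earlier in this thread. The field name hhmm is my own, and the window is evaluated in the search owner's timezone, so adjust it if the requirement is expressed in GMT:

```ini
# Sketch only, not a tested configuration.
# "stats count" with no by-clause always returns exactly one row, even when
# no events match, so the "where" clause can test count=0 together with the
# clock. The search returns one row only when the count is 0 AND the current
# time is outside 03:30-08:30; the alert then fires on "results > 0".
search = source=/var/log/messages ("shutdown" OR "reboot") | stats count | eval hhmm=tonumber(strftime(now(),"%H%M")) | where count=0 AND NOT (hhmm>=330 AND hhmm<=830)
cron_schedule = */15 * * * *
counttype = number of events
relation = greater than
quantity = 0
```

The base search here is the one from the original question; substitute your own before the first pipe.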


woodcock
Esteemed Legend

First of all, DO NOT USE the "free" date_* fields; create your own instead. See this Q&A for the reason:

https://answers.splunk.com/answers/243017/counting-the-total-number-of-days-for-all-time.html
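A minimal sketch of what "create your own" could look like for the OP's search, deriving the hour and minute from _time instead of relying on date_hour and date_minute. The field name hhmm is a hypothetical name of my choosing, and strftime renders _time in the search owner's timezone, which I am assuming is the timezone the blackout window is expressed in:

```ini
# Sketch only: hhmm is the event time as a number, e.g. 00:15 -> 15, 01:15 -> 115,
# so 15..45 covers exactly 00:15 through 00:45.
search = source=/var/log/messages ("shutdown" OR "reboot") | eval hhmm=tonumber(strftime(_time,"%H%M")) | where NOT (hhmm>=15 AND hhmm<=45) | stats count by host
```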

A more general-purpose solution for this problem that could be used for both the OP's and your (opposite) situations, can be found here:

https://answers.splunk.com/answers/261163/is-there-a-way-i-can-schedule-a-saved-search-to-ru.html

IamRoni
Explorer

Thanks for your suggestion @Woodcock
I tried to blend your suggestion into my query but I am still unable to do it.
Can you set it up for me, or do you have any other example in mind?


IamRoni
Explorer

@woodcock Gentle reminder. It would be great if it is possible for you to assist me on this.


dshpritz
SplunkTrust

One caveat to this is that the date_* fields are not always present (for example with epoch timestamps or events where the _time is not present in the events themselves). Also, if I remember correctly, the date_ fields do not adjust for timezone differences.


sideview
SplunkTrust

Quick clarification: in general the date_* fields aren't supposed to be generated when there's no clear TZ info for Splunk to rely on. So indeed, without keeping a close eye on your sourcetypes, you can't rely on them always being present. Then there is a bug: when the time is pulled from an epoch time value (integer number of seconds since 1/1/1970), the date_* fields will be present, and will be calculated in GMT.
Here, since the examples posted are all within a known sourcetype, it may very well be fine to use the date_ fields.