I have a scheduled search configured to run every 5 minutes to detect server shutdowns or reboots which may have occurred over the past 5 minutes, and send an email alert if any such events are found. In savedsearches.conf :
[Server shutdowns and reboots detected in the past 5 minutes]
search = source=/var/log/messages ("shutdown" OR "reboot") | stats count by host
dispatch.earliest_time = -6m@m
dispatch.latest_time = -1m@m
enableSched = 1
cron_schedule = */5 * * * *
# Send an email alert if the search returns any events :
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = admin@example.org
The problem is that there is a scheduled reboot of some servers every day between 12:15 AM and 12:45 AM, which generates a false positive email alert.
I would like this scheduled search not to generate any alerts if it detects events in that time frame.
There is currently no way to configure a scheduled search not to run or generate alerts during a specific time frame.
If you're lucky, the blackout period you want to introduce is easy enough to bake into the cron format of the "cron_schedule".
This is however not the case with the example discussed here. In our case, we will have to play with the date_hour and date_minute internal fields to exclude events that would normally match the search.
We want only events that fall outside the 12:15 AM to 12:45 AM window.
The following search string will achieve that goal:
source=/var/log/messages ("shutdown" OR "reboot") NOT (date_hour=0 AND date_minute>14 AND date_minute<46) | stats count by host
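For comparison, here is a rough sketch of the "lucky" case, where the blackout lines up with whole hours and can be handled by cron_schedule alone. It assumes a hypothetical blackout covering the entire 00:00 to 00:59 hour (not the 12:15 to 12:45 AM window from the question), and the stanza name is made up for illustration:

```conf
# Hypothetical stanza: the blackout is assumed to be the whole 00:00-00:59 hour,
# so the schedule itself can skip it and the search string stays unchanged.
[Server shutdowns and reboots - skip the midnight hour]
search = source=/var/log/messages ("shutdown" OR "reboot") | stats count by host
dispatch.earliest_time = -6m@m
dispatch.latest_time = -1m@m
enableSched = 1
# Run every 5 minutes, but only during hours 1 through 23
cron_schedule = */5 1-23 * * *
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = admin@example.org
```

Note that the 01:00 run still looks back at 00:54 to 00:59, so events from the last minutes of the skipped hour would be picked up unless the time window is adjusted as well.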
OK, one more time.
If this is your requirement:
The alert condition is, it has to run every 15 mins and throw alert if the count of events is 0
BUT the query should not trigger alert between 3:30 AM - 8:30 AM GMT
Then try this:
It is IMPOSSIBLE to have the search NOT RUN the way that you describe. What IS possible is to have it CRASH (and not complete). THAT IS WHAT MY SOLUTION DOES. Just set up the condition to trigger for "Number of Results Greater Than 0" and schedule it to run every 15 minutes. My trick operates from INSIDE the search and will cause the search to CRASH (and therefore be IMPOSSIBLE to alert) during the blackout period. Just try it:
| noop | stats count AS blackoutPeriod | addinfo | eval now=tonumber(strftime(now(),"%H%M")) | eval blackoutPeriod = if(((now>=330) AND (now<=830)),"YES","NO") | eval earliestMaybe=if((blackoutPeriod=="NO"), info_min_time, now()) | map search="search earliest=$earliestMaybe$ latest=$info_max_time$ source=/var/log/messages (\"shutdown\" OR \"reboot\") | stats count by host"
Hi @woodcock
What if I want the blackout period only on Sundays between 2 AM and 6 AM?
Do you have a solution for that?
I want the alert NOT to be triggered only between 2 AM and 6 AM every Sunday.
Thanks,
Sushant
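For the Sunday-only case, one possible sketch is to derive the weekday and hour from _time and drop events that fall inside the window. This assumes the alert fires on "number of events greater than 0", as in the original question. The stanza name and the dow and hour field names are made up for illustration. %w is the strftime weekday number, where 0 means Sunday:

```conf
# Hypothetical stanza: ignore matching events stamped Sunday 02:00-05:59.
# dow and hour are illustrative field names computed from _time.
[Server shutdowns and reboots - quiet on Sunday 02:00-06:00]
search = source=/var/log/messages ("shutdown" OR "reboot") \
| eval dow=tonumber(strftime(_time, "%w")), hour=tonumber(strftime(_time, "%H")) \
| where NOT (dow==0 AND hour>=2 AND hour<6) \
| stats count by host
dispatch.earliest_time = -6m@m
dispatch.latest_time = -1m@m
enableSched = 1
cron_schedule = */5 * * * *
counttype = number of events
relation = greater than
quantity = 0
```

As far as I know, strftime renders times in the timezone of the user who owns the search, so "Sunday 2 AM" here means Sunday 2 AM in that timezone.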
Thank you woodcock, it worked.
Try this:
| noop | stats count AS blackoutPeriod | addinfo | eval timeBegin=tonumber(strftime(info_min_time,"%H%M")) | eval timeEnd=tonumber(strftime(info_max_time,"%H%M")) | eval blackoutPeriod = if((((timeBegin>=15) AND (timeBegin<=45)) OR ((timeEnd>=15) AND (timeEnd<=45))),"YES","NO") | eval earliestMaybe=if((blackoutPeriod=="NO"), info_min_time, now()) | map search="search earliest=$earliestMaybe$ latest=$info_max_time$ source=/var/log/messages (\"shutdown\" OR \"reboot\") | stats count by host"
The search is deliberately verbose, for clarity and educational purposes.
If the search timepicker span (info_min_time->info_max_time) is in the blackout period (00:15<=info_*_time<=00:45), then the search will not run at all. If it is outside of the blackout period, then it will run.
You may have to remove some of the equals-signs (I may have too many) to fit the very edge of the blackout window better.
So is the answer by @hexx good or not (it is showing "Accepted")?
Appreciate your help @woodcock
But I am not sure where I am getting stuck.
The requirement is that the query should not trigger an alert between 3:30 AM and 8:30 AM,
and the alert condition is that it has to run every 15 minutes and raise an alert if the count of events is 0.
@hexx's answer holds good if the alert condition is any value other than 0 (for the count of events). Since embedding the NOT clause restricts the search, the result is 0 during the blackout, which will still trigger a false alert in this case.
I guess I have to accept that either I am not capable of setting this up or this can't be done, which would be very unfortunate 😞
PLEASE HELP!
@woodcock @hexx Any help would be appreciated
You have not given any feedback as to what is not working in my solution. By my understanding, it should do exactly what you need. If it does not, then comment with EXACTLY what is not working.
Hi Woodcock,
I just added your string with my regular search string.
I see that every hour between 15 to 45th min the alert is getting triggered even though we have count.
(The alert is supposed to get triggered only when the count of event is 0)
I can go over the requirement once again if you want.
The OP says that dead-zone/quiet-time should be "every day between 12:15 AM and 12:45 AM". I do not understand the phrase "every hour between 15 to 45th min".
The part of the search that counts events is your code (I copied it from you).
I made no comment on, nor modifications to, the alerting logic (the other parts of your Alert Configuration). If that part is not working correctly, you need to correct that. The configuration (assuming that your core search works like it appears to) should be just what you have in your OP.
Now, having said all that, I suspect that your confusion is due to examining the search jobs. You say that "the alert is getting triggered", which IS NOT POSSIBLE and I do not believe is true (excepting TZ issues). YES, the search (job) will still be dispatched during the dead-zone period (every 5 minutes, per the OP's cron schedule). HOWEVER, the search will not run CORRECTLY during that period. During the blackout, the search job is incapable of generating any results. The changes that I made will cause the search to crash (fail to generate any results at all) due to an "Earliest time cannot be greater than latest time" error.
Hi Woodcock,
I got your point, but my requirement is different than the OP. You might have missed seeing it
Let me put it forward once again:
"The requirement is that the query should not trigger alert between 3:30 AM - 8:30 AM GMT
& the alert condition is, it has to run every 15 mins and throw alert if the count of events is 0"
Can you please assist me on this?
Regards
Sayanto
@woodcock any luck on this?
@hexx @woodcock
I stumbled upon this answer as my requirement is similar.
But I need my query to alert me when the result is 0, except during the blackout window:
(index=app OR index=silver) environment=prdv source="/cust/app/app-ce-esb/server/app-ce-esb/log/server.log" ShipmentRequest WarehouseId="00150" NOT ("Outbound DCS") NOT (date_hour=3 AND date_minute>=29 AND date_minute<=60) NOT (date_hour=4) NOT (date_hour=5) NOT (date_hour=6) NOT (date_hour=7) NOT (date_hour=8 AND date_minute>=0 AND date_minute<=30)
Since the query is modified to restrict the search in the declared time frames, the result will be 0 for those time frames, and since my alert condition is to trigger on 0, the alert will eventually get triggered.
Can you please suggest a workaround against that?
Since you have the alert trigger specifically when there are zero events, you have to modify things here.
Change the alert so that instead of alerting when the search gives zero rows, it alerts "when a custom condition is met".
I would actually modify this search from
source=/var/log/messages ("shutdown" OR "reboot") NOT (date_hour=0 AND date_minute>14 AND date_minute<46) | stats count by host
to
source=/var/log/messages ("shutdown" OR "reboot") | eval is_expected_downtime=if(date_hour=0 AND date_minute>14 AND date_minute<46,"yes","no") | stats values(is_expected_downtime) as is_expected_downtime count by host
and then set your custom condition to
| stats sum(count) as total values(is_expected_downtime) as is_expected_downtime | search total=0 is_expected_downtime="no"
If the alert fires in a condition where all the data is occurring during expected downtime, then the condition will not match. Outside of expected downtime, it will match whenever there are no raw events.
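A variation on the same idea, for the "alert when the count is 0, except between 3:30 AM and 8:30 AM" requirement, is to fold the blackout check into the search itself and keep the ordinary "number of events greater than 0" trigger. The sketch below is only that: the stanza name, the hhmm field, and the 15-minute dispatch window are assumptions, and the base search is the one posted above with the date_* clauses removed. A stats count with no by clause always returns exactly one row, so the where clause decides whether that row survives:

```conf
# Hypothetical stanza: return one row (and therefore alert) only when there were
# zero matching events AND the search is running outside 03:30-08:30.
[ShipmentRequest flow stopped - quiet 03:30-08:30]
search = (index=app OR index=silver) environment=prdv source="/cust/app/app-ce-esb/server/app-ce-esb/log/server.log" ShipmentRequest WarehouseId="00150" NOT ("Outbound DCS") \
| stats count \
| eval hhmm=tonumber(strftime(now(), "%H%M")) \
| where count==0 AND NOT (hhmm>=330 AND hhmm<=830)
dispatch.earliest_time = -15m@m
dispatch.latest_time = @m
enableSched = 1
cron_schedule = */15 * * * *
counttype = number of events
relation = greater than
quantity = 0
```

During the blackout the where clause removes the single row, so nothing is returned and nothing fires. Because strftime(now(), ...) is, to my knowledge, evaluated in the search owner's timezone, a window specified in GMT would need that user's timezone set to GMT or the 330 and 830 bounds shifted accordingly.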
First of all, DO NOT USE the "free" date_* fields; create your own instead. See this Q&A for the reason:
https://answers.splunk.com/answers/243017/counting-the-total-number-of-days-for-all-time.html
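As a rough illustration of "create your own", the OP's search could compute an hour-and-minute value from _time with strftime instead of relying on date_hour and date_minute. This is a sketch, not a tested configuration; hhmm is an illustrative field name, and 12:15 AM renders as "0015", which tonumber turns into 15:

```conf
# Sketch: same stanza as in the question, with the blackout filter built from _time.
# hhmm is an illustrative field name; 00:15-00:45 becomes the range 15..45.
[Server shutdowns and reboots detected in the past 5 minutes]
search = source=/var/log/messages ("shutdown" OR "reboot") \
| eval hhmm=tonumber(strftime(_time, "%H%M")) \
| where NOT (hhmm>=15 AND hhmm<=45) \
| stats count by host
dispatch.earliest_time = -6m@m
dispatch.latest_time = -1m@m
enableSched = 1
cron_schedule = */5 * * * *
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = admin@example.org
```

Since the value is derived from _time at search time, it should follow the timezone of the user running the search rather than whatever was written in the raw event.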
A more general-purpose solution for this problem that could be used for both the OP's and your (opposite) situations, can be found here:
https://answers.splunk.com/answers/261163/is-there-a-way-i-can-schedule-a-saved-search-to-ru.html
Thanks for your suggestion @woodcock.
I tried to blend your suggestion into my query but I am still unable to get it working.
Can you set it up for me, or share any other example you have in mind?
@woodcock Gentle reminder. It would be great if it is possible for you to assist me on this.
One caveat to the date_hour/date_minute approach is that the date_* fields are not always present (for example with epoch timestamps or events where the _time is not present in the events themselves). Also, if I remember correctly, the date_* fields do not adjust for timezone differences.
Quick clarification: in general the date_* fields aren't supposed to be generated when there's no clear TZ info for Splunk to rely on. So indeed, without keeping a close eye on your sourcetypes, you can't rely on them always being present. Then there is a bug: when time is pulled from an epochtime value (integer number of seconds since 1/1/1970), the date_* fields will be present, and will be calculated in GMT.
Here, since the examples posted are all within a known sourcetype, it may very well be fine to use the date_* fields.