I have an alert set up in my Splunk Enterprise Security environment that is configured to trigger when we receive a notable marked as either high or critical urgency. This search has worked in the past but did not trigger over the weekend even though a high urgency notable was created.
When I run the search manually over the time range, I can see that it returns results, so this does not appear to be an issue with the search logic. There is also no throttling configured. The search is on a cron schedule to run every 2 minutes and looks over the last 125 seconds of events.
Looking through the audit logs, I can see the following events, which confirm that the saved search completed running:
Audit:[timestamp=09-16-2018 13:50:24.206, user=XXXX, action=search, info=completed, search_id='scheduler__admin_YWFtX2FsbF9jdXN0b20tc2VhcmNoZXM__RMD571711d067188a19b_at_1537102200_43245_2C5B5B40-2FE3-4255-A621-60A171C4E9C0', total_run_time=2.63, event_count=0, result_count=0, available_count=0, scan_count=0, drop_count=0, exec_time=1537102203, api_et=1537102075.000000000, api_lt=1537102200.000000000, search_et=1537102075.000000000, search_lt=1537102200.000000000, is_realtime=0, savedsearch_name="New Notable - Alert", search_startup_time="2235", searched_buckets=11, eliminated_buckets=0, considered_events=0, total_slices=0, decompressed_slices=0][n/a]
Audit:[timestamp=09-16-2018 13:50:34.832, user=XXXX, action=search, info=completed, search_id='rsa_scheduler__admin_YWFtX2FsbF9jdXN0b20tc2VhcmNoZXM__RMD571711d067188a19b_at_1537102200_43245_2C5B5B40-2FE3-4255-A621-60A171C4E9C0', total_run_time=2.63, event_count=0, result_count=0, available_count=0, scan_count=0, drop_count=0, exec_time=1537102203, api_et=1537102075.000000000, api_lt=1537102200.000000000, search_et=1537102075.000000000, search_lt=1537102200.000000000, is_realtime=0, savedsearch_name="New Notable - Alert", search_startup_time="2235", searched_buckets=11, eliminated_buckets=0, considered_events=0, total_slices=0, decompressed_slices=0][n/a]
The event that should have triggered an alert occurred at 13:50:31 on 09-16-18. The second event in the audit log shows that the search completed three seconds later, at 13:50:34. Is it possible that these overlapped, which prevented the alert from being triggered? The next scheduled search ran at 13:52:24, so that run should have triggered an alert as well.
Typically when I see this question, it is an issue of indexing latency/lag.
The search is on a cron schedule to run every 2 minutes and looks over the last 125 seconds of events.
If the event wasn't indexed until it was 200 seconds old, for example, it won't be picked up by the search. We typically recommend adding in a buffer to account for index latency/lag. You could use this eval to determine the time difference between index time and event time:
| eval indextime=_indextime | eval bkt=_bkt | eval delay_sec=_indextime-_time | table indextime _time delay_sec bkt | rename _time as Eventtime | convert ctime(*time)
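For example, you could prepend your alert's base search and summarize the delay. This is only a sketch; I'm assuming index=notable here since I can't see your actual search, so substitute whatever your alert really searches:

index=notable earliest=-24h | eval delay_sec=_indextime-_time | stats avg(delay_sec) AS avg_delay_sec max(delay_sec) AS max_delay_sec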
If you happen to find that the latency for an event is 2.5 minutes, for example, you may want to adjust the search to look back 3 to 6 minutes.
This takes into account any latency between index time and event time; basically, it allows enough time to pass so that when the search runs you know the data will be there.
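As a rough sketch of how that buffer could look (again assuming the alert searches index=notable and filters on an urgency field; adjust to your actual search), you could offset the scheduled window so it ends a few minutes in the past:

index=notable (urgency=high OR urgency=critical) earliest=-6m@m latest=-3m@m

or, alternatively, keep the 125-second window but apply it to index time rather than event time, so late-arriving events are still caught:

index=notable (urgency=high OR urgency=critical) _index_earliest=-125s _index_latest=now

Either way, the goal is the same: make sure everything the search is supposed to evaluate has actually been indexed by the time it runs.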
Firstly, have you checked under Activity > Triggered Alerts to ascertain whether the alert actually got triggered?
Next, your cron expression: can you, just for the time being, modify it to */5 * * * * and check whether it gets triggered every 5 minutes?
I can confirm the alert did not fire. It was not in Triggered Alerts, and I can see in the internal logs that no results were found when the saved search ran.
I cannot change the cron expression as this is a production system, but this alert has worked in the past. No changes have been made which would have affected this.
Hmmm, interesting. Now, I had a deeper look into your audit logs. In both of them I can see fields like this: event_count=0, result_count=0, available_count=0, scan_count=0, drop_count=0
Ref. this
http://docs.splunk.com/Documentation/Splunk/7.0.2/Search/ViewsearchjobpropertieswiththeJobInspector
and this
https://answers.splunk.com/answers/615059/what-is-the-meaning-of-the-splunk-auditlog-fields.html
Basically, for some reason your alert search is finding no events or results, and thus the alert is not getting triggered. As such, I do not think this issue belongs under the alert configuration umbrella. Without looking at your search and alert setup it is difficult for me to say what's going wrong, but suffice it to say that your alert not triggering is not a result of the alert config; rather, it is an issue where the alert search is not finding any events or results on which to trigger.
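If it helps narrow this down, you can also check the scheduler's own history for this saved search to see exactly when each run fired and how many results it returned. A quick sketch, using the savedsearch_name from your audit events:

index=_internal sourcetype=scheduler savedsearch_name="New Notable - Alert" | table _time scheduled_time dispatch_time status result_count run_time | sort - _time

If result_count is 0 for the runs around 13:50, that confirms the search itself found nothing in its window, which points at the time range or indexing latency rather than at the alert actions.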
does "the user" who runs the schedules have access to the data?
Are you sure the second run is covering the right period?
It looks like your search was running over the period before your alert event.
search_et=1537102075.000000000 = GMT: Sunday 16 September 2018 12:47:55
search_lt=1537102200.000000000 = GMT: Sunday 16 September 2018 12:50:00
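For reference, you can convert those epoch values yourself with a throwaway search like the one below (note that strftime renders the result in your search head's time zone, not necessarily GMT):

| makeresults | eval search_et=1537102075, search_lt=1537102200 | eval search_et_human=strftime(search_et, "%Y-%m-%d %H:%M:%S"), search_lt_human=strftime(search_lt, "%Y-%m-%d %H:%M:%S") | table search_et search_et_human search_lt search_lt_human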