We have an alert configured to send an email when the number of results is >20 in 5 minutes, but since this is a timechart-based search, Splunk counts the time buckets as results instead of counting the actual events. Our requirement is to trigger the email when the number of events is >20 in 5 minutes and to include the timechart as well as the actual raw events in the email. Is this possible?
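For context, the search behind the current alert is along these lines (a sketch only; the index, sourcetype and search terms are taken from the query shared further down, and the exact timechart clause is an assumption):
index=main sourcetype=Logs ("The network path was not found" OR "Windows API Error 53")
| timechart span=5m count
With a search like this, each 5-minute bucket is one result row, which is why the result count does not match the event count.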
@ITWhisperer I was able to work around the issue by adding the following to the query:
| stats count by host | eventstats sum(count) as Total | where Total>20
The trigger condition is then "number of results is greater than 0". This alerts when at least one host row is returned, i.e. when the total error count across hosts is greater than 20.
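In full, the workaround search ends up as below (a sketch combining the base search shared further down with the added commands; the where clause keeps the host rows only when the overall total exceeds 20):
index=main sourcetype=Logs ("The network path was not found" OR "Windows API Error 53")
| stats count by host
| eventstats sum(count) as Total
| where Total>20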
Here is my query
index=main sourcetype=Logs ("The network path was not found" OR "Windows API Error 53" ) | stats count by host | eventstats sum(count) as Total
Now I set the custom trigger condition as below, so that the alert fires when the total count of errors is greater than 20, but the alert does not fire with either of these:
search sum(count) > 20
search Total > 20
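One thing worth checking (a debugging sketch, not a guaranteed fix): the custom trigger condition is evaluated as a search over the alert's result rows, so appending it to the query in the search bar should return rows whenever the alert would fire. Also note that after eventstats sum(count) as Total the summed field is named Total, so a condition referring to sum(count) has no matching field:
index=main sourcetype=Logs ("The network path was not found" OR "Windows API Error 53")
| stats count by host
| eventstats sum(count) as Total
| search Total > 20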
Could it be that there weren't enough events in the time period? What time periods have you configured? Is there a lag in indexing which means that the alert is scheduled to run before indexing for the time period has completed?
No, I have verified that there are events for the time period I am searching. There are 29 events, and on top of that I am trying to send an email when the count is greater than 20 using the eventstats command.
But had those events been indexed by the time the search behind the alert ran? What time parameters have you used for the alert? For example, if your alert looks at the last 2 minutes and indexing is running a little slow, so that events are not indexed until 5 minutes after their timestamps (remembering that an event's timestamp may be derived from the data in the log, not from the time indexing actually ran), you may not capture them when the alert search runs.
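A quick way to check for indexing lag is to compare index time with event time (a sketch using the standard _indextime field; the stats split is just illustrative):
index=main sourcetype=Logs ("The network path was not found" OR "Windows API Error 53")
| eval lag_seconds = _indextime - _time
| stats max(lag_seconds) as max_lag avg(lag_seconds) as avg_lag by host
If max_lag is larger than the gap between an event's timestamp and the alert's scheduled run, that event can be missed by the alert.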
Yes, the events were indexed by the time the alert ran. My timeframe for the alert is week-to-date and the captured events were 1 day old, so I can confirm it's not an indexing issue.
It's a bit tricky to answer without seeing what you currently have. However, what about using eventstats to add the maximum count to all rows and then triggering on that result (from the first row)?
Run-anywhere example:
index=_internal sourcetype=splunkd log_level!=INFO component=*
| bin span=5m _time
| stats count by _time component
| eventstats max(count) as max
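Adapted to your search, that might look like the sketch below (an assumption about intent: counting all matching events per 5-minute window; add host to the by clause if you want a per-host breakdown). The alert's custom trigger condition would then be search max > 20, and since eventstats puts max on every row, the first row is enough to evaluate it:
index=main sourcetype=Logs ("The network path was not found" OR "Windows API Error 53")
| bin span=5m _time
| stats count by _time
| eventstats max(count) as max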