The objective is to send an alert only if the 'low' and 'high' strings are detected more than 5 minutes apart. If they arrive 5 minutes or less apart, the alert should not fire (ignore it). If they are more than 5 minutes apart, process the event and send an alert when low or high is received in the syslog.
Below is what is currently configured in the Splunk rules for both low and high, but I don't really understand it. Can someone explain how it works?
Alert-Water High
index="watersb" item="Water Level" | fields watersb_timestamp host machine_id location state status | transaction host maxspan=5m | eval status_count=mvcount(status) | search status_count=1 status=high | eval timestamp=strptime(watersb_timestamp,"%b %d %H:%M:%S") | convert timeformat="%d %b %Y %H:%M:%S" ctime(timestamp) | table timestamp host status machine_id location state
Alert-Water Low
index="watersb" item="Water Level" | fields watersb_timestamp host machine_id location state status | transaction host maxspan=5m | eval status_count=mvcount(status) | search status_count=1 status=low | eval timestamp=strptime(watersb_timestamp,"%b %d %H:%M:%S") | convert timeformat="%d %b %Y %H:%M:%S" ctime(timestamp) | table timestamp host status machine_id location state
Is the purpose to match ONLY the first received status (either high or low), so that it sends only 1 alert within the 5-minute interval?
The transaction command collects events which match the criteria into "transaction events"; in your case, all events in a "transaction event" have the same host value and there is a maximum of 5 minutes between the first constituent event and the last constituent event. There may be multiple "transaction events" generated by the transaction command. Each "transaction event" carries the fields of its constituent events; where the constituent events have different values for a field, that field becomes multivalued and holds the unique values.
The purpose of these alerts is to check whether there has been only one status value in the constituent events for the "transaction event" and whether that value is high or low. So no, it is not just looking at the first event.
Why not pick a fixed time period and pull the events into a report? Then run the query used by the alert over the same time period and have a look at what goes into each transaction event. That way you will be able to see what it is doing.
Is that for the purpose of avoiding duplicate alerts caused by duplicate status=x syslog messages, so that it only sends one alert in the 5-minute window?
It is so that status_count is 1 if only one of the status values appears in the group of events in the transaction. Essentially, status_count becomes the number of unique values of status in the transaction.
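To make the counting concrete, here is a minimal Python sketch of the idea, not of Splunk itself. It assumes the status values of one transaction are already collected in a list; the function name status_count is just borrowed from the search for illustration.

```python
def status_count(statuses):
    """Model of mvcount(status) after transaction: duplicates collapse,
    so the result is the number of distinct status values."""
    return len(set(statuses))

# Two duplicate "high" messages still count as one status value.
print(status_count(["high", "high"]))         # 1
# A mix of low and high counts as two, so "search status_count=1" drops it.
print(status_count(["low", "high", "low"]))   # 2
```

So a transaction made only of repeated high messages passes the status_count=1 filter once, while a transaction containing both low and high does not pass at all.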
Below is the schedule, which runs about every 5 minutes. I don't really understand how the cron expression differs from the time range; both are 5 minutes here. Does this have to match
transaction host maxspan=5m
Does this schedule have anything to do with the 5-minute detection interval?
How about this?
transaction host maxspan=5m
The schedule aligns to fixed 5-minute boundaries, whereas the transaction maxspan window starts at the first event in the transaction and resets 5 minutes after it. So with transaction, 17:00 and 17:01 could end up in different transactions, whereas with the schedule they could fall in the same 5-minute period.
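A small Python sketch may help show the difference. It is only a simplified model: it walks the events oldest-first and treats a gap of exactly 5 minutes as still inside the window, which are both assumptions and not a statement of how Splunk's transaction command works internally. The helper names fixed_bucket and rolling_groups, the date, and the sample times are made up for illustration.

```python
from datetime import datetime, timedelta

SPAN = timedelta(minutes=5)

def fixed_bucket(ts):
    """Start of the clock-aligned 5-minute period that contains ts."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)

def rolling_groups(times):
    """Group timestamps oldest-first; a new group starts once an event
    is more than SPAN after the first event of the current group."""
    groups = []
    for ts in sorted(times):
        if groups and ts - groups[-1][0] <= SPAN:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups

events = [
    datetime(2024, 1, 1, 16, 55, 30),
    datetime(2024, 1, 1, 17, 0, 0),
    datetime(2024, 1, 1, 17, 1, 0),
]

# Clock-aligned view: 17:00 and 17:01 share the 17:00 period.
print([fixed_bucket(ts).strftime("%H:%M") for ts in events])
# Rolling view: the window opened at 16:55:30 closes before 17:01.
print([[ts.strftime("%H:%M:%S") for ts in g] for g in rolling_groups(events)])
```

In this model the clock-aligned periods put 17:00 and 17:01 together, while the rolling window that opened at 16:55:30 takes 17:00 and leaves 17:01 to start a new group.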
status_count = number of different status values, meaning status=low and status=high here?
Also, occasionally Splunk sends out alerts even though it received status=low @ 17:00:00 and status=high @ 17:03:00, but usually it won't. Any idea why?
What happens if 2 syslog messages contain status=low? status=low@17:00 and status=low@17:01, will it meet the condition? Operationally, it should send out the alert, but only one alert, as duplicates are not needed.
If 2 syslog messages contain status=low @ 17:00:00 and status=high @ 17:03:00, Splunk doesn't need to process them (consider it a false alarm), since the interval is less than 5 mins.
If syslog contains only status=low @ 18:00:00 but no status=high within the 5-minute interval, OK, process it.
If 2 syslog messages contain status=low @ 19:00:00 and status=high @ 20:07:00, Splunk processes both rules, since the interval is more than 5 mins.
But I don't understand the logic configured here. Where is the rule that says don't process if less than 5 mins, and process only if there is more than a 5-minute interval between status=low and status=high? Which logic states that?
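The four cases above can be run through a rough Python model of the two pieces that matter in the search: transaction host maxspan=5m and search status_count=1. This is a sketch under assumptions, not Splunk's implementation: it handles a single host, walks events oldest-first, and treats exactly 5 minutes as inside the window. The names t, transactions and alerts and the date are hypothetical.

```python
from datetime import datetime, timedelta

MAXSPAN = timedelta(minutes=5)

def t(hhmm):
    """Build a timestamp on an arbitrary example day."""
    return datetime.strptime("2024-01-01 " + hhmm, "%Y-%m-%d %H:%M")

def transactions(events):
    """Group (time, status) pairs oldest-first into windows of at most
    MAXSPAN, keeping only the distinct status values of each window."""
    groups = []
    for ts, status in sorted(events):
        if groups and ts - groups[-1]["start"] <= MAXSPAN:
            groups[-1]["status"].add(status)
        else:
            groups.append({"start": ts, "status": {status}})
    return groups

def alerts(events):
    """Statuses of the windows that hold exactly one distinct status."""
    return [next(iter(g["status"])) for g in transactions(events)
            if len(g["status"]) == 1]

scenarios = {
    "duplicate low": [(t("17:00"), "low"), (t("17:01"), "low")],
    "low then high within 5 min": [(t("17:00"), "low"), (t("17:03"), "high")],
    "single low": [(t("18:00"), "low")],
    "low and high far apart": [(t("19:00"), "low"), (t("20:07"), "high")],
}
for name, events in scenarios.items():
    print(name, "->", alerts(events))
```

In this model there is no explicit "less than 5 minutes" rule. The effect comes from the combination: the 5-minute window decides which events are looked at together, and status_count=1 rejects any window that saw both low and high. It also hints at one possible reason for the occasional unexpected alert: if an earlier low at 16:57 opened the window, the 17:00 low would join it and the 17:03 high would land alone in the next window, so both would pass the filter.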
I think part of the issue here is understanding your data and what you want or expect to get out of it.
For example, if you have the following events
would you expect an alarm?
Within the two 5 minute periods 17:00-17:04 and 17:05-17:09 both periods have a low and a high, however there is a period in the middle of over 5 minutes between 2 highs. Transaction might have picked this up if the transaction boundary happens to start/end between 17:01 and 17:04, but a scheduled search would not have.
There should be no alarm for this, because the low at 17:00 and the high at 17:02 are within the 5-minute window, assuming the 5-minute timer starts at 17:00.
If I understand correctly, you want to find when you have only one status in a 5 minute period? Could you not just remove the status filter from the search?
index="watersb" item="Water Level" | fields watersb_timestamp host machine_id location state status | transaction host maxspan=5m | eval status_count=mvcount(status) | search status_count=1 | eval timestamp=strptime(watersb_timestamp,"%b %d %H:%M:%S") | convert timeformat="%d %b %Y %H:%M:%S" ctime(timestamp) | table timestamp host status machine_id location state
Yes, you are right, I want to find out whether there is only one status within the 5 mins. If there are 2 statuses, status=low and status=high, then consider it a false alarm; I don't want this kind of alert.
I am trying to understand how it was configured and whether it was configured correctly.
These lines
| eval status_count=mvcount(status) | search status_count=1
status_count becomes the number of different status values in the 5-minute period. Where the count is 1, they are either all low or all high, not both.
Here 1 means only one syslog message with status=high or low. How about duplicates, like 2 syslog messages with status=high@18:00 and status=high@18:01?
Is it still considered as one, or is there another line of config to ignore/include duplicates within a time period?
search status_count=1
Transaction only keeps unique values in the multivalue fields.
Sorry, I don't quite get it. What does "keeps unique values in the multivalue fields" mean?
With the transaction command, fields in the events become multivalued and hold just the unique values, not all values from all the events in the transaction. So two events with status=high give a status field with the single value high.