Alerting

Need help understanding this search and alert rule

jack1
Loves-to-Learn Everything

The objective is to send out an alert only if the 'low' and 'high' strings are detected more than 5 minutes apart. In other words, if they arrive within 5 minutes of each other, the alert should not fire and the events should be ignored; if they are more than 5 minutes apart, process them and send out an alert when low or high is received in the syslog.

Below is what is currently configured in the Splunk rules for both low and high, but I don't really understand it. Can someone explain how it works?

Alert-Water High

index="watersb" item="Water Level"
| fields watersb_timestamp host machine_id location state status
| transaction host maxspan=5m
| eval status_count=mvcount(status)
| search status_count=1 status=high
| eval timestamp=strptime(watersb_timestamp,"%b %d %H:%M:%S")
| convert timeformat="%d %b %Y %H:%M:%S" ctime(timestamp)
| table timestamp host status machine_id location state

 

Alert-Water Low

index="watersb" item="Water Level"
| fields watersb_timestamp host machine_id location state status
| transaction host maxspan=5m
| eval status_count=mvcount(status)
| search status_count=1 status=low
| eval timestamp=strptime(watersb_timestamp,"%b %d %H:%M:%S")
| convert timeformat="%d %b %Y %H:%M:%S" ctime(timestamp)
| table timestamp host status machine_id location state


jack1
Loves-to-Learn Everything

Is the purpose to match ONLY the first received status (either high or low), so that it will only send one alert within the 5-minute interval?


ITWhisperer
SplunkTrust

The transaction command collects events which match the criteria into "transaction events"; in your case, all events in a "transaction event" have the same host value and there is a maximum of 5 minutes between the first constituent event and the last constituent event. There may be multiple "transaction events" generated by the transaction command. Each "transaction event" carries multivalue fields holding the unique values of those fields from its constituent events.

The purpose of these alerts is to check whether there has been only one status value in the constituent events for the "transaction event" and whether that value is high or low. So, no, it is not just looking at the first event.

Why not pick a fixed time period and pull the events into a report? Then run the query used by the alert over the same time period and have a look at what goes into each transaction event. That way you will be able to see what it is doing.
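
For example, something along these lines would show what went into each transaction event (the 24-hour window is just for illustration; eventcount and duration are fields that transaction adds automatically):

index="watersb" item="Water Level" earliest=-24h latest=now
| transaction host maxspan=5m
| eval status_count=mvcount(status)
| table _time host eventcount duration status status_count

Each row is one transaction event; where status shows more than one value, status_count will be greater than 1 and the alert search would drop it.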


jack1
Loves-to-Learn Everything

Is that for the purpose of avoiding duplicate alerts due to duplicate status=x syslog messages?

So that it only sends one alert in the 5-minute window?


ITWhisperer
SplunkTrust

It is so that status_count is 1 if only one of the status values appears in the group of events in the transaction. Essentially, status_count becomes the number of unique values of status in the transaction.
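
If it helps, you can see what mvcount does in isolation with a quick throwaway search (makeresults and split are only used here to fake a multivalue field):

| makeresults
| eval status=split("low,high",",")
| eval status_count=mvcount(status)
| table status status_count

This returns status_count=2. In your alert the field is already reduced to unique values by transaction, so mvcount(status) ends up counting the distinct status values in each transaction.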


jack1
Loves-to-Learn Everything

Below is the schedule, set to run every 5 minutes or so. I don't really understand what the cron expression is compared to the time range; both here are also 5 minutes. Does this have to match

transaction host maxspan=5m

 

(attached screenshot of the alert schedule settings: jack1_0-1654348099082.png)

 


jack1
Loves-to-Learn Everything

(attached screenshot of the alert schedule settings: jack1_0-1654332610040.png)

Does this schedule have anything to do with the 5-minute detection interval?

 

How about this? 

 transaction host maxspan=5m

 


ITWhisperer
SplunkTrust

The schedule aligns to fixed 5-minute boundaries, whereas the transaction maxspan is measured from the first event in each transaction. So with transaction, events at 17:00 and 17:01 could end up in different transactions (if the transaction containing 17:00 started earlier), whereas with the schedule's fixed window they could be in the same run.
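
To illustrate the difference, assuming a typical every-5-minutes setup (your screenshot may show different values):

Cron expression: */5 * * * *   (fires at minute 00, 05, 10, ... of every hour)
Time range: earliest=-5m@m latest=@m   (each run searches the previous 5 minutes)

The cron expression only controls when the search runs, and the time range controls which events each run sees; maxspan=5m inside the transaction command is separate from both and is measured from the first event of each transaction.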


jack1
Loves-to-Learn Everything

status_count = the number of different status values here, meaning status=low and status=high?


jack1
Loves-to-Learn Everything

Also, occasionally Splunk sends out alerts even though it received status=low @ 17:00:00 and status=high @ 17:03:00, but usually it won't. Any idea why?

 


jack1
Loves-to-Learn Everything

What happens if 2 syslog messages contain status=low, e.g. status=low @ 17:00 and status=low @ 17:01? Will that meet the condition? Operationally it should send out the alert, but only one alert, as there is no need for duplicates.

 


jack1
Loves-to-Learn Everything

If 2 syslog messages contain status=low @ 17:00:00 and status=high @ 17:03:00, Splunk doesn't need to process them; consider it a false alarm, since the interval is less than 5 minutes.

If the syslog contains only status=low @ 18:00:00 but no status=high within the 5-minute interval, then process it.

If 2 syslog messages contain status=low @ 19:00:00 and status=high @ 20:07:00, Splunk processes both rules, since the interval is more than 5 minutes.

But I don't understand the logic configured here. Where is the rule that says don't process when there is less than 5 minutes between status=low and status=high, and process only when the interval is more than 5 minutes? Which part of the search states that logic?


ITWhisperer
SplunkTrust

I think part of the issue here is understanding your data and what you want or expect to get out of it.

For example, if you have the following events

  • 17:00 - low
  • 17:02 - high
  • 17:08 - high
  • 17:09 - low

would you expect an alarm?

Within the two 5-minute periods 17:00-17:04 and 17:05-17:09, both periods have a low and a high; however, there is a gap of over 5 minutes between the two highs in the middle. The transaction command might have picked this up if a transaction boundary happened to start/end between 17:01 and 17:04, but a scheduled search would not have.


jack1
Loves-to-Learn Everything

There should be no alarm for this, because 17:00 low and 17:02 high are within the 5-minute window (assuming the 5-minute timer starts at 17:00):

  • 17:00 - low
  • 17:02 - high

For these, no alarm either. I assume the timer starts counting at 17:08 (or how can we set that?), meaning there should be no other status from 17:08 to 17:13 (5 minutes); only then would it send an alarm. In this case there is another status, so no alarm should be sent:

  • 17:08 - high
  • 17:09 - low

ITWhisperer
SplunkTrust

If I understand correctly, you want to find when you have only one status in a 5 minute period? Could you not just remove the status filter from the search?

index="watersb" item="Water Level"
| fields watersb_timestamp host machine_id location state status
| transaction host maxspan=5m
| eval status_count=mvcount(status)
| search status_count=1
| eval timestamp=strptime(watersb_timestamp,"%b %d %H:%M:%S")
| convert timeformat="%d %b %Y %H:%M:%S" ctime(timestamp)
| table timestamp host status machine_id location state

 


jack1
Loves-to-Learn Everything

Yes, you are right. I want to find if there is only one status within the 5 minutes. If there are 2 statuses, status=low and status=high, then consider it a false alarm; I don't want that kind of alert.

I am trying to understand how it was configured and whether it was configured correctly.

 


ITWhisperer
SplunkTrust

These lines

| eval status_count=mvcount(status) | search status_count=1 

status_count becomes the number of different status values in the 5-minute period; where the count is 1, they are either all low or all high, not both.
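
As an illustration (times made up):

  • low @ 18:00 and low @ 18:01 in the same transaction: status = low, status_count = 1, so the transaction passes the search and the low alert can fire.
  • low @ 17:00 and high @ 17:03 in the same transaction: status = low and high, status_count = 2, so search status_count=1 drops it and no alert fires.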


jack1
Loves-to-Learn Everything

Here, does 1 mean only one syslog message with status=high or low? How about duplicates, like 2 syslog messages with status=high @ 18:00 and status=high @ 18:01?

Is that still considered as one, or is there another line of config to ignore/include duplicates within a time period?

 

search status_count=1

 


ITWhisperer
SplunkTrust

Transaction only keeps unique values in the multivalue fields.


jack1
Loves-to-Learn Everything

Sorry, I don't quite get it. What does "keeps unique values in the multivalue fields" mean?


ITWhisperer
SplunkTrust

With the transaction command, the fields on the resulting transaction event are multivalued and contain just the unique values, not all the values from all the events in the transaction.
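
A quick way to see this (makeresults is just faking three events on one made-up host, pump1):

| makeresults count=3
| streamstats count
| eval host="pump1", status=case(count=1,"low", count=2,"low", count=3,"high")
| transaction host maxspan=5m
| eval status_count=mvcount(status)
| table host status status_count

Even though two of the three events have status=low, the transaction's status field only holds the unique values low and high, so status_count is 2, not 3.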
