Dear experts,
I've created an alert based on a message string to identify closed connections. However, the alert is triggered only once, even though the problem is not fixed until we bounce.
I'm looking for a query that makes the alert recur until the success message string "*reconfigured with 'RabbitMQ' bean*" is the latest event in comparison to the failed strings across all events.
Failed messages: *com.rabbitmq.client.ShutdownSignalException* OR "*channel shutdown*"
Success message: "*reconfigured with 'RabbitMQ' bean*"
Current alert query, which triggers only once:
index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*") |stats count by cf_app_name, cf_foundation
Thank you for the help
Alerts are triggered each time the search criteria are met (unless throttled). If the shutdown event is only received once then the alert will only be triggered once. If you want the alert to repeat then the search must be written and scheduled to find the triggering event (or canceling event) each time it runs.
Thank you, Rich. However, I don't want to create the noise of a recurring alert unless there is a need, i.e., I want alerts to recur only if the reconfigured message is not the latest in comparison to the other strings.
That requirement can be included in the search.
index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*")
| dedup <<field with message>>
| where NOT match(<<field with message>>, "reconfigured with 'RabbitMQ' bean")
| stats count by cf_app_name, cf_foundation
Thank you once again Rich.
To add more details:
The failure messages arrive in different fields of the event than the reconfigured message, which appears at a different position in the event. In short, if I extract them, this is how they would look:
msg1 = "channel shutdown"
msg2 = "com.rabbitmq.client.ShutdownSignalException"
msg3 = "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*"
msg4 = "*reconfigured with 'RabbitMQ' bean*"
The alert should keep triggering until msg4 is the latest in comparison to the other three messages, even if a failure message occurs only once.
If the failure and success messages are in different fields, then we can use the coalesce function to combine them for dedup.
index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*")
| eval alert_field = coalesce(<<msg1 field>>, <<msg2 field>>, <<msg3 field>>, <<msg4 field>>)
| dedup alert_field
| where NOT match(alert_field, "reconfigured with 'RabbitMQ' bean")
| stats count by cf_app_name, cf_foundation
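To see what coalesce contributes before pointing it at real data, a small sandbox search can help. This is only a minimal sketch: it fabricates three events with makeresults, and the field names msg1, msg2, and msg4 plus the helper field n are illustrative stand-ins for whatever fields the extraction actually produces.

```spl
| makeresults count=3
| streamstats count as n
| eval _time = now() - (n * 60)
| eval msg1 = if(n == 2, "channel shutdown", null())
| eval msg2 = if(n == 3, "com.rabbitmq.client.ShutdownSignalException", null())
| eval msg4 = if(n == 1, "reconfigured with 'RabbitMQ' bean", null())
| eval alert_field = coalesce(msg1, msg2, msg4)
| table _time, n, alert_field
```

Each generated event has only one of the msg fields populated, and coalesce returns the first non-null argument, so alert_field ends up holding whichever message that event carries.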
Thanks, Rich. However, the challenge is that the alert is set to run every 15 minutes and the events occur only once. How can it recur every 15 minutes when the failure event won't occur again?
Thank you for your patience.
Yes, that's the tricky bit and goes back to the part of my first reply that said "the search must be written and scheduled to find the triggering event". Rather than search back 15 minutes, it will be necessary for the alert to search back as far as necessary to find the events of interest.
That's right. However, if I use dedup and the failed message has occurred after the success message, would that remove duplicates of the failed messages?
The only bit I don't get is how to compare timestamps so that msg4 (success) is the latest among all messages.
There's no need to compare timestamps. Search results arrive newest first and the dedup command keeps the first event it sees for each distinct value of the field, so whatever result you get must be the latest message with that value.
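If an explicit comparison is preferred, one alternative (not the dedup approach suggested above, just a sketch) is to label each event as a success or a failure and let stats latest() pick the most recent label per application. The field names status, last_status, and last_seen are hypothetical, the 24-hour lookback is an assumed value that should be long enough to reach the last event of interest, and the sketch assumes the success text appears in _raw.

```spl
index IN ("devcf","devsc") earliest=-24h cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*")
| eval status = if(like(_raw, "%reconfigured with 'RabbitMQ' bean%"), "success", "failure")
| stats latest(status) as last_status, latest(_time) as last_seen by cf_app_name, cf_foundation
| where last_status="failure"
```

This returns a row only for application and foundation pairs whose most recent matching event is a failure, so an alert that triggers when the number of results is greater than zero would fire on every scheduled run until the reconfigured message arrives. Note that like() is case-sensitive, unlike the base search terms.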