Uptime Alerting Over a Period of Time

julianpmcaf · ‎11-24-2023

Hi,

Currently I have a browser test set up, and I would like an email sent to certain people if the uptime falls below 98%. However, alerting on uptime currently works on a per-test basis: the uptime is either 100 or 0, depending on whether a test fails. How can I set it so that the uptime is evaluated over a period of time and not per test? The image below might show better what I mean with what I currently have.

bishida · ‎12-03-2023

Hi,

You may want to try creating a custom detector. Can you try this and see if it's what you want?

Go to Alerts & Detectors section.
Click "New Detector" and select "Custom Detector".
Give the new detector a name and click "create alert rule".
For your alert signal, choose "synthetics.run.uptime.percent"
Add a filter for this signal "test_id" and specify the ID of your test (tip: the ID of your test is visible in the URL when you are viewing it)
Click on "add analytics" and select "mean" and "mean aggregation".
Click anywhere outside of the box to clear out the "group by" box.
In the time window, choose something appropriate such as "-1h" for "past 1 hour"
Proceed to "alert condition" and select "static threshold".
Proceed to "alert settings" and choose "below" and "98".
The UI will do some historical analysis and show you how many times the alert would have fired over your selected time period.
From here you can proceed by customizing your alert message and alert recipients and activating the alert if everything looks good.
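
To make it clearer what the "mean" over a time window does to a signal that is only ever 100 or 0 per run, here is a minimal Python sketch. It is not Splunk code and does not call any Splunk API; the function names `windowed_uptime` and `should_alert`, the 5-minute run interval, and the sample data are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta


def windowed_uptime(runs, now, window=timedelta(hours=1)):
    """Mean of per-run uptime values (100 or 0) inside the trailing window.

    `runs` is a list of (timestamp, value) pairs. Returns None if no run
    falls inside the window.
    """
    recent = [value for ts, value in runs if now - window < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def should_alert(runs, now, threshold=98.0, window=timedelta(hours=1)):
    """True when the windowed mean uptime is below the static threshold."""
    uptime = windowed_uptime(runs, now, window)
    return uptime is not None and uptime < threshold


# Hypothetical data: one run every 5 minutes over the past hour, one failure.
now = datetime(2023, 12, 3, 12, 0)
runs = [(now - timedelta(minutes=5 * i), 100.0) for i in range(12)]
runs[3] = (runs[3][0], 0.0)  # mark a single run as failed

print(windowed_uptime(runs, now))  # about 91.67 (11 of 12 runs succeeded)
print(should_alert(runs, now))     # True
```

Note that with only 12 runs in the window, a single failure already pulls the mean down to about 91.7%, so a 98% threshold would fire. If that is too sensitive for your test frequency, a longer time window may be more appropriate.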

ITWhisperer · ‎11-24-2023

You should extend the search to cover a period of time and count how many times the test succeeds and how many times it fails. From this, you can work out a percentage success rate. Use this in your alert.
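
As a rough illustration of the arithmetic only (plain Python, not a Splunk search), the success rate could be worked out like this. The function name `success_rate` and the sample results are hypothetical.

```python
def success_rate(results):
    """Percentage of successful runs in a period.

    `results` is an iterable of booleans, one per test run (True = success).
    """
    results = list(results)
    if not results:
        raise ValueError("no test runs in the period")
    successes = sum(1 for ok in results if ok)
    failures = len(results) - successes
    return 100.0 * successes / (successes + failures)


# Hypothetical period: 100 runs, 3 of which failed.
results = [True] * 97 + [False] * 3
rate = success_rate(results)
print(rate)         # 97.0
print(rate < 98.0)  # True, so the alert condition would be met
```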

julianpmcaf · ‎11-24-2023

How should I count the number of successes and failures? With the alert rules I can't seem to find a way to count the number of successes and failures in my browser test. Maybe I'm missing something.

ITWhisperer · ‎11-24-2023

What search are the alerts using? What events do you have already ingested into Splunk?

PickleRick · ‎11-27-2023

@ITWhisperer I know that there isn't much traffic in this part of Answers, but it's a question about Observability Cloud, not Splunk Enterprise/Cloud. (I don't know the right answer myself. Just pointing it out so that you don't drift too far the wrong way ;-)).

julianpmcaf · ‎11-24-2023

I'm not sure about the alerts. For the events, I have an uptime event set up for my browser test and that's all. I'm sorry if I'm not making much sense; I'm new to Splunk.

ITWhisperer · ‎11-24-2023

OK, it sounds like you have a browser that is periodically testing, and this is what you want to alert on?

You probably need to convert this to an automated test which executes somewhere and produces a log of successes and failures, and then ingest this log into Splunk so you can monitor it and raise alerts if the success rate falls below 98%.

julianpmcaf · ‎11-27-2023

Sorry for the late reply. Is there a way to have this automated test run via Splunk Observability and also produce a log that would then be ingested into Splunk, or do I need to find an outside product that would perform my tests and produce a log of the successes and failures for me?

