Is there a way to create a detector to alert if a particular user (based on a part of the URL) is experiencing a higher number of errors?
For example, if I have a /user/{customerId}/do-something URL, then I want to be alerted when a particular {customerId} has a high number of errors within a specific time period. If there's a higher number of errors but they're mostly for different {customerId} values, then I don't want a notification.
Thanks.
You could filter for the errors, extract the customer ID, and count by customer ID. Then work out what percentage of all the errors each customer ID accounts for, and alert if that percentage exceeds a threshold you choose.
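To make those steps concrete, here is a minimal sketch of the logic in plain Python. It is not SignalFlow and not a detector. The function name, the regex, and the min_errors parameter are all made up for illustration, and the URL shape is taken from the example in the question.

```python
import re
from collections import Counter

# Assumed URL shape from the question: /user/{customerId}/do-something
CUSTOMER_RE = re.compile(r"^/user/(?P<customer_id>[^/]+)/do-something$")


def customers_over_share(error_paths, min_share=0.05, min_errors=1):
    """Return {customer_id: share} for customers whose share of all
    errors is above min_share (hypothetical helper, for illustration)."""
    counts = Counter()
    for path in error_paths:
        match = CUSTOMER_RE.match(path)
        if match:  # ignore paths that do not fit the expected shape
            counts[match.group("customer_id")] += 1

    total = sum(counts.values())
    if total == 0:
        return {}  # no errors at all, so nothing to alert on

    return {
        customer_id: n / total
        for customer_id, n in counts.items()
        if n >= min_errors and n / total > min_share
    }
```

If one customer has 8 of 20 errors and the other 12 errors are spread over 12 different customers, only that one customer is returned with min_share=0.25. The min_errors parameter is there because a share on its own can mislead: a single error from a single customer is 100% of the errors.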
You make it sound so easy, but I should say that I'm a Splunk Observability newbie. If I add an APM Detector it doesn't give me many avenues to customise it, and if I create a Custom Detector I seem to be in the area where newbies shouldn't be.
However, I tried adding "errors_sudden_static_v2" for the "A" signal, and beside it is an Add Filter button. Is this where I need to "filter for the errors, extract the customerid and count by customerid"?
My use case sounds like it should be a fairly common one, so is there an explanatory guide somewhere on doing things like this? I haven't found one yet.
If I show the SignalFlow for my APM Detector, this is what it looks like:
from signalfx.detectors.apm.errors.static_v2 import static as errors_sudden_static_v2

errors_sudden_static_v2.detector(
    attempt_threshold=1,
    clear_rate_threshold=0.01,
    current_window='5m',
    filter_=(
        filter('sf_environment', 'prod')
        and (
            filter('sf_service', 'my-service-name')
            and filter('sf_operation', 'POST /api/{userId}/endpointPath')
        )
    ),
    fire_rate_threshold=0.02,
    resource_type='service_operation'
).publish('TeamPrefix my-service-name /endpointPath errors')
The {userId} in the sf_operation is what I want to group the results on and only alert if a particular userId is generating a high number of errors compared to everybody else.
Thank you.
I managed to achieve the same outcome with an alert in Splunk Cloud like this:
index=my_idx path="/api/*/endpointPath" status=500
| rex field=path "/api/(?<userId>.*)/endpointPath"
| fields userId
| stats count by userId
| eventstats sum(count) as totalCount
| eval percentage=(count/totalCount)
| where percentage>0.05
| sort -count
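One thing that may be worth checking in the rex above: `.*` is greedy and will also match across slashes, so a longer path could produce a userId that contains extra path segments. The sketch below compares that pattern with a single-segment alternative, `[^/]+`, in plain Python. It assumes a userId never contains a slash, and the sample paths are made up. Python writes named groups as `(?P<name>...)`, whereas the rex command accepts `(?<name>...)`.

```python
import re

# Same shape as the rex pattern in the search above (greedy, crosses slashes).
GREEDY = re.compile(r"/api/(?P<userId>.*)/endpointPath")

# Alternative that only accepts a single path segment as the userId.
SEGMENT = re.compile(r"/api/(?P<userId>[^/]+)/endpointPath")


def extract(pattern, path):
    """Return the captured userId, or None if the path does not match."""
    match = pattern.search(path)
    return match.group("userId") if match else None


for path in ("/api/u1/endpointPath", "/api/u1/extra/endpointPath"):
    print(path, extract(GREEDY, path), extract(SEGMENT, path))
```

For the first path both patterns give u1. For the second, the greedy pattern gives u1/extra and the single-segment pattern does not match at all. Whether that matters depends on which paths the `path="/api/*/endpointPath"` filter actually lets through.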