Splunk Observability Cloud

Create detector and group by path variable

mfearby
Observer

Is there a way to create a detector to alert if a particular user (based on a part of the URL) is experiencing a higher number of errors?

For example, if I have a /user/{customerId}/do-something URL, then I want to be alerted when a particular {customerId} has a high number of errors within a specific time period. If there's a higher number of errors but they're mostly for different {customerId} values, then I don't want a notification.

Thanks.


ITWhisperer
SplunkTrust

You could filter for the errors, extract the customerId, and count the errors by customerId. Then work out each customerId's share of all the errors and alert if that share is greater than a threshold you choose.
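
To make those steps concrete, here is a minimal sketch of the logic in plain Python, outside of any Splunk product. It assumes the request data is available as (path, status) pairs, that an "error" means an HTTP status of 500 or above, and that the URL has the /user/{customerId}/do-something shape from the question. The function name and the threshold value are hypothetical.

```python
import re
from collections import Counter

# Assumed URL shape, taken from the question: /user/{customerId}/do-something
PATH_RE = re.compile(r"^/user/(?P<customer_id>[^/]+)/do-something$")


def customers_over_threshold(requests, threshold):
    """Return {customer_id: share_of_all_errors} for shares above `threshold`.

    `requests` is an iterable of (path, status) tuples.
    """
    errors = Counter()
    for path, status in requests:
        if status < 500:  # step 1: keep only the errors (assumed: 5xx)
            continue
        match = PATH_RE.match(path)
        if match:  # step 2: extract the customer ID and count by it
            errors[match.group("customer_id")] += 1

    total = sum(errors.values())
    if total == 0:
        return {}
    # step 3: each customer's share of all errors, kept only if above the threshold
    return {cid: n / total for cid, n in errors.items() if n / total > threshold}


sample = (
    [("/user/42/do-something", 500)] * 8
    + [("/user/7/do-something", 500), ("/user/9/do-something", 500)]
    + [("/user/42/do-something", 200), ("/other", 500)]
)
flagged = customers_over_threshold(sample, threshold=0.5)
```

With this sample, customer 42 accounts for 8 of the 10 matching errors, so it is the only one flagged. If the same 10 errors were spread evenly across different customers, nothing would be returned, which is the "don't notify me" case from the question.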


mfearby
Observer

You make it sound so easy, but I should say that I'm a Splunk Observability newbie. If I add an APM Detector it doesn't give me many avenues to customise it, and if I create a Custom Detector I seem to be in the area where newbies shouldn't be.

However, I tried adding "errors_sudden_static_v2" for the "A" signal, and beside it is an Add Filter button. Is this where I need to "filter for the errors, extract the customerid and count by customerid"?

My use case sounds like it should be a fairly common one, so is there an explanatory guide somewhere on doing things like this? I haven't found one yet.

If I show the SignalFlow for my APM Detector, this is what it looks like:

 

from signalfx.detectors.apm.errors.static_v2 import static as errors_sudden_static_v2
errors_sudden_static_v2.detector(
	attempt_threshold=1, 
	clear_rate_threshold=0.01, 
	current_window='5m', 
	filter_=(
		filter('sf_environment', 'prod') 
		and (
			filter('sf_service', 'my-service-name') 
			and filter('sf_operation', 'POST /api/{userId}/endpointPath')
		)
	), 
	fire_rate_threshold=0.02, 
	resource_type='service_operation'
).publish('TeamPrefix my-service-name /endpointPath errors')

 

The {userId} in the sf_operation is what I want to group the results on and only alert if a particular userId is generating a high number of errors compared to everybody else.

Thank you.


mfearby
Observer

I managed to achieve the same outcome with an alert in Splunk Cloud like this:

index=my_idx path="/api/*/endpointPath" status=500 
| rex field=path "/api/(?<userId>.*)/endpointPath" 
| fields userId 
| stats count by userId 
| eventstats sum(count) as totalCount
| eval percentage=(count/totalCount)
| where percentage>0.05
| sort -count
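
One detail that may be worth checking in a search like this is how much of the path the capture group is allowed to match. The sketch below uses Python's re module rather than SPL to compare a greedy `.*` group with a single-segment `[^/]+` group. The second test path is a made-up example, and whether such paths exist in the data is an assumption.

```python
import re

# Python spells named groups (?P<name>...) rather than (?<name>...).
GREEDY = re.compile(r"/api/(?P<userId>.*)/endpointPath")
ONE_SEGMENT = re.compile(r"/api/(?P<userId>[^/]+)/endpointPath")


def extract_user_id(pattern, path):
    """Return the captured userId, or None if the path does not match."""
    match = pattern.search(path)
    return match.group("userId") if match else None


simple = "/api/123/endpointPath"
nested = "/api/123/orders/endpointPath"  # hypothetical path with an extra segment

simple_greedy = extract_user_id(GREEDY, simple)        # "123"
simple_segment = extract_user_id(ONE_SEGMENT, simple)  # "123"
nested_greedy = extract_user_id(GREEDY, nested)        # "123/orders"
nested_segment = extract_user_id(ONE_SEGMENT, nested)  # None
```

If every matching path has exactly one segment between /api/ and /endpointPath, both patterns behave the same. The difference only shows up when the wildcard part can itself contain a slash.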