Splunk Observability Cloud

Create detector and group by path variable

mfearby
Observer

Is there a way to create a detector to alert if a particular user (based on a part of the URL) is experiencing a higher number of errors?

For example, if I have a /user/{customerId}/do-something URL, then I want to be alerted when a particular {customerId} has a high number of errors within a specific time period. If there's a higher number of errors but they're mostly for different {customerId} values, then I don't want a notification.

Thanks.

ITWhisperer
SplunkTrust

You could filter for the errors, extract the customerid, and count by customerid. Then work out what percentage of all the errors each customerid accounts for, and alert if that percentage exceeds a nominal threshold.
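
To make those steps concrete, here is a minimal Python sketch of the logic only. It is not SignalFlow or SPL, and the function name, the `min_share` and `min_errors` parameters, and the sample IDs are all hypothetical. It assumes the customerid has already been extracted from each error event.

```python
from collections import Counter

def customers_over_share(error_customer_ids, min_share=0.05, min_errors=1):
    """Return {customer_id: share} for customers whose share of all errors
    is greater than min_share (hypothetical helper, for illustration only)."""
    counts = Counter(error_customer_ids)      # count by customerid
    total = sum(counts.values())              # all errors in the window
    if total == 0:
        return {}
    return {
        cid: n / total
        for cid, n in counts.items()
        # min_errors is an assumed floor so one stray error cannot alert
        if n >= min_errors and n / total > min_share
    }

# Hypothetical window: customer "42" produced 8 of the 10 errors.
errors = ["42"] * 8 + ["7", "13"]
flagged = customers_over_share(errors, min_share=0.5)
```

Because the check is on each customer's share rather than the raw error count, a burst of errors spread evenly across many customers flags nobody, which matches the "don't notify" case in the question.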

mfearby
Observer

You make it sound so easy, but I should say that I'm a Splunk Observability newbie. If I add an APM Detector it doesn't give me many avenues to customise it, and if I create a Custom Detector I seem to be in the area where newbies shouldn't be.

However, I tried adding "errors_sudden_static_v2" for the "A" signal, and beside it there is an Add Filter button. Is this where I need to "filter for the errors, extract the customerid and count by customerid"?

My use case sounds like it should be a fairly common one, so is there an explanatory guide somewhere on doing things like this? I haven't found one yet.

If I show the SignalFlow for my APM Detector, this is what it looks like:

 

from signalfx.detectors.apm.errors.static_v2 import static as errors_sudden_static_v2
errors_sudden_static_v2.detector(
	attempt_threshold=1, 
	clear_rate_threshold=0.01, 
	current_window='5m', 
	filter_=(
		filter('sf_environment', 'prod') 
		and (
			filter('sf_service', 'my-service-name') 
			and filter('sf_operation', 'POST /api/{userId}/endpointPath')
		)
	), 
	fire_rate_threshold=0.02, 
	resource_type='service_operation'
).publish('TeamPrefix my-service-name /endpointPath errors')

 

The {userId} in the sf_operation is what I want to group the results on and only alert if a particular userId is generating a high number of errors compared to everybody else.

Thank you.

mfearby
Observer

I managed to achieve the same outcome with an alert in Splunk Cloud like this:

index=my_idx path="/api/*/endpointPath" status=500 
| rex field=path "/api/(?<userId>.*)/endpointPath" 
| fields userId 
| stats count by userId 
| eventstats sum(count) as totalCount
| eval percentage=(count/totalCount)
| where percentage>0.05
| sort -count
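
One thing that may be worth checking in the rex above: `.*` is greedy and can match across slashes, so a path with extra segments would yield a userId containing a slash. The Python sketch below only illustrates that difference. The pattern names and the helper are hypothetical, and Python spells named groups `(?P<name>...)` where rex uses `(?<name>...)`.

```python
import re

# Same shape as the rex in the search above: the capture can span several segments.
GREEDY = re.compile(r"/api/(?P<userId>.*)/endpointPath")
# Assumed stricter variant: the capture is limited to a single path segment.
ONE_SEGMENT = re.compile(r"/api/(?P<userId>[^/]+)/endpointPath")

def extract_user_id(path, pattern=ONE_SEGMENT):
    """Return the captured userId, or None when the path does not match."""
    match = pattern.search(path)
    return match.group("userId") if match else None

simple = extract_user_id("/api/123/endpointPath")
nested_strict = extract_user_id("/api/123/extra/endpointPath")
nested_greedy = extract_user_id("/api/123/extra/endpointPath", GREEDY)
```

If every matching path has exactly one segment between /api/ and /endpointPath, both patterns behave the same and the search needs no change.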