Hi, I am working on a requirement where I have to write an alert based on the failure rate percentage of a service. Let's say I have 10 web services and I want to trigger the alert based on the traffic, i.e. the successful and failed requests.
I have written a query, but it doesn't seem to be giving me the correct results:
index=myapp_prod sourcetype=myapp_service_log "System Exception" NOT "responseStatus=SUCCESS" NOT "ResponseStatusCode=404" NOT "Business Exception"
| stats count as Failures by serviceName
| appendcols
[ search index=myapp_prod sourcetype=myapp_service_log "responseStatus=SUCCESS"
| stats count as Success by serviceName
| fillnull]
| eval Total = Success + Failures
<!-- when failureRatePercentage > 10 -->
What I want is a table like the one below. With my query, it's not coming out right: I am getting multiple rows with the same serviceName and empty columns.
serviceName | TotalRequest | Success | Failed | FailureRatePercentage |
service1 | 1000 | 800 | 200 | 20 |
service2 | 2000 | 1500 | 500 | 25 |
Can anyone advise how I can achieve this? It's better to set the alert based on the failure percentage rather than the absolute value.
Hi @shashank_24,
Try this:
index=myapp_prod sourcetype=myapp_service_log "System Exception" NOT "responseStatus=SUCCESS" NOT "ResponseStatusCode=404" NOT "Business Exception"
| stats count as Failures by serviceName
| appendcols
[ search index=myapp_prod sourcetype=myapp_service_log "responseStatus=SUCCESS"
| stats count as Success by serviceName
| fillnull]
| stats max(*) as * by serviceName
| eval Total = Success + Failures, FailureRatePercentage=(Failures*100)/Total
If this reply helps you, a like would be appreciated.
Hi @manjunathmeti, thanks for the response. I have actually already tried that, and what I get is a weird table like this, with rows shifted up and down. I'm not sure why it comes out like this. Is it because I am using the BY clause?
serviceName | failedRatePerc | Failures | Success | Total |
service1 | 0.796812749003984 | 2 | 249 | 251 |
service2 | 0.9779951100244498 | 4 | 405 | 409 |
service3 | 99.95985547972701 | 2490 | 1 | 2491 |
service4 | 3032 | |||
service5 | 2222 |
As you can see in the table above, the data is not aligned and some of the columns are empty. It looks like it's appending the rows. I need this to be aligned so that each row shows the data for the correct service.
It means that you don't have any failures for the last 2 serviceName values. Use the fillnull command after the stats command to fill the null values.
index=myapp_prod sourcetype=myapp_service_log "System Exception" NOT "responseStatus=SUCCESS" NOT "ResponseStatusCode=404" NOT "Business Exception"
| stats count as Failures by serviceName
| appendcols
[ search index=myapp_prod sourcetype=myapp_service_log "responseStatus=SUCCESS"
| stats count as Success by serviceName]
| stats max(*) as * by serviceName
| fillnull
| eval Total = Success + Failures, FailureRatePercentage=(Failures*100)/Total
@manjunathmeti No, it's not giving the correct result. I think appendcols is not working here. If you look at the image I attached (sorry, I had to mask the service names for security reasons), the 3rd service shows a failure rate of zero, whereas that service has had a failure rate of more than 50% in the last 24 hours.
Also, if I run the search that counts the exceptions on its own, I get more than 2k results. So something is not right.
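I didn't notice you were using appendcols. You need to use append instead: appendcols pairs the subsearch rows with the main rows by position, not by serviceName, so counts can end up attached to the wrong service. With append, the subsearch rows are added below the main results, and the stats ... by serviceName that follows merges them per service.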
index=myapp_prod sourcetype=myapp_service_log "System Exception" NOT "responseStatus=SUCCESS" NOT "ResponseStatusCode=404" NOT "Business Exception"
| stats count as Failures by serviceName
| append
[ search index=myapp_prod sourcetype=myapp_service_log "responseStatus=SUCCESS"
| stats count as Success by serviceName]
| stats max(*) as * by serviceName
| fillnull
| eval Total = Success + Failures, FailureRatePercentage=(Failures*100)/Total
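As an alternative to the append approach above, both counts can probably be computed in a single pass, which avoids the subsearch altogether. The following is only a sketch under a few assumptions: the field names isSuccess and isFailure are made up for illustration, serviceName is assumed to be extracted on every event, and the 10 percent threshold is taken from the comment in the original question. Note that like() on _raw is a case-sensitive substring match, whereas the quoted search terms in the queries above are case-insensitive, so the counts may differ slightly if the log text varies in case.

```
index=myapp_prod sourcetype=myapp_service_log
| eval isSuccess=if(like(_raw, "%responseStatus=SUCCESS%"), 1, 0)
| eval isFailure=if(like(_raw, "%System Exception%") AND isSuccess==0 AND NOT like(_raw, "%ResponseStatusCode=404%") AND NOT like(_raw, "%Business Exception%"), 1, 0)
| stats sum(isSuccess) as Success, sum(isFailure) as Failures by serviceName
| eval Total = Success + Failures
| where Total > 0
| eval FailureRatePercentage = round((Failures * 100) / Total, 2)
| where FailureRatePercentage > 10
| table serviceName Total Success Failures FailureRatePercentage
```

Because every service gets exactly one row from a single stats, there is nothing to align afterwards and no fillnull is needed. The where Total > 0 line guards against dividing by zero for services whose events are neither successes nor counted failures. If this is saved as an alert, it could trigger whenever the number of results is greater than zero.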