Hi Team,
We have an application platform that acts as a gateway for more than 1,000 applications, and we have ingested the platform logs into Splunk.
The platform team wants to set up an alert for each application that fires if the number of HTTP 500 responses breaches a threshold, say 5% of the total requests for that app. The alert schedule is every 5 minutes. Each alert should go to the corresponding app team, because with 1,000+ apps the platform team cannot handle them centrally.
I am using the top command with limit=0, which gives me the total events across all applications in the last 5 minutes, but I want to treat the count for a specific application as 100%, calculate the percentage of 500 responses within that app, and check whether the threshold is violated.
My current query:
index=abc | top HttpStatus app limit=0
This is how my current query's output looks; the App1 and App2 percentages together add up to 100% across all apps:
HttpStatus  app   count  percent
400         App1  30     0.091609
200         App1  15     0.045804
500         App1  6      0.018322
200         App2  3813   11.643459
400         App2  2      0.006107
500         App2  28882  88.194699
This is how I want the output to look so I can trigger at the app level: App1's breakdown sums to 100%, and likewise App2's breakdown sums to 100%. I can then join it to a lookup of thresholds and check whether the 500 responses have exceeded the threshold percentage before triggering the alert.
HttpStatus  app   count  percent
400         App1  30     58.82352941
200         App1  15     29.41176471
500         App1  6      11.76470588
200         App2  3813   11.66162033
400         App2  2      0.006116769
500         App2  28882  88.3322629
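For reference, this is the kind of query I imagine might produce per-app percentages: group by app and status, then use eventstats to get each app's total so the percentage is relative to that app rather than to all events. This is only a sketch (the field names HttpStatus and app are from my data above; app_total is a name I made up) and I have not verified it:

```spl
index=abc
| stats count BY app HttpStatus
| eventstats sum(count) AS app_total BY app
| eval percent = round(100 * count / app_total, 8)
```

Here eventstats adds app_total to every row for its app, so each app's rows divide by their own total and sum to 100%.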
I want a single consolidated alert that can trigger on each application's threshold violation (I plan to use a lookup for per-app thresholds), because we do not want to set up roughly 1,000 individual application-specific alerts.
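To make the consolidated idea concrete, this is roughly what I have in mind: keep only the 500 rows, pull each app's threshold from a lookup, and let the alert fire when any rows survive the final filter. The lookup name app_thresholds and its fields app and threshold are hypothetical placeholders for whatever I end up defining; again, just a sketch, not a tested query:

```spl
index=abc
| stats count BY app HttpStatus
| eventstats sum(count) AS app_total BY app
| eval percent = round(100 * count / app_total, 4)
| where HttpStatus=500
| lookup app_thresholds app OUTPUT threshold
| where percent > threshold
```

The alert condition would then simply be "number of results > 0", and each surviving row identifies one violating app, which could drive the per-team notification.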
Can you share your thoughts on whether there is a way to achieve this or something similar?