We have an application platform that acts as a gateway for multiple applications (1000+ apps), and we have ingested the platform logs into Splunk.
The platform team wants to set up an alert for each application that fires when the number of HTTP 500 responses breaches a threshold, say 5% of the total requests for that app. The alert schedule is every 5 minutes. Each alert should go to the corresponding app team, because with 1000+ apps the platform team cannot handle the notifications themselves.
I am using the top command with limit=0. This gives the percentages across all applications combined for the last 5 minutes, but I want to treat each application's own event count as 100%, calculate the percentage of 500 responses within that app, and check whether the threshold is violated.
My current query:
index=abc | top HttpStatus app limit=0
This is how my current query's output looks: the App1 and App2 rows together add up to 100%.
This is how I want the output so I can trigger at the app level: App1's breakdown sums to 100%, and likewise App2's breakdown sums to 100%. I can then join the result to a lookup, check whether the 500 responses have exceeded the threshold percentage, and trigger the alert.
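To make this concrete, the rough direction I am considering for the per-app breakdown is something like the following (an untested sketch; app and HttpStatus are the fields from my existing query, error_count/total_count/error_pct are just placeholder names I made up):

index=abc earliest=-5m
| stats count(eval(HttpStatus=500)) as error_count, count as total_count by app
| eval error_pct = round(100 * error_count / total_count, 2)

Here stats groups by app, so each app's own total becomes the 100% baseline, and error_pct is the share of 500 responses within that app (it may need HttpStatus="500" instead if the field is extracted as a string).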
I want to use a single consolidated alert that can trigger on each application's threshold violation (I plan to use a lookup for the thresholds), as we do not want to set up 1000-odd individual alerts, one per application.
Can you share your thoughts on whether there is a way to achieve this or something similar?
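For the threshold check on top of that, I am imagining appending something like this (app_thresholds and its threshold field are placeholders for the lookup I would create):

| lookup app_thresholds app OUTPUT threshold
| where error_pct > threshold

The saved search would then run every 5 minutes, alert when the number of results is greater than 0, and use the "Trigger for each result" option so each violating app produces its own notification. The app team's email address could be another column in the same lookup, which I believe the email action can then reference as a $result.<fieldname>$ token to route each notification to the right team.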