I want to effectively monitor a system with 100+ URIs. So far, my approach has been to monitor server errors by tracking 5XX status codes (the HTTP status code and the URI both get printed nicely in the Splunk logs!).
Recently, though, I have run into two problems:
1. Issues came up with 401 or 403 status codes. It may seem easy to just add 4XX status codes to the monitoring, but with so many URIs it's tedious.
2. Some URIs received no traffic at all, so they could never show up in the 5XX monitoring, because there were no events to count.

I know a possible solution is to use a lookup file and fillnull for the missing URIs, but my 5XX monitoring doesn't use a lookup file. I run a blanket search across all logs, do stats by URI, and raise an alert if the 5XX percentage is more than 20%. The reason for not using a URI lookup file is that these URIs keep changing every week, and I want a robust solution that works without manual updates.
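The current 5XX alert described above could be sketched roughly as follows. This is only a minimal sketch: the index name `web_logs`, the field names `uri` and `status`, and the 15-minute window are placeholder assumptions, not the actual values from the environment.

```spl
index=web_logs earliest=-15m
| stats count AS total, count(eval(status>=500)) AS errors_5xx BY uri
| eval pct_5xx = round(errors_5xx / total * 100, 2)
| where pct_5xx > 20
```

A URI with no events in the search window produces no row in the stats output at all, which is why zero-traffic URIs never trigger this kind of alert.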
So please suggest a way to monitor this situation effectively. I would also like to know if there are any specific commands (like anomaly or something similar) that I can look deeper into and that might help.