Need some advice on a search. I have a logfile that clearly states starting and finishing events for each of the Batch Process jobs that run. There are ~70 different batch process jobs, each clearly named in the event (shown in quotes in my examples below), that need to be checked to ensure they start and finish properly.
2016/04/13 16:52:44.740 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting
2016/04/13 16:55:42.539 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' finished
Does anyone have an idea of how I can detect when any job doesn't successfully start and finish within a rolling 7 minute window?
I was thinking about using the transaction command and checking the field_match_sum field, but I'm not sure it will be reliable enough.
Here is what I am working with, but may be way off:
sourcetype=batchprocesses Batch Process starting OR finished | rex "Batch\sProcess\s'(?&lt;Trigger&gt;\w+)'\s(?&lt;Status&gt;\w+)" | transaction Trigger startswith=starting endswith=finished | table Trigger, Status, duration
Thoughts? Suggestions? Thanks! Jeremy
I would bake field extraction into the configs but as is I'd try something like
sourcetype=batchprocess starting OR finished | rex "Batch Process '(?&lt;batchProcess&gt;[^']+)'" | stats count max(_time) as finish min(_time) as start by batchProcess | eval duration = finish - start | where count &lt; 2 OR duration &gt; 420
You might have to monkey with my where statement as I might have the logic wrong or you have different tolerances =). You mention a rolling 7 minute window, so I might schedule that to run every 9 minutes or so over a slightly longer time span. The idea is to simulate a rolling 7 minute window while baking in some extra time for the events to be processed (you might not need that, depending on your system performance). Anyway, that would be an option besides using transaction.
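To make that directly alertable, a variant of the same stats approach could count starts and finishes separately and flag any job where the pair is incomplete. This is only a sketch; the searchmatch terms and the 420-second (7 minute) threshold are assumptions based on the window described above:

```
sourcetype=batchprocess starting OR finished earliest=-7m
| rex "Batch Process '(?<batchProcess>[^']+)'"
| stats count(eval(searchmatch("starting"))) as starts
        count(eval(searchmatch("finished"))) as finishes
        min(_time) as start max(_time) as finish by batchProcess
| eval duration = finish - start
| where starts != 1 OR finishes != 1 OR duration > 420
```

Scheduled every few minutes with an alert condition of "number of results > 0", any row returned is a job that double-started, never finished, or ran long.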
I'll clarify a bit. Under all normal conditions, jobs finish within 5 minutes of their start time. I've already extended the search window to 7 minutes to compensate. My task is to have Splunk verify that a starting and a finished event both occur during the 7 minute window for each of the batch processes.
For example:
Good: (starting and finished within the 7 minute window)
2016/04/13 16:52:44.740 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting
2016/04/13 16:55:42.539 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' finished
Bad: (two starting events, no finished event)
2016/04/13 16:50:00.000 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting
2016/04/13 16:57:00.000 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting
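Given those two cases, a transaction-based sketch that catches both the missing-finished pattern and the double-start pattern might look like the following. This assumes your field extraction; keepevicted=true retains incomplete transactions so they can be flagged, and closed_txn and eventcount are fields the transaction command emits:

```
sourcetype=batchprocesses "Batch Process" starting OR finished
| rex "Batch Process '(?<Trigger>[^']+)' (?<Status>\w+)"
| transaction Trigger startswith=eval(Status=="starting") endswith=eval(Status=="finished") maxspan=7m keepevicted=true
| where closed_txn=0 OR eventcount!=2
| table Trigger, Status, eventcount, duration
```

A transaction with closed_txn=0 never saw its finished event within maxspan, and eventcount!=2 catches extra starting events grouped into one transaction.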
Thanks for your help!