Splunk Search

Search help to identify when start/finish tasks fail to complete properly.

zindain24
Path Finder

Need some advice on a search. I have a logfile that clearly states the starting and finishing tasks for each of the Batch Process jobs that run. There are ~70 different batch process jobs, each clearly displayed in the event (bold in my examples below), that need to be checked to ensure they start and finish properly.

2016/04/13 16:52:44.740 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting
2016/04/13 16:55:42.539 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' finished

Does anyone have an idea of how I can detect when any job doesn't successfully start and finish within a rolling 7-minute window?

I was thinking about using the |transaction command and searching the field_match_sum field, but I'm not sure it will be reliable enough.

Here is what I am working with, but may be way off:

sourcetype=batchprocesses Batch Process starting OR finished | rex "Batch\sProcess\s\'(?<Trigger>\w+)\'\s(?<Status>\w+)" | transaction Trigger startswith=starting endswith=finished | table Trigger, Status, duration
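Since the angle brackets in named capture groups tend to get eaten when pasted into the forum, it can help to sanity-check the pattern outside Splunk first. A minimal Python sketch of the equivalent regex (the sample events are copied from the post above; the group names Trigger and Status mirror the rex):

```python
import re

# Python equivalent of the SPL rex: pull the job name (Trigger) and
# its state (Status) out of each raw event line.
PATTERN = re.compile(r"Batch Process '(?P<Trigger>\w+)' (?P<Status>\w+)")

events = [
    "2016/04/13 16:52:44.740 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting",
    "2016/04/13 16:55:42.539 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' finished",
]

for event in events:
    m = PATTERN.search(event)
    print(m.group("Trigger"), m.group("Status"))
```

Note that \w matches underscores, so the full 'ScheduleBillingPayment_ThreadedTrigger' name is captured in one group.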

Thoughts? Suggestions? Thanks! Jeremy

1 Solution

Runals
Motivator

I would bake field extraction into the configs but as is I'd try something like

sourcetype=batchprocess starting OR finished | rex "Batch Process \'(?<batchProcess>[^_]+)" | stats count max(_time) as finish min(_time) as start by batchProcess | eval duration = finish - start | where count = 2 AND (duration > 430 OR duration < 400)

You might have to monkey with my where statement as I might have the logic wrong or you have different tolerances =). You mention a rolling 7-minute window, so I might schedule that to run every 9 minutes +/- over maybe a slightly longer time span. The idea is to simulate a rolling 7-minute window while potentially baking in some time for the events to be processed (you might not need that based on your system performance). Anyway, that would be an option besides using transaction.
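The stats-based pairing idea above can be sketched in plain Python to make the logic concrete (the job names, timestamps, and 420-second window here are illustrative assumptions, not data from a real index):

```python
from datetime import datetime

# (job, state, timestamp) tuples standing in for the indexed events.
# "NightlyInvoiceRun" is a hypothetical job that starts but never finishes.
events = [
    ("ScheduleBillingPayment_ThreadedTrigger", "starting", "2016/04/13 16:52:44"),
    ("ScheduleBillingPayment_ThreadedTrigger", "finished", "2016/04/13 16:55:42"),
    ("NightlyInvoiceRun", "starting", "2016/04/13 16:50:00"),
]

def flag_problem_jobs(events, window_seconds=420):
    """Mimic `stats count min(_time) max(_time) by batchProcess`:
    a job is healthy only if exactly two events (one start, one finish)
    landed within the window."""
    by_job = {}
    for job, state, ts in events:
        t = datetime.strptime(ts, "%Y/%m/%d %H:%M:%S")
        by_job.setdefault(job, []).append(t)
    bad = []
    for job, times in by_job.items():
        duration = (max(times) - min(times)).total_seconds()
        if len(times) != 2 or duration > window_seconds:
            bad.append(job)
    return bad

print(flag_problem_jobs(events))  # only the unfinished hypothetical job is flagged
```

This is the same shape as the SPL: group by job, count events, diff min/max times, then filter on count and duration.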


zindain24
Path Finder

I'll clarify a bit. Under all normal conditions, jobs will finish within 5 minutes of their start time. I've already extended the search window to 7 minutes to compensate. My task is to have Splunk verify that a starting and a finished event both occur during the 7-minute window for each of the batch processes.

For example:

Good: (starting and finished within 7-minute window)
2016/04/13 16:52:44.740 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting
2016/04/13 16:55:42.539 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' finished

Bad: (two starting events, no finished event)
2016/04/13 16:50:00.000 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting
2016/04/13 16:57:00.000 INFO s-------- Batch Process 'ScheduleBillingPayment_ThreadedTrigger' starting
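The "bad" case (a start with no matching finish inside the window) can also be expressed directly: for each starting event, look for a finished event for the same job within 7 minutes. A small Python sketch, with event tuples hand-built from the examples above (not Splunk output):

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=7)

def unmatched_starts(events):
    """Return (job, start_time) pairs for starting events that have no
    finished event for the same job within the 7-minute window.
    `events` is a list of (timestamp_str, job, state) tuples in time order."""
    parsed = [(datetime.strptime(ts, "%Y/%m/%d %H:%M:%S.%f"), job, state)
              for ts, job, state in events]
    bad = []
    for i, (t0, job, state) in enumerate(parsed):
        if state != "starting":
            continue
        matched = any(t0 <= t <= t0 + WINDOW and j == job and s == "finished"
                      for t, j, s in parsed[i + 1:])
        if not matched:
            bad.append((job, t0))
    return bad

# The "bad" example above: two starting events, no finished event.
events = [
    ("2016/04/13 16:50:00.000", "ScheduleBillingPayment_ThreadedTrigger", "starting"),
    ("2016/04/13 16:57:00.000", "ScheduleBillingPayment_ThreadedTrigger", "starting"),
]
print(unmatched_starts(events))  # both starts are flagged as unmatched
```

In SPL terms this is closer to what transaction with startswith/endswith does; the stats approach in the accepted answer gets at the same thing by counting events per job.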

Thanks for your help!
