Splunk Search
Highlighted

Report to show missed schedule searches duo downtime

Communicator

Hello,

i have some scheduled searches. Some run every 5 minutes, some 15 minutes some hourly etc.

Some of those searches are there to generate a summary index, a few other to exportcsv to feed it into another tool regularly.

if there is an outage of the search head (even with two search heads or SH pooling) some jobs might be skipped or missed as they won't rerun. This will result in a not complete dataset.

In the first step, i'll would need a Report which shows me, based on the actual schedule which search was skipped.

index=_internal source=*scheduler.log | eval sched = strftime(scheduled_time, "%Y-%m-%d %H:%M:%S") | search savedsearch_name="Project Honey Pot - Threatscore AVG Last 4 Hours" NOT continued | table sched status savedsearch_name

alt text

Report Like:

User Activity Search
- Last RUN | scheduled every 5 Minutes | STATUS=Completed
- Last RUN - 5 minutes | STATUS=Completed
- Last RUN -10 Minutes | STATUS=Completed
- Last RUN -15 Minutes | Status=Not Executed
- Last RUN -20 Minutes | Status-Completed

and this for each scheduled search dynamically with the scheduled every 5 minutes.

The above example would show a potential Restart of the SH 15 minutes ago. And then i can manually investigate and re-run the export for this specific timeframe to add the data again... or it can review the last 10 successfull runs - subtract the times and then automatically detect that it is running all 5 minutes.

Thanks a lot
Matthias

Highlighted

Re: Report to show missed schedule searches duo downtime

Motivator

Take a look at the SoS app under Search | Scheduler Activity.

0 Karma
Highlighted

Re: Report to show missed schedule searches duo downtime

Communicator

thanks for your hint - but this shows only successfull run vs. errors. if i have a scheduled activity every 5 minutes and i shutdown my instance of splunk for 1 hour and start it again - i won't get it displyed that it misses several scheduled activities...

0 Karma
Highlighted

Re: Report to show missed schedule searches duo downtime

Motivator

Correct, and that's because the scheduler does not keep state across restarts and naturally does not log anything. I suppose that you can modify your search to include a condition that checks whether a shutdown event occurred in splunkd.log during the timerange in question and add a field to indicate so. Then use the information in this field to make a decision on whether or not to re-run the reports.

0 Karma
Highlighted

Re: Report to show missed schedule searches duo downtime

Communicator

Hi,

i found a nearly best solution including being independent from the schedule with kmeans. So first listing all successfull runs, then calculating the delta of each run and then with kmeans adding some statistical calculation to find the outlier. as if the system was off or one scheduled task was missing this can be shown:

`set_internal_index` host="mmaier-mbp15.local" source=*scheduler.log savedsearch_name="BlueCoat - Stats - Collect" status!=continued | stats max(run_time) as Max, count by _time savedsearch_name | sort -_time | delta _time as delta | where Max>0 | eval delta = round (delta/60*-1,2) | kmeans delta | sort -_time | replace 1 with OK, 2 with "ERROR" in CLUSTERNUM

alt text

Even in the line visualization i can visualize it very good:

alt text

View solution in original post

Speak Up for Splunk Careers!

We want to better understand the impact Splunk experience and expertise has has on individuals' careers, and help highlight the growing demand for Splunk skills.