I have some scheduled searches. Some run every 5 minutes, some every 15 minutes, some hourly, etc.
Some of those searches populate a summary index; a few others export a CSV to feed the results into another tool regularly.
If there is an outage of the search head (even with two search heads or SH pooling), some jobs might be skipped or missed, as they won't be re-run. This results in an incomplete dataset.
As a first step, I would need a report that shows me, based on the actual schedule, which search was skipped.
index=_internal source=*scheduler.log | eval sched = strftime(scheduled_time, "%Y-%m-%d %H:%M:%S") | search savedsearch_name="Project Honey Pot - Threatscore AVG Last 4 Hours" NOT continued | table sched status savedsearch_name
User Activity Search
- Last RUN | scheduled every 5 minutes | Status=Completed
- Last RUN - 5 minutes | Status=Completed
- Last RUN - 10 minutes | Status=Completed
- Last RUN - 15 minutes | Status=Not Executed
- Last RUN - 20 minutes | Status=Completed
...and this dynamically for each scheduled search, based on its own schedule (every 5 minutes in this example).
The above example would reveal a potential restart of the SH 15 minutes ago. I could then investigate manually and re-run the export for that specific timeframe to add the data again... or the report could review the last 10 successful runs, subtract the timestamps, and automatically detect that the search runs every 5 minutes.
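A minimal sketch of that last idea, assuming the search name "User Activity Search" and that `status=success` marks a completed run in scheduler.log:

```
index=_internal source=*scheduler.log savedsearch_name="User Activity Search" status=success
| sort - _time
| head 10
| delta _time as gap
| eval gap_minutes = round(abs(gap)/60, 2)
| where gap_minutes > 5
```

The `delta` command subtracts each run's `_time` from the previous event's, so any row surviving the `where` clause is a gap larger than the expected 5-minute interval. The threshold is hardcoded here; deriving it from the observed deltas instead is what the kmeans approach further down does.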
Thanks a lot
Thanks for your hint, but this only shows successful runs vs. errors. If I have an activity scheduled every 5 minutes and I shut down my Splunk instance for an hour and start it again, it won't be displayed that several scheduled activities were missed...
Correct, and that's because the scheduler does not keep state across restarts and naturally does not log anything during the downtime. I suppose you can modify your search to include a condition that checks whether a shutdown event occurred in splunkd.log during the timerange in question, and add a field to indicate so. Then use that field to decide whether or not to re-run the reports.
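A rough sketch of that check; note the literal shutdown message text in splunkd.log varies by version, so "Shutting down splunkd" is an assumption you should verify against your own logs:

```
index=_internal source=*scheduler.log savedsearch_name="User Activity Search" status=success
| stats count as runs
| appendcols
    [ search index=_internal source=*splunkd.log "Shutting down splunkd"
      | stats count as shutdowns ]
| eval rerun_needed = if(shutdowns > 0, "check this timerange", "ok")
```

Run this over the timerange you are validating: if a shutdown occurred in the window, the `rerun_needed` flag tells you to inspect that period and re-run the export.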
I found a nearly ideal solution, including being independent of the schedule, using kmeans. First list all successful runs, then calculate the delta between consecutive runs, and then use kmeans to add some statistical clustering to find the outliers. If the system was down or one scheduled task was missed, this will show it:
`set_internal_index` host="mmaier-mbp15.local" source=*scheduler.log savedsearch_name="BlueCoat - Stats - Collect" status!=continued | stats max(run_time) as Max, count by _time savedsearch_name | sort -_time | delta _time as delta | where Max>0 | eval delta = round(delta/60*-1,2) | kmeans delta | sort -_time | replace 1 with "OK", 2 with "ERROR" in CLUSTERNUM
Even in the line-chart visualization I can visualize it very well: