I need to create an alert for failed scheduled saved searches. If any scheduled saved search fails to run, because of a scheduler problem or for any other reason, it should trigger an alert. Can anyone please help me here?
I looked into it and found several different scheduling status values, as shown in the attachment.
I'm not sure which of these status values I should use for this purpose, so any guidance is welcome.
Filtering on status!=success, as in the query below, catches every scheduled run that did not succeed. You can limit the search to your search heads by adding a host pattern or list to the query, e.g.
host IN(host01,host02)
index=_internal sourcetype=scheduler status!=success
| table _time search_type status user app savedsearch_name
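To turn the query above into an alert, one option is to summarise the non-successful runs per saved search so that each result row is one failing search. The following is only a sketch under a few assumptions: host01 and host02 are the placeholder hosts from the example above, delegated events are excluded in the same way the dashboard below excludes them, and failed_runs, last_seen and reasons are output names chosen here for illustration. The reason field is normally populated for skipped runs and may be empty for other statuses.

```
index=_internal sourcetype=scheduler host IN (host01, host02) status!=success NOT status="*delegated*"
| stats count AS failed_runs latest(_time) AS last_seen values(reason) AS reasons BY host app user savedsearch_name status
| convert ctime(last_seen)
| sort - failed_runs
```

Saved as a scheduled alert (for instance every 15 minutes over the previous 15 minutes) with the trigger condition "Number of Results is greater than 0", this should fire whenever at least one scheduled search was skipped, continued, or otherwise did not report success in that window. Adjust the schedule and window to suit how quickly you need to know.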
Here's some dashboard code I'm tinkering with which may help you. You'll probably need to adjust the host filter to match your search head hosts.
<form>
<label>Splunk Scheduler v2</label>
<fieldset submitButton="false">
<input type="time" searchWhenChanged="true" token="time">
<label>Time Range</label>
<default>
<earliest>-60m@m</earliest>
<latest>now</latest>
</default>
</input>
<input type="multiselect" searchWhenChanged="true" token="host">
<label>Host</label>
<choice value="*">All</choice>
<valuePrefix>host="</valuePrefix>
<valueSuffix>"</valueSuffix>
<delimiter> OR </delimiter>
<default>*</default>
<fieldForLabel>host</fieldForLabel>
<fieldForValue>host</fieldForValue>
<search>
<query>host="*" app="*" status="*" index=_internal sourcetype=scheduler
| dedup host
| table host
| sort host</query>
<earliest>-24h@h</earliest>
<latest>now</latest>
</search>
</input>
<input type="dropdown" searchWhenChanged="true" token="app">
<label>App</label>
<choice value="*">All</choice>
<default>*</default>
<prefix>app="</prefix>
<suffix>"</suffix>
<fieldForLabel>app</fieldForLabel>
<fieldForValue>app</fieldForValue>
<search>
<query>index=_internal sourcetype=scheduler
| dedup app
| table app
| sort app</query>
<earliest>-24h@h</earliest>
<latest>now</latest>
</search>
</input>
<input type="text" searchWhenChanged="true" token="savedsearch_id_pattern">
<label>Saved Search Name/Pattern</label>
<default>*</default>
<prefix>savedsearch_id="*</prefix>
<suffix>*"</suffix>
</input>
<input type="dropdown" searchWhenChanged="true" token="exclude_savedsearch_id_pattern">
<label>EXCLUDE Name/Pattern</label>
<default>ACCELERATE</default>
<prefix>NOT "*</prefix>
<suffix>*"</suffix>
<choice value="-NONE-">None</choice>
<choice value="ACCELERATE">Report Acceleration</choice>
<initialValue>-NONE-</initialValue>
</input>
<input type="dropdown" token="status">
<label>Status</label>
<choice value="*">All</choice>
<choice value="-NONE-">None</choice>
<choice value="success">Success</choice>
<choice value="skipped">Skipped</choice>
<choice value="continued">Continued</choice>
<default>*</default>
<prefix>status="</prefix>
<suffix>"</suffix>
</input>
<input type="text" token="search_cluster_captain" searchWhenChanged="true">
<label>Search Cluster Captain (use if you want to exclude captain)</label>
<default>not_set</default>
</input>
</fieldset>
<row>
<panel>
<title>All</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| stats count</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0x555","0x555"]</option>
<option name="rangeValues">[0]</option>
<option name="refresh.display">progressbar</option>
<option name="useColors">1</option>
</single>
</panel>
<panel>
<title>Success</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") status=success NOT host="$search_cluster_captain$"
| stats count</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0xd93f3c","0x65a637"]</option>
<option name="rangeValues">[0]</option>
<option name="refresh.display">progressbar</option>
<option name="useColors">1</option>
</single>
</panel>
<panel>
<title>Skipped (capacity reached for role and/or instance)</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") status=skipped NOT host="$search_cluster_captain$"
| stats count</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0x65a637","0xf7bc38"]</option>
<option name="rangeValues">[0]</option>
<option name="refresh.display">progressbar</option>
<option name="useColors">1</option>
</single>
</panel>
<panel>
<title>Continued (did not complete before next execution)</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") status=continued NOT host="$search_cluster_captain$"
| stats count</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0x65a637","0xd93f3c"]</option>
<option name="rangeValues">[0]</option>
<option name="refresh.display">progressbar</option>
<option name="useColors">1</option>
</single>
</panel>
</row>
<row>
<panel>
<title>All</title>
<single>
<search>
<query>|makeresults | eval percent=100 | table percent</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0x555","0x555"]</option>
<option name="rangeValues">[0]</option>
<option name="unit">%</option>
<option name="useColors">1</option>
</single>
</panel>
<panel>
<title>Success</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| bucket _time span=5m
| top status limit=1000 by _time
| search status=success
| timechart partial=false span=5m avg(percent) as percent by status</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="numberPrecision">0.00</option>
<option name="rangeColors">["0xd93f3c","0xf8be34","0x65a637"]</option>
<option name="rangeValues">[95,99]</option>
<option name="refresh.display">progressbar</option>
<option name="unit">%</option>
<option name="unitPosition">after</option>
<option name="useColors">1</option>
</single>
</panel>
<panel>
<title>Skipped (capacity reached for role and/or instance)</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| bucket _time span=5m
| top status limit=1000 by _time
| search status=skipped
| timechart partial=false span=5m avg(percent) as percent by status</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorBy">value</option>
<option name="colorMode">block</option>
<option name="numberPrecision">0.00</option>
<option name="rangeColors">["0x65a637","0xf8be34","0xdc4e41"]</option>
<option name="rangeValues">[1,5]</option>
<option name="refresh.display">progressbar</option>
<option name="unit">%</option>
<option name="unitPosition">after</option>
<option name="useColors">1</option>
</single>
</panel>
<panel>
<title>Continued (did not complete before next execution)</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| bucket _time span=5m
| top status limit=1000 by _time
| search status=continued
| timechart partial=false span=5m avg(percent) as percent by status</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="numberPrecision">0.00</option>
<option name="rangeColors">["0x53a051","0xf8be34","0xdc4e41"]</option>
<option name="rangeValues">[1,5]</option>
<option name="refresh.display">progressbar</option>
<option name="unit">%</option>
<option name="unitPosition">after</option>
<option name="useColors">1</option>
</single>
</panel>
</row>
<row>
<panel>
<title>Last Execution</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| eval time=strftime(_time,"%Y-%m-%d %H:%M:%S")
| head 1
| table time</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0x555","0x555"]</option>
<option name="rangeValues">[0]</option>
<option name="refresh.display">progressbar</option>
<option name="showSparkline">0</option>
<option name="showTrendIndicator">0</option>
<option name="unitPosition">after</option>
<option name="useColors">0</option>
</single>
</panel>
<panel>
<title>Last Success</title>
<single>
<search>
<query>index=_internal $host$ status=success sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| eval time=strftime(_time,"%Y-%m-%d %H:%M:%S")
| head 1
| table time</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0x555","0x555"]</option>
<option name="rangeValues">[0]</option>
<option name="refresh.display">progressbar</option>
<option name="showSparkline">0</option>
<option name="showTrendIndicator">0</option>
<option name="unitPosition">after</option>
<option name="useColors">0</option>
</single>
</panel>
<panel>
<title>Last Skipped</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") status=skipped NOT host="$search_cluster_captain$"
| eval time=strftime(_time,"%Y-%m-%d %H:%M:%S")
| head 1
| table time</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0x555","0x555"]</option>
<option name="rangeValues">[0]</option>
<option name="refresh.display">progressbar</option>
<option name="showSparkline">0</option>
<option name="showTrendIndicator">0</option>
<option name="unitPosition">after</option>
<option name="useColors">0</option>
</single>
</panel>
<panel>
<title>Last Continued</title>
<single>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$" status=continued
| eval time=strftime(_time,"%Y-%m-%d %H:%M:%S")
| head 1
| table time</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="colorMode">block</option>
<option name="rangeColors">["0x555","0x555"]</option>
<option name="rangeValues">[0]</option>
<option name="refresh.display">progressbar</option>
<option name="showSparkline">0</option>
<option name="showTrendIndicator">0</option>
<option name="unitPosition">after</option>
<option name="useColors">0</option>
</single>
</panel>
</row>
<row>
<panel>
<title>All Scheduler Status</title>
<chart>
<search>
<query>$host$ $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") index=_internal sourcetype=scheduler NOT host="$search_cluster_captain$"
| stats count by status</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>1m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
<option name="charting.axisLabelsY.majorUnit">50</option>
<option name="charting.axisTitleX.visibility">visible</option>
<option name="charting.axisTitleY.visibility">visible</option>
<option name="charting.axisTitleY2.visibility">visible</option>
<option name="charting.axisX.scale">linear</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.axisY2.enabled">0</option>
<option name="charting.axisY2.scale">inherit</option>
<option name="charting.chart">pie</option>
<option name="charting.chart.bubbleMaximumSize">50</option>
<option name="charting.chart.bubbleMinimumSize">10</option>
<option name="charting.chart.bubbleSizeBy">area</option>
<option name="charting.chart.nullValueMode">gaps</option>
<option name="charting.chart.overlayFields">total</option>
<option name="charting.chart.showDataLabels">none</option>
<option name="charting.chart.showPercent">1</option>
<option name="charting.chart.sliceCollapsingThreshold">0</option>
<option name="charting.chart.stackMode">stacked</option>
<option name="charting.chart.style">shiny</option>
<option name="charting.drilldown">all</option>
<option name="charting.fieldColors">{"success":0x65a637,"skipped":0xf7bc38,"continued":0xd93f3c}</option>
<option name="charting.layout.splitSeries">0</option>
<option name="charting.layout.splitSeries.allowIndependentYRanges">1</option>
<option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
<option name="charting.legend.placement">right</option>
<option name="refresh.display">progressbar</option>
<option name="trellis.enabled">0</option>
</chart>
</panel>
<panel>
<title>Scheduler Status Timeline</title>
<chart>
<search>
<query>$host$ $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") index=_internal sourcetype=scheduler NOT host="$search_cluster_captain$"
| timechart count by status
| addtotals fieldname=total</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>1m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
<option name="charting.axisLabelsY.majorUnit">100</option>
<option name="charting.axisTitleX.visibility">visible</option>
<option name="charting.axisTitleY.visibility">visible</option>
<option name="charting.axisTitleY2.visibility">visible</option>
<option name="charting.axisX.scale">linear</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.axisY2.enabled">0</option>
<option name="charting.axisY2.scale">inherit</option>
<option name="charting.chart">column</option>
<option name="charting.chart.bubbleMaximumSize">50</option>
<option name="charting.chart.bubbleMinimumSize">10</option>
<option name="charting.chart.bubbleSizeBy">area</option>
<option name="charting.chart.nullValueMode">gaps</option>
<option name="charting.chart.overlayFields">total</option>
<option name="charting.chart.showDataLabels">minmax</option>
<option name="charting.chart.sliceCollapsingThreshold">0.01</option>
<option name="charting.chart.stackMode">stacked</option>
<option name="charting.chart.style">shiny</option>
<option name="charting.drilldown">all</option>
<option name="charting.fieldColors">{"success":0x65a637,"skipped":0xf7bc38,"continued":0xd93f3c}</option>
<option name="charting.layout.splitSeries">0</option>
<option name="charting.layout.splitSeries.allowIndependentYRanges">1</option>
<option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
<option name="charting.legend.placement">top</option>
<option name="refresh.display">progressbar</option>
<option name="trellis.enabled">0</option>
</chart>
</panel>
<panel>
<title>Scheduler Volume Timeline by Splunk Instance</title>
<chart>
<search>
<query>$host$ $app$ $status$ (NOT status="*delegated*") $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ index=_internal sourcetype=scheduler NOT host="$search_cluster_captain$"
| timechart count by host limit=50
| addtotals fieldname=total</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>1m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
<option name="charting.axisLabelsY.majorUnit">100</option>
<option name="charting.axisTitleX.visibility">visible</option>
<option name="charting.axisTitleY.visibility">visible</option>
<option name="charting.axisTitleY2.visibility">visible</option>
<option name="charting.axisX.scale">linear</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.axisY2.enabled">0</option>
<option name="charting.axisY2.scale">inherit</option>
<option name="charting.chart">column</option>
<option name="charting.chart.bubbleMaximumSize">50</option>
<option name="charting.chart.bubbleMinimumSize">10</option>
<option name="charting.chart.bubbleSizeBy">area</option>
<option name="charting.chart.nullValueMode">gaps</option>
<option name="charting.chart.overlayFields">total</option>
<option name="charting.chart.showDataLabels">minmax</option>
<option name="charting.chart.sliceCollapsingThreshold">0.01</option>
<option name="charting.chart.stackMode">stacked</option>
<option name="charting.chart.style">shiny</option>
<option name="charting.drilldown">all</option>
<option name="charting.fieldColors">{"success":0x65a637,"skipped":0xf7bc38,"continued":0xd93f3c}</option>
<option name="charting.layout.splitSeries">0</option>
<option name="charting.layout.splitSeries.allowIndependentYRanges">1</option>
<option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
<option name="charting.legend.placement">top</option>
<option name="refresh.display">progressbar</option>
<option name="trellis.enabled">0</option>
<option name="trellis.scales.shared">1</option>
<option name="trellis.size">medium</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Scheduler Status Less Than 100% by App</title>
<chart>
<search>
<query>$host$ $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") index=_internal sourcetype=scheduler NOT host="$search_cluster_captain$"
| top status by app limit=100
| search percent!=100</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
<refresh>5m</refresh>
<refreshType>delay</refreshType>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
<option name="charting.axisLabelsY.majorUnit">50</option>
<option name="charting.axisTitleX.visibility">collapsed</option>
<option name="charting.axisTitleY.visibility">collapsed</option>
<option name="charting.axisTitleY2.visibility">collapsed</option>
<option name="charting.axisX.scale">linear</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.axisY2.enabled">0</option>
<option name="charting.axisY2.scale">inherit</option>
<option name="charting.chart">pie</option>
<option name="charting.chart.bubbleMaximumSize">50</option>
<option name="charting.chart.bubbleMinimumSize">10</option>
<option name="charting.chart.bubbleSizeBy">area</option>
<option name="charting.chart.nullValueMode">gaps</option>
<option name="charting.chart.overlayFields">total</option>
<option name="charting.chart.showDataLabels">none</option>
<option name="charting.chart.showPercent">1</option>
<option name="charting.chart.sliceCollapsingThreshold">0</option>
<option name="charting.chart.stackMode">stacked</option>
<option name="charting.chart.style">shiny</option>
<option name="charting.drilldown">all</option>
<option name="charting.fieldColors">{"success":0x65a637,"skipped":0xf7bc38,"continued":0xd93f3c}</option>
<option name="charting.layout.splitSeries">0</option>
<option name="charting.layout.splitSeries.allowIndependentYRanges">1</option>
<option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
<option name="charting.legend.placement">none</option>
<option name="height">225</option>
<option name="refresh.display">progressbar</option>
<option name="trellis.enabled">1</option>
<option name="trellis.size">small</option>
<option name="trellis.splitBy">app</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Scheduler Success Ratio by Host (note: scheduler is disabled on the captain)</title>
<chart>
<search>
<query>$host$ $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") index=_internal sourcetype=scheduler NOT host="$search_cluster_captain$"
| timechart partial=t limit=100 count c(eval(status="success")) as success c(eval(status="skipped")) as skipped c(eval(status="continued")) as continued by host</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
<option name="charting.axisTitleX.visibility">collapsed</option>
<option name="charting.axisTitleY.visibility">collapsed</option>
<option name="charting.axisTitleY2.visibility">collapsed</option>
<option name="charting.axisX.abbreviation">none</option>
<option name="charting.axisX.scale">linear</option>
<option name="charting.axisY.abbreviation">none</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.axisY2.abbreviation">none</option>
<option name="charting.axisY2.enabled">0</option>
<option name="charting.axisY2.scale">inherit</option>
<option name="charting.chart">area</option>
<option name="charting.chart.bubbleMaximumSize">50</option>
<option name="charting.chart.bubbleMinimumSize">10</option>
<option name="charting.chart.bubbleSizeBy">area</option>
<option name="charting.chart.nullValueMode">connect</option>
<option name="charting.chart.overlayFields">count</option>
<option name="charting.chart.showDataLabels">none</option>
<option name="charting.chart.sliceCollapsingThreshold">0.01</option>
<option name="charting.chart.stackMode">stacked100</option>
<option name="charting.chart.style">shiny</option>
<option name="charting.drilldown">none</option>
<option name="charting.layout.splitSeries">0</option>
<option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
<option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
<option name="charting.legend.mode">standard</option>
<option name="charting.legend.placement">none</option>
<option name="charting.lineWidth">2</option>
<option name="refresh.display">progressbar</option>
<option name="trellis.enabled">1</option>
<option name="trellis.scales.shared">1</option>
<option name="trellis.size">small</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Scheduler Volume by Splunk App Space</title>
<table>
<search>
<query>$host$ $app$ $status$ (NOT status="*delegated*") $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ index=_internal sourcetype=scheduler NOT host="$search_cluster_captain$"
| stats count by app savedsearch_name</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
<format type="color" field="status">
<colorPalette type="map">{"success":#65A637,"skipped":#F7BC38,"continued":#DC4E41}</colorPalette>
</format>
<format type="color" field="host">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="app">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="user">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="log_level">
<colorPalette type="map">{"INFO":#6DB7C6,"WARN":#F7BC38,"ERROR":#D93F3C}</colorPalette>
</format>
<format type="color" field="reason">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="savedsearch_name">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="current_concurrency">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="concurrency_limit">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="window_time">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="savedsearch_id">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="result_count">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
</table>
</panel>
</row>
<row>
<panel>
<title>Scheduler Volume by Splunk App Space &amp; Search</title>
<table>
<search>
<query>$host$ $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") index=_internal sourcetype=scheduler NOT host="$search_cluster_captain$"
| bucket _time span=1h
| stats count by _time host app user savedsearch_name status
| dedup host app user savedsearch_name status
| eval avg_queries_per_min=round(count/60,2)
| sort -count</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="count">10</option>
<option name="drilldown">cell</option>
<option name="percentagesRow">false</option>
<option name="refresh.display">progressbar</option>
<option name="rowNumbers">true</option>
<option name="totalsRow">true</option>
<format type="color" field="app">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="savedsearch_name">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="user">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="host">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="status">
<colorPalette type="map">{"success":#53A051,"skipped":#F8BE34,"continued":#DC4E41}</colorPalette>
</format>
</table>
</panel>
</row>
<row>
<panel>
<title>Scheduler Volume Timeline by User</title>
<chart>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| timechart count by user
| addtotals fieldname=total</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
<option name="charting.axisTitleX.visibility">visible</option>
<option name="charting.axisTitleY.visibility">visible</option>
<option name="charting.axisTitleY2.visibility">visible</option>
<option name="charting.axisX.scale">linear</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.axisY2.enabled">0</option>
<option name="charting.axisY2.scale">inherit</option>
<option name="charting.chart">column</option>
<option name="charting.chart.bubbleMaximumSize">50</option>
<option name="charting.chart.bubbleMinimumSize">10</option>
<option name="charting.chart.bubbleSizeBy">area</option>
<option name="charting.chart.nullValueMode">gaps</option>
<option name="charting.chart.overlayFields">total</option>
<option name="charting.chart.showDataLabels">minmax</option>
<option name="charting.chart.sliceCollapsingThreshold">0.01</option>
<option name="charting.chart.stackMode">stacked</option>
<option name="charting.chart.style">shiny</option>
<option name="charting.drilldown">all</option>
<option name="charting.layout.splitSeries">0</option>
<option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
<option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
<option name="charting.legend.placement">right</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Avg Latency by app</title>
<chart>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| eval scheduler_latency_secs=dispatch_time-scheduled_time
| timechart avg(scheduler_latency_secs) as scheduler_latency_secs by app limit=50</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
<option name="charting.axisTitleX.visibility">visible</option>
<option name="charting.axisTitleY.visibility">visible</option>
<option name="charting.axisTitleY2.visibility">visible</option>
<option name="charting.axisX.scale">linear</option>
<option name="charting.axisY.scale">log</option>
<option name="charting.axisY2.enabled">0</option>
<option name="charting.axisY2.scale">inherit</option>
<option name="charting.chart">line</option>
<option name="charting.chart.bubbleMaximumSize">50</option>
<option name="charting.chart.bubbleMinimumSize">10</option>
<option name="charting.chart.bubbleSizeBy">area</option>
<option name="charting.chart.nullValueMode">gaps</option>
<option name="charting.chart.showDataLabels">minmax</option>
<option name="charting.chart.sliceCollapsingThreshold">0.01</option>
<option name="charting.chart.stackMode">default</option>
<option name="charting.chart.style">shiny</option>
<option name="charting.drilldown">all</option>
<option name="charting.layout.splitSeries">0</option>
<option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
<option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
<option name="charting.legend.placement">right</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Scheduler Volume Timeline by app</title>
<chart>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| timechart count by app useother=f
| addtotals fieldname=total</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
<option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
<option name="charting.axisTitleX.visibility">visible</option>
<option name="charting.axisTitleY.visibility">visible</option>
<option name="charting.axisTitleY2.visibility">visible</option>
<option name="charting.axisX.scale">linear</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.axisY2.enabled">0</option>
<option name="charting.axisY2.scale">inherit</option>
<option name="charting.chart">column</option>
<option name="charting.chart.bubbleMaximumSize">50</option>
<option name="charting.chart.bubbleMinimumSize">10</option>
<option name="charting.chart.bubbleSizeBy">area</option>
<option name="charting.chart.nullValueMode">gaps</option>
<option name="charting.chart.overlayFields">total</option>
<option name="charting.chart.showDataLabels">minmax</option>
<option name="charting.chart.sliceCollapsingThreshold">0.01</option>
<option name="charting.chart.stackMode">stacked</option>
<option name="charting.chart.style">shiny</option>
<option name="charting.drilldown">all</option>
<option name="charting.layout.splitSeries">0</option>
<option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
<option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
<option name="charting.legend.placement">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Top warnings by app user Saved Search Name</title>
<table>
<search>
<query>index=_internal $host$ sourcetype=scheduler $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| top reason by host app user savedsearch_name limit=1000
| sort -count</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
<sampleRatio>1</sampleRatio>
</search>
<option name="count">20</option>
<option name="refresh.display">progressbar</option>
<format type="color" field="savedsearch_id">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="reason">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="app">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="user">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="host">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="savedsearch_name">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
</table>
</panel>
</row>
<row>
<panel>
<title>100 Most Recent Scheduled Searches</title>
<table>
<search>
<query>index=_internal source=*scheduler.log $host$ $app$ $savedsearch_id_pattern$ $exclude_savedsearch_id_pattern$ $status$ (NOT status="*delegated*") NOT host="$search_cluster_captain$"
| head 100
| eval sched = strftime(scheduled_time, "%Y-%m-%d %H:%M:%S")
| table sched host app user status log_level savedsearch_name run_time window_time reason result_count _raw</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
<format type="color" field="status">
<colorPalette type="map">{"success":#65A637,"skipped":#F7BC38,"continued":#DC4E41}</colorPalette>
</format>
<format type="color" field="host">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="app">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="user">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="log_level">
<colorPalette type="map">{"INFO":#6DB7C6,"WARN":#F7BC38,"ERROR":#D93F3C}</colorPalette>
</format>
<format type="color" field="reason">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="savedsearch_name">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="current_concurrency">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="concurrency_limit">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="window_time">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="savedsearch_id">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
<format type="color" field="result_count">
<colorPalette type="sharedList"></colorPalette>
<scale type="sharedCategory"></scale>
</format>
</table>
</panel>
</row>
</form>
@rob_jordan thank you for your reply.
If I use status!=success, it will match all of the status values mentioned above. But a successful scheduled run passes through some of those status values before it reaches success, so even a job that was scheduled and ran properly will appear in that query.
Hmm, on my search heads/search clusters it seems that success + skipped + continued = total scheduled searches:
host=searchheadpattern index=_internal sourcetype=scheduler
| top 100 status
I only see success, skipped, and continued. I think that's all you need to tell whether there is an issue at a high level. There could be counterpart errors on the indexers; however, on the search head they will likely be reported as skipped or continued.
I think your deployment is not distributed, which is why you don't see the delegated-related status values.
Anything with delegated* is related to how the captain delegates jobs/searches to search head cluster members. You can check and see that the total of status=delegated* is about the same as the total of skipped and success.
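One way to check that claim is to count the delegated statuses next to skipped and success in a single search. This is only a sketch: it assumes every delegated status starts with the prefix delegated, and the output field names delegated_total and skipped_plus_success are made up for illustration.

```
index=_internal sourcetype=scheduler
| stats count(eval(like(status, "delegated%"))) as delegated_total
        count(eval(status="skipped" OR status="success")) as skipped_plus_success
```

If the two numbers are close, the delegated events are most likely captain bookkeeping for runs that are also reported as skipped or success, rather than additional failures.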
Try this search (it is also pre-built in your DMC):
index=_internal sourcetype=scheduler status=skipped
| stats count as skipped_count by search_type user app savedsearch_name reason
You can do the same thing over time to see trends:
index=_internal sourcetype=scheduler status=skipped
| timechart span=5m count as skipped_count by reason
index=_internal sourcetype=scheduler status IN(skipped,continued)
| table _time search_type status user app savedsearch_name
I think these two statuses, skipped and continued, should cover you for most scenarios.
Skipped usually means the search was not run because of a capacity limit for the user/role, or something like being out of disk space.
Continued is also bad, as it means the previous run didn't finish before the next run was attempted.
I believe Splunk will only attempt to run one copy of each search at a time unless you override that, which is usually not a good idea.
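To turn this into the alert asked about in the original question, one option is to aggregate the skipped and continued events per saved search and trigger when any rows come back. A minimal sketch, assuming the alert runs over a recent window (for example the last 60 minutes) and triggers when the number of results is greater than zero; the field names problem_runs, reasons and last_seen are illustrative only:

```
index=_internal sourcetype=scheduler status IN(skipped,continued)
| stats count as problem_runs values(reason) as reasons latest(_time) as last_seen by host app user savedsearch_name status
| convert ctime(last_seen)
| sort - problem_runs
```

As with the earlier queries, you can put a host filter for your search heads in front of it to limit where it looks.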
@rob_jordan thank you. I'm using this query, but I have one more question about the continued status: does it mean that an earlier run of the job failed and the current scheduled run is now executing? Is the current scheduled run successful? In other words, if the status is continued, is the job actually running?
This is a much improved query, but I'm still not clear about delegated_remote_error. Is it a symptom of an error?
sourcetype=scheduler status="skipped"
Thank you for your reply.
So status="skipped" means that scheduling the job has failed. Then what about delegated_remote_error?