The answer to your question is not simple. At a high level, Splunk runs the following types of searches:
- Scheduled searches (running and delegated)
- Report acceleration searches (running and delegated)
- Data model acceleration searches (running and delegated)
To calculate the number of SHC-wide concurrent searches running at any given time, you need to sum: ad-hoc searches + scheduled searches + report acceleration scheduled searches + data model acceleration scheduled searches + delegated searches.
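The sum above can be sketched as follows; the per-category counts are purely illustrative numbers, not from any real deployment:

```python
# Hypothetical snapshot counts per search category (illustrative values).
# SHC-wide concurrency is simply the sum across all categories.
snapshot = {
    "adhoc": 12,
    "scheduled": 7,
    "report_acceleration": 3,
    "datamodel_acceleration": 5,
    "delegated": 2,
}

total_concurrency = sum(snapshot.values())
print(total_concurrency)  # 29
```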
Here are various logs and searches that can be leveraged to get some statistics, but these searches won't provide complete data. Splunk currently has an open enhancement request (SPL-125101: Comprehensive search concurrency metrics) to streamline these stats for reporting needs.
1) The introspection log provides a snapshot of all searches running on the SHC members. The snapshot is taken every 10 seconds and covers scheduled, report acceleration, and data model acceleration searches. You can use the search below to get a trend of the searches run in each category.
index=_internal ( host=<> ... ) sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=* | eval data.search_props.type = if(like('data.search_props.sid',"%_scheduler_%"),"scheduled",'data.search_props.type') | bin _time span=10s | stats dc(data.search_props.sid) AS distinct_search_count by _time, data.search_props.type | timechart bins=200 max(distinct_search_count) AS "max search concurrency" by data.search_props.type | addtotals
Stats from introspection data have the following challenges:
- Introspection data is sampled every 10 seconds, so searches that start and finish within a 10-second window are not counted.
- Introspection data also does not account for delegated searches.
Due to these challenges, introspection data can only be used to see the trend, and it may show numbers below the actual search load.
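A small sketch of why snapshot sampling undercounts: any search that starts and finishes entirely between two 10-second snapshots is never observed. All timings here are made-up, illustrative values:

```python
# Snapshot sampling misses short-lived searches. The introspection log
# samples every 10 seconds; searches living entirely between two samples
# are invisible to it. Search timings below are illustrative only.

# (start, end) times in seconds for three hypothetical searches
searches = [
    (0.5, 25.0),   # long search: visible at snapshots t=10 and t=20
    (11.0, 14.0),  # short search: starts and ends between snapshots
    (31.0, 38.5),  # another short search missed the same way
]

snapshots = [0, 10, 20, 30, 40]  # 10-second sampling grid

observed = set()
for t in snapshots:
    for i, (start, end) in enumerate(searches):
        if start <= t <= end:
            observed.add(i)

print(sorted(observed))  # only the long search (index 0) is ever sampled
```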
2) To count the delegated searches: I have been researching this over the last few days, and development has provided useful tips, published at https://answers.splunk.com/answers/449024/search-head-cluster-scheduled-searches-and-status.html
Based on this, the total number of scheduled searches calculated by the scheduler/captain can be derived from metrics.log (group=searchscheduler) as activeScheduledSearches.size + activeDelegatedSearch.size; a sample search is below. Note that this metric does not include ad-hoc searches.
Another limitation with this search is that the data is sampled (snapshotted) every 30 seconds, so it will also miss searches that start and finish between those 30-second snapshots.
Scheduler Activity (based on metrics.log) :
index=_internal sourcetype=splunkd source=*metrics.log group=searchscheduler | timechart span=3m sum(dispatched) as dispatched, sum(skipped) as skipped, sum(delegated) as delegated, max(delegated_waiting) as delegated_waiting, sum(delegated_scheduled) as delegated_scheduled, max(max_pending) as max_pending, max(max_running) as max_running
3) Here is another search that can be used to get scheduled search counts (run + skipped) from scheduler.log along with ad-hoc counts from _audit. To get meaningful data, run it over a long time period, for example 4 hours or more. This search also misses delegated searches. A further challenge is that the audit log is not always complete for ad-hoc searches, so the numbers may be somewhat skewed.
Skipped searches vs concurrency:
host=<SHC_HOST_NAME> (index=_internal source=*/scheduler.log* (status=success run_time=*) OR status=skipped) OR ((index=_audit action=search info=completed) (NOT search_id="scheduler_*" NOT search_id="rsa_*")) | eval type=if(status="skipped", "skipped", "completed") | eval run_time=coalesce(run_time, total_run_time) | eval counter=-1 | appendpipe [ | eval counter=1 | eval _time=_time - run_time ] | sort 0 _time | streamstats sum(counter) as concurrency by type | table _time concurrency counter run_time type | timechart partial=f sep=_ span=1m count min(concurrency) as tmin max(concurrency) as tmax by type | rename count_skipped as skipped tmin_completed as min_concurrency tmax_completed as max_concurrency | fields + _time skipped *_concurrency | filldown *_concurrency
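The counter/streamstats trick in the search above can be sketched outside SPL: each completed-search audit event carries a completion time and a run_time, so emitting a +1 event at (completion − run_time) and a −1 event at completion, sorting, and taking a running sum reconstructs instantaneous concurrency. The event data below is illustrative only:

```python
# Sweep-line reconstruction of search concurrency from completion events,
# mirroring the eval counter / appendpipe / streamstats pipeline above.
# (completion_time, run_time) pairs are made-up sample data.
events = [
    (100.0, 30.0),  # ran from t=70 to t=100
    (105.0, 20.0),  # ran from t=85 to t=105
    (110.0, 5.0),   # ran from t=105 to t=110
]

sweep = []
for end, run_time in events:
    sweep.append((end - run_time, +1))  # search start: concurrency +1
    sweep.append((end, -1))             # search end: concurrency -1

sweep.sort()  # ties sort -1 before +1, so back-to-back searches don't overlap

concurrency, peak = 0, 0
for _, delta in sweep:
    concurrency += delta
    peak = max(peak, concurrency)

print(peak)  # 2: at most two of the sample searches overlap
```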
Delayed minutes vs. concurrency:
host=<SHC_HOST_NAME> index=_audit (action=search info=completed) (NOT search_id="scheduler_*" NOT search_id="rsa_*") | eval run_time=coalesce(run_time, total_run_time) | eval counter=-1 | appendpipe [ | eval counter=1 | eval _time=_time - run_time ] | sort 0 _time | streamstats sum(counter) as concurrency | timechart partial=f sep=_ span=1m min(concurrency) as min_concurrency max(concurrency) as max_concurrency | filldown *_concurrency | join _time [ search index=_internal host=<SHC_HOST_NAME> source=*/scheduler.log* (status=success OR status=continued OR status=skipped) | eval dispatch_time = coalesce(dispatch_time, _time) | eval scheduled_time = if(scheduled_time > 0, scheduled_time, "WTF") | eval window_time = coalesce(window_time, "0") | eval execution_latency = max(dispatch_time - (scheduled_time + window_time), 0) | timechart partial=f sep=_ span=1m sum(execution_latency) as delayed_seconds | eval delayed_minutes=coalesce(delayed_seconds/60, 0) | fields + _time delayed_minutes ]
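The execution-latency term inside the join above reduces to a simple formula: a scheduled search's delay is how long after its (scheduled_time + window_time) slot it actually dispatched, floored at zero. A minimal sketch, with illustrative epoch-second values:

```python
# Execution latency as computed by the eval in the scheduler.log subsearch:
# max(dispatch_time - (scheduled_time + window_time), 0).
# All times below are made-up epoch seconds for illustration.
def execution_latency(dispatch_time, scheduled_time, window_time=0):
    return max(dispatch_time - (scheduled_time + window_time), 0)

# Dispatched 45 seconds after its scheduled slot: 45 seconds of delay.
print(execution_latency(1045, 1000))                  # 45
# Dispatched within its window: delay is clamped to 0, never negative.
print(execution_latency(1000, 1000, window_time=60))  # 0
```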
Due to these limitations, Splunk currently presents some challenges when you are trying to obtain comprehensive search concurrency metrics.