We recently upgraded from 6.5.4 to 6.6.0 as an interim step on our way to 7.3.6. We had about 12 real-time searches that triggered alerts, and they were working perfectly. Right after the upgrade to 6.6.0 we get the error below and see a 98% skip ratio in the Scheduler Activity dashboard. Splunk will not help us because this is an unsupported version, but I need to get this fixed before I will be allowed to upgrade to the final, supported version. Here is the full event:
07-26-2020 16:59:00.318 -0400 INFO SavedSplunker - savedsearch_id="nobody;search;Route Flapping", search_type="scheduled", user="mvas**", app="search", savedsearch_name="Route Flapping", priority=default, status=skipped, reason="The maximum number of concurrent real-time scheduled searches on this instance has been reached", concurrency_category="real-time_scheduled", concurrency_context="saved-search_instance-wide", concurrency_limit=1, scheduled_time=1595797140, window_time=0
I understand the concurrency limit might be the culprit, but I have not been able to find out how to fix it.
There are quite a number of moving elements here:
- Is this a search head cluster?
- How many CPU cores and how much memory do your SHC members and indexers have?
But as a first guess, these settings are configured in `limits.conf`. Run a btool of `limits.conf`, export the output, and check the settings carefully. The values as per the link are a good start.
In `limits.conf`, check for:
- `shc_syswide_quota_enforcement`
- your CPU core count versus the maximum number of searches allowed, to see if there is a discrepancy
```
base_max_searches = <integer>
* A constant to add to the maximum number of searches, computed as a
  multiplier of the CPUs.
* Default: 6

max_rt_search_multiplier = <decimal number>
* A number by which the maximum number of historical searches is multiplied
  to determine the maximum number of concurrent real-time searches.
* NOTE: The maximum number of real-time searches is computed as:
  max_rt_searches = max_rt_search_multiplier x max_hist_searches
* Default: 1

max_searches_per_cpu = <integer>
* The maximum number of concurrent historical searches for each CPU.
  The system-wide limit of historical searches is computed as:
  max_hist_searches = max_searches_per_cpu x number_of_cpus + base_max_searches
* NOTE: The maximum number of real-time searches is computed as:
  max_rt_searches = max_rt_search_multiplier x max_hist_searches
* Default: 1
```
```
# If the number of CPUs in your machine is 14, then the total system-wide
# number of concurrent searches this machine can handle is 20:
# base_max_searches + max_searches_per_cpu x num_cpus = 6 + 14 x 1 = 20
base_max_searches = 6
max_searches_per_cpu = 1
```
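Putting the two formulas above together, here is a small sketch of the arithmetic (the function name and defaults are mine; the defaults mirror the spec excerpt above):

```python
# Sketch of the documented search-concurrency arithmetic from limits.conf.
def search_limits(num_cpus,
                  base_max_searches=6,
                  max_searches_per_cpu=1,
                  max_rt_search_multiplier=1):
    """Return (max historical searches, max real-time searches)."""
    max_hist = max_searches_per_cpu * num_cpus + base_max_searches
    max_rt = int(max_rt_search_multiplier * max_hist)
    return max_hist, max_rt

print(search_limits(14))  # the 14-CPU example above: (20, 20)
print(search_limits(16))  # (22, 22)
```

So with default settings, even a modest box should allow well more than one concurrent real-time scheduled search, which is why the `concurrency_limit=1` in the log is suspicious.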
The search heads are not clustered.
The SH running these alerts has 16 cores / 16 GB RAM.
I checked `limits.conf` and found:
shc_syswide_quota_enforcement = false
Not sure what that means. I also appreciate the advice, but I have no idea what the rest of your post means. I am a Cisco engineer who had Splunk dumped in my lap, so my knowledge is limited.
Thanks.
What's your real-time search limit? If you have 12 real-time searches, then your limit for real-time jobs should be equal to or greater than that. Looking at the internal logs, your limit is 1.
Please validate these two attributes in `authorize.conf`:
- `rtSrchJobsQuota`
- `cumulativeRTSrchJobsQuota`
Hope this helps.
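For illustration, a role-level override would look something like this in a local `authorize.conf` (a sketch only, with example values, not anyone's actual config):

```
# $SPLUNK_HOME/etc/system/local/authorize.conf (example values only)
[role_admin]
# Max concurrent real-time searches each member of this role can run
rtSrchJobsQuota = 20
# Max concurrent real-time searches across all members of this role combined
cumulativeRTSrchJobsQuota = 40
```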
Here are the role quotas (`rtSrchJobsQuota`/`cumulativeRTSrchJobsQuota`):

```
default    = 6/100
can_delete = 0
power      = 20/200
admin      = 100/400
```
All of the searches are owned by either the admin account or my account, which has the admin role.