We have a Splunk search head cluster. After upgrading from 7.2.6 to 8.0.1 we began to get errors like:
"12-18-2019 16:47:00.816 +0300 INFO SavedSplunker - savedsearch_id="nobody;;", search_type="scheduled", user="admin", app="", savedsearch_name="", priority=default, status=skipped, reason="The maximum number of concurrent running jobs for this historical scheduled search on this cluster has been reached", concurrency_category="historical_scheduled", concurrency_context="saved-search_cluster-wide", concurrency_limit=1, scheduled_time="
Changes to limits.conf didn't help. We can't raise concurrency_limit above 1.
Current values of the parameters that seem most relevant (in my opinion):
$ /opt/splunk/bin/splunk show config limits | egrep 'max_rt_search_multiplier|max_searches_per_cpu|shc_role_quota_enforcement|shc_syswide_quota_enforcement|base_max_searches'
shc_role_quota_enforcement=false
shc_syswide_quota_enforcement=false
base_max_searches=6
max_rt_search_multiplier=1
max_searches_per_cpu=8
It seems that SPL-73386 from https://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/KnownIssues is the most relevant known issue for us, but as you can see we set the user to admin and this didn't help.
Maybe the error is not caused by a bug but by changes in some default values?
We found the problem: it was caused by our L3 load balancer.
The balancer can open the first TCP session to search_head_1, but then send the following HTTP packets, from another TCP session, to search_head_2.
With version 7.x we didn't have this problem. I don't know why; maybe it is something in the connection between the balancer and the search head web server, or something else, but the error appeared with version 8.x.
So, if you are getting an error like "The search job terminated unexpectedly." in the web interface of a search head behind a balancer, check your balancer and try changing it from L3 (IP level) to L7 (HTTP level).
Hi
Could you please explain how you debugged this issue?
Thanks
Jay
It appears to me as if the settings within limits.conf are not being honored. Have you verified that the upgrade process did not alter file or directory permissions, e.g. changing ownership from the splunk user to root?
Assuming you are running Splunk as the splunk user, I would run a recursive chown and cycle the SHC.
chown -RP splunk:splunk /opt/splunk
With the command:
/opt/splunk/bin/splunk show config limits
I am trying to check the effective values of the settings in limits.conf.
Anyway, I checked the permissions and everything is OK:
$ find /opt/splunk -name limits.conf -exec ls -la {} \;
-rw-r--r-- 1 splunk splunk 35 Dec 18 18:02 /opt/splunk/etc/apps/<application>/default/limits.conf
-r--r--r-- 1 splunk splunk 42 Nov 28 02:31 /opt/splunk/etc/apps/SplunkLightForwarder/default/limits.conf
-r--r--r-- 1 splunk splunk 43109 Nov 28 02:31 /opt/splunk/etc/system/default/limits.conf
-rw-r--r-- 1 splunk splunk 711 Dec 18 13:28 /opt/splunk/etc/system/local/limits.conf
I believe the limit you are hitting is in savedsearches.conf, not limits.conf
max_concurrent = <unsigned integer>
* The maximum number of concurrent instances of this search that the scheduler
is allowed to run.
* Default: 1
See https://docs.splunk.com/Documentation/Splunk/8.0.0/Admin/Savedsearchesconf
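For illustration, a minimal sketch of how this setting might look in a local savedsearches.conf. The stanza name and the value 2 are assumptions made up for this example, not taken from the thread; the stanza has to carry the real name of the search that is being skipped.

```ini
# $SPLUNK_HOME/etc/apps/<application>/local/savedsearches.conf
# "My scheduled search" is a hypothetical saved search name.
[My scheduled search]
# Allow up to two instances of this search to run at once (default is 1).
max_concurrent = 2
```

In a search head cluster a change like this would normally be made through the UI or pushed from the deployer rather than edited by hand on one member. Raising the value only hides the overlap if each run regularly takes longer than the schedule interval, so it is worth checking the run times as well.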
Hm, interesting parameter.
I will try it.
Do you know of something like "max_concurrent" for ad hoc searches?
And maybe you know, or can suggest, what is happening with the search head cluster that makes it try to run more than one search at a time?
My guess is that the scheduler is trying to kick off a new search before the previous one has completed. You can increase the interval between searches, or implement skewing.
https://docs.splunk.com/Documentation/Splunk/8.0.0/Report/Skewscheduledreportstarttimes
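As a rough sketch of what skewing could look like in savedsearches.conf: the stanza name, the five-minute schedule and the 10% value below are all assumptions for the example, not values from this thread.

```ini
# "My scheduled search" is a hypothetical saved search name.
[My scheduled search]
cron_schedule = */5 * * * *
# Let the scheduler shift the start time by up to 10% of the search period
# (about 30 seconds for an every-5-minutes schedule). Default is 0 (no skew).
allow_skew = 10%
```

Skew spreads out the start times of scheduled searches, so it helps most when many searches fire on the same minute; if a single search simply runs longer than its interval, a longer interval is the more direct fix.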
Ad hoc search limits can be configured/modified at the role level via the web UI.
The default values are pretty low.
e.g.
Settings > Access controls > Roles > admin
Hello, do you know an SPL query that would help me list my saved searches that were skipped in ES, plus the reason for the failure, please?