Deployment Architecture

Search Head Cluster: Pre-6.3 we could run more scheduled searches concurrently.

sat94541
Communicator

Prior to 6.3, even though user/role quotas were enforced on a member-by-member basis, the system-wide scheduler quota was working for our SHC. Here is an example:
Let's assume that we have 2 members in the cluster set to run scheduled searches and the other 10 members as ad hoc only. The captain takes into consideration the number of members in the cluster that can run scheduled searches:
((24 * 3) + 6) * 0.85 * 2 = 132
In this case the following settings were used on the 24-core machines:

2 machines total and they are both considered job_servers and run scheduled searches
1 of them is the captain
max_searches_per_cpu = 3
base_max_searches = 6
max_searches_perc = 85
auto_summary_perc = 85
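
As a rough sanity check, the same arithmetic can be written as a small shell sketch; the variable names below are ours (they are not Splunk settings), and the integer rounding only mirrors the example above:

CORES_PER_MEMBER=24
MAX_SEARCHES_PER_CPU=3
BASE_MAX_SEARCHES=6
MAX_SEARCHES_PERC=85
SCHEDULABLE_MEMBERS=2

# Per-member base limit: (cores * max_searches_per_cpu) + base_max_searches = 78
PER_MEMBER=$(( CORES_PER_MEMBER * MAX_SEARCHES_PER_CPU + BASE_MAX_SEARCHES ))
# Portion available to the scheduler: 78 * 85 / 100 = 66
SCHEDULED_PER_MEMBER=$(( PER_MEMBER * MAX_SEARCHES_PERC / 100 ))
# System-wide scheduled quota: 66 * 2 = 132
echo "system-wide scheduled quota: $(( SCHEDULED_PER_MEMBER * SCHEDULABLE_MEMBERS ))"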

When the system-wide quota is reached, the splunkd.log files show the following messages:

12-11-2015 11:20:40.790 -0500 WARN SavedSplunker - The maximum number of concurrent scheduled searches has been reached (limits: historical=132, realtime=132). historical=233, realtime=0 ready-to-run scheduled searches are pending.
host = iapp106.howard.ms.com-9000 source = /var/hostlinks/farm/splunk/prod/federated-shc-1/9000/home/var/log/splunk/scheduler.log sourcetype = scheduler

Since upgrading to Splunk version 6.3.1, we are seeing fewer concurrent scheduled searches.
Did something change in 6.3?

rbal_splunk
Splunk Employee

1) Bug SPL-111603:[Regression] No warn message for those skipped saved searches if hit the system-wide quota in SHC

This bug is marked as fixed in 6.4. If you would like to try to reproduce it, you can take the following steps:

1.1. Create an SHC with 3 members.
1.2. Set the search quota:

$SPLUNK_HOME/etc/system/local/limits.conf

[search]
base_max_searches=1
max_searches_per_cpu=0
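# With max_searches_per_cpu = 0, the concurrent-search limit is just
# base_max_searches, i.e. a single search, so the system-wide quota is hit almost immediately.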

1.3. Create 5 scheduled real-time searches that run every minute:

curl -k -u admin:changed https://localhost:8089/servicesNS/admin/search/saved/searches -d name=testRT_1 -d search="index=_internal | head 30" -d is_scheduled=1 -d dispatch.earliest_time="rt-20m" -d dispatch.latest_time="rtnow" -d cron_schedule="* * * * *" ;
curl -k -u admin:changed https://localhost:8089/servicesNS/admin/search/saved/searches -d name=testRT_2 -d search="index=_internal | head 30" -d is_scheduled=1 -d dispatch.earliest_time="rt-20m" -d dispatch.latest_time="rtnow" -d cron_schedule="* * * * *" ;
curl -k -u admin:changed https://localhost:8089/servicesNS/admin/search/saved/searches -d name=testRT_3 -d search="index=_internal | head 30" -d is_scheduled=1 -d dispatch.earliest_time="rt-20m" -d dispatch.latest_time="rtnow" -d cron_schedule="* * * * *" ;
curl -k -u admin:changed https://localhost:8089/servicesNS/admin/search/saved/searches -d name=testRT_4 -d search="index=_internal | head 30" -d is_scheduled=1 -d dispatch.earliest_time="rt-20m" -d dispatch.latest_time="rtnow" -d cron_schedule="* * * * *" ;
curl -k -u admin:changed https://localhost:8089/servicesNS/admin/search/saved/searches -d name=testRT_5 -d search="index=_internal | head 30" -d is_scheduled=1 -d dispatch.earliest_time="rt-20m" -d dispatch.latest_time="rtnow" -d cron_schedule="* * * * *" ;

1.4. Wait 2 minutes.
1.5. Check from each search head:

grep "system-wide" ~/splunk/var/log/splunk/splunkd.log

12-21-2015 10:09:05.896 +0800 WARN SHPMaster - Search not executed: The maximum number of real-time concurrent system-wide searches has been reached. current=1 maximum=1 for search: admin;search;testRT_1
12-21-2015 10:09:10.952 +0800 WARN SHPMaster - Search not executed: The maximum number of real-time concurrent system-wide searches has been reached. current=1 maximum=1 for search: admin;search;testRT_2
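
To run the same check across every member in one pass, something like the sketch below can help; the host names are placeholders, and passwordless ssh plus the log path used above are assumptions about your environment.

# Hypothetical helper: look for the system-wide quota warning on each member.
MEMBERS="sh1 sh2 sh3"   # replace with your actual search head hosts
for m in $MEMBERS; do
  echo "== $m =="
  ssh "$m" 'grep "system-wide" ~/splunk/var/log/splunk/splunkd.log'
done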

2) SPL-111347: SHC: Search and scheduler performance degrades with an increasing number of saved searches

This performance degradation was seen only when the number of saved searches was in the thousands.

rbal_splunk
Splunk Employee

Starting with 6.3, default behavior for handling user-based and role-based concurrent search quotas has changed. In 6.3, the search head cluster enforces these quotas across the set of cluster members. Prior to 6.3, quotas were enforced instead on a member-by-member basis.

Neither pre- nor post-6.3 does the captain take the search user into account when it assigns a search to a member. Combined with the pre-6.3 behavior of member-enforced quotas, this could result in unwanted and unexpected behavior. For example, if the captain happened to assign most of a particular user's searches to one cluster member, that member could quickly reach the quota for that user, even though other members had not yet reached their limit for the user.

If you need to maintain the pre-6.3 behavior, make these attribute changes in limits.conf:

shc_role_quota_enforcement = false
shc_local_quota_check = true
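
On each member this would look something like the stanza below; placing these attributes under [search] is an assumption based on where the other search limits in this thread live, so confirm against the limits.conf spec for your version.

# $SPLUNK_HOME/etc/system/local/limits.conf (stanza placement assumed)
[search]
shc_role_quota_enforcement = false
shc_local_quota_check = true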

The new role-based quota on a cluster is calculated by multiplying the individual user's role quota by the number of cluster members. For example, a role with srchJobsQuota = 50 on a 3-member cluster would have an effective cluster-wide quota of 150.

Release 6.3 also introduced a new bug, SPL-101954/SPL-110906: srchJobsQuota and rtSrchJobsQuota are not centralized across search head clusters.

It seems this bug was introduced along with the new role-based quota changes for SHC in 6.3. Because of it, in 6.3 and later (including 6.3.1) the system-wide quota is never reached; instead, only the per-node quota is enforced when scheduling a job.

Also note that there are two kinds of quota messages that look almost identical (see below).

  1. System quota:

12-08-2015 07:00:24.537 -0600 WARN SHPMaster - Search not executed: The maximum number of historical concurrent system-wide searches has been reached. current=204 maximum=78 for search: …

  2. Role-based quota:

12-11-2015 11:20:40.790 -0500 WARN SavedSplunker - The maximum number of concurrent scheduled searches has been reached (limits: historical=132, realtime=132). historical=233, realtime=0 ready-to-run scheduled searches are pending.
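
A quick way to tell which of the two you are hitting is to grep for the distinct message texts; the log locations below are the ones quoted earlier in this thread and may differ in your deployment.

# System-wide quota messages come from SHPMaster; scheduler/role quota messages from SavedSplunker.
grep "concurrent system-wide searches has been reached" $SPLUNK_HOME/var/log/splunk/splunkd.log
grep "concurrent scheduled searches has been reached" $SPLUNK_HOME/var/log/splunk/scheduler.log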

In addition, here are a few other bugs that are somewhat relevant in this situation.

SPL-111603: [Regression] No warn message for those skipped saved searches if hit the system-wide quota in SHC
SPL-111347: SHC: Search and scheduler performance degrades with increasing number of saved searches

gsumner
Explorer

How can we find more information on the two bugs noted above?
SPL-111603:[Regression] No warn message for those skipped saved searches if hit the system-wide quota in SHC
SPL-111347: SHC: Search and scheduler performance degrades with increasing number of saved searches
