Hi,
One of my customers received a "waiting for queued job to start" message today, and it then took about 5 minutes for the job to run. How can I troubleshoot this (since I have a boatload of people about ready to start using Splunk)?
We ran into the same issue in our environment. The number of concurrent searches that can be executed is controlled by max_searches_per_cpu, which by default is set to 1. base_max_searches is then added to that number to define the maximum number of searches that can execute at the same time:
max # searches = (max_searches_per_cpu * # CPUs) + base_max_searches
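For example, on a 64-core search head with the defaults (max_searches_per_cpu = 1 and, if I remember the default correctly, base_max_searches = 6), that works out to (1 * 64) + 6 = 70 concurrent historical searches.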
With SHP, depending on how many users are logged on to a given server behind the load-balanced VIP and which dashboards they are launching, you might start seeing jobs queued. The other major factor is the scheduled reports/searches/alerts you have in the system; these add to the queuing.
Most of the time, queuing is seen at 15, 30, 45 and 00 minutes past the hour (like a wave), as users tend to run scheduled jobs every 5/10/15 minutes. The hardest hit is at the top of the hour, when most of the searches run at the same time.
I would advise starting with max_searches_per_cpu set to 2 in the local limits.conf on the servers and going up to 4 if needed; a sketch of the stanza is below. If you still see the issue at a value of 4, plan to add another server with the same number of CPUs to your SHP.
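For reference, here is roughly what that would look like in limits.conf on each search head (just a sketch; check the limits.conf spec for your version before applying):

[search]
# Allow 2 concurrent historical searches per CPU core (default is 1).
max_searches_per_cpu = 2
# Base number of searches added on top of the per-CPU allowance
# (left at what I believe is the default, shown only for context).
base_max_searches = 6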
See: http://docs.splunk.com/Documentation/Splunk/6.1.3/admin/Limitsconf
Search for max_searches_per_cpu in Splunk Answers for more insight.
I faced the same issue today and tried various things, but nothing worked. When I increased the user-level concurrent search jobs limit and the total jobs disk quota on the role (through the Access Controls option), the dashboard started working fine again.
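For what it's worth, if you'd rather manage that in config than in the UI, I believe the same role settings live in authorize.conf, along these lines (the role name and values here are just examples, so treat this as a sketch):

[role_power]
# Maximum concurrent search jobs for users with this role (example value).
srchJobsQuota = 10
# Total disk space, in MB, that this role's search jobs may use (example value).
srchDiskQuota = 500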
This error also occurs if your user has gone over the disk space quota for search jobs. If that's the case, the error can be seen in the Job Inspector. Delete old search jobs under Activity -> Jobs to clear this problem.
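If you want to see which users are holding the most job disk space without clicking through every job, something along these lines should work from the search bar (just a sketch using the search jobs REST endpoint; field names may differ slightly by version):

| rest /services/search/jobs
| stats sum(diskUsage) as disk_used_bytes, count as job_count by eai:acl.owner
| sort - disk_used_bytes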
This worked too. Thanks
Thanks, this worked great :)
Good stuff! Thanks.
Use the SOS (Splunk on Splunk) app to see which jobs are taking the most time. The message suggests all of your cores are already taken and jobs are waiting for a free core to start. Check the jobs under the Jobs option and the System Activity dashboards about users (there's also a quick search sketched below).
And in a distributed environment, a lot also depends on how the indexers handle the search head's requests, so you might as well look into your indexer usage too. Thanks
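To see what is actually running or stuck in the queue at a given moment, a quick alternative to SOS is to query the jobs endpoint directly; something like the following should be close, though the exact field names may vary by version:

| rest /services/search/jobs
| search dispatchState="RUNNING" OR dispatchState="QUEUED"
| table eai:acl.owner, title, dispatchState, runDuration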
64 cores per server, using SHP.
The jobs won't show as skipped, because that's solely for scheduled jobs.
How many CPUs are on the search head? The maximum number of concurrent historical searches is based on the number of CPUs on the search head.
Anyone? I've looked for this message and skipped jobs, but haven't been able to find anything.