Splunk Search

Why are we getting the message "waiting for queued job to start..." and why does the search job take 5 minutes to run?

a212830
Champion

Hi,

One of my customers received a "waiting for queued job to start" message today, and it then took about 5 minutes for the job to run. How can I troubleshoot this (since I have a boatload of people about ready to start using Splunk)?

1 Solution

bohrasaurabh
Communicator

We ran into the same issue in our environment. The number of concurrent searches that can be executed is controlled by max_searches_per_cpu, which is set to 1 by default. base_max_searches is then added on top of that to define the maximum number of searches that can execute at the same time:

max # searches = (value of max_searches_per_cpu * # CPUs) + base_max_searches
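For example, on a 64-core search head like the ones mentioned later in this thread, and assuming the 6.x defaults of max_searches_per_cpu = 1 and base_max_searches = 6 (check limits.conf.spec for your version), that works out to (1 * 64) + 6 = 70 concurrent historical searches.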

With search head pooling (SHP), depending on how many users are logged on to a server load-balanced by a VIP and which dashboards they are launching, you might start getting jobs queued. One other major factor is the scheduled reports/searches/alerts you have in the system; these add to the queuing.

Most of the time, queuing will be seen at 15, 30, 45, and 00 minutes past the hour (like a wave), as users tend to run scheduled searches every 5/10/15 minutes. The hardest hit is at the top of the hour, when most of the searches run at the same time.

I would advise starting with max_searches_per_cpu set to 2 in the local limits.conf on the servers and going up to 4 if needed. If you still see the issue at a value of 4, plan to add another server with the same number of CPUs to your SHP.
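As a rough sketch of that override (assuming the standard [search] stanza in a local limits.conf; on a pool, put it wherever you manage your shared search head configuration):

# $SPLUNK_HOME/etc/system/local/limits.conf
[search]
# default is 1; raise gradually (2, then 3, then 4) and watch for renewed queuing
max_searches_per_cpu = 2

A splunkd restart is generally needed for limits.conf changes to take effect.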

See:
http://docs.splunk.com/Documentation/Splunk/6.1.3/admin/Limitsconf

Search for max_searches_per_cpu on Splunk Answers for more insight.


rajiv_r
Explorer

I faced the same issue today and tried various things, but none of them worked. When I increased the user-level concurrent search jobs limit and the total jobs disk quota for the role (under the Access Controls option), the dashboard started working fine again.
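For anyone who prefers editing configuration over the UI, a minimal sketch of the equivalent per-role settings in authorize.conf (the role name and values below are only examples, not recommendations):

# $SPLUNK_HOME/etc/system/local/authorize.conf
[role_power]
# user-level concurrent search jobs limit for members of this role
srchJobsQuota = 10
# total disk quota (in MB) for a user's search job artifacts
srchDiskQuota = 500

These correspond to the fields under Settings > Access controls > Roles mentioned above.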

andygerber
Path Finder

This error also occurs if your user has gone over the disk space quota for search jobs. If that's the case, the error can be seen in the Job Inspector. Delete old jobs under Activity > Jobs to clear the problem.
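If you want to see how much dispatch disk space each user's jobs are holding before deleting anything, one rough check (assuming you can run the jobs REST endpoint from the search bar) is:

| rest /services/search/jobs | stats sum(diskUsage) as total_disk_usage count by "eai:acl.owner" | sort - total_disk_usage

Compare the per-user totals against the disk quota (srchDiskQuota) configured for that user's role.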

mbksplunk
Explorer

This worked too. Thanks

0 Karma


scc00
Contributor

Thanks, this worked great :).

0 Karma

a212830
Champion

Good stuff! Thanks.

0 Karma

linu1988
Champion

Use the SOS app to see which jobs are taking more time. The message suggests all of your cores are already taken and the job is waiting for a free core to start. Look at the jobs under the Jobs option and at the user information under System Activity.

In a distributed environment, searches also depend on how the indexers handle the search head's requests, so you might as well look into your indexer usage. Thanks.
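If SOS isn't installed, a rough starting point for spotting the slow searches (assuming the default _audit index is searchable for you) is:

index=_audit action=search info=completed | table _time user total_run_time search | sort - total_run_time

total_run_time is in seconds, so the entries at the top of the list are the dashboards or scheduled searches worth tuning first.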

a212830
Champion

64 cores per server, using SHP.

0 Karma

sowings
Splunk Employee

The jobs won't show as skipped, because that status applies only to scheduled jobs.

How many CPUs are on the search head? The maximum number of concurrent historical searches is based on the number of CPUs on the search head.
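A quick way to confirm the effective values on a given search head (a sketch using the standard btool CLI) is:

$SPLUNK_HOME/bin/splunk btool limits list search | grep -E "max_searches_per_cpu|base_max_searches"

Multiply the first value by the core count and add the second to get the historical-search ceiling described in the accepted answer.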

0 Karma

a212830
Champion

Anyone? I've looked for this message and skipped jobs, but haven't been able to find anything.

0 Karma