Why are my jobs' statuses stuck in "QUEUED" forever?

Z_Jacob · ‎03-03-2022

My role has a concurrency limit greater than 6, and here is what I did:

Step 1. I submitted 6 concurrent jobs using the API:

POST https://<host>:<mPort>/services/search/jobs

Step 2. I waited for the status of all 6 jobs to become "DONE" using the API:

GET https://<host>:<mPort>/services/search/jobs/{search_id}

Step 3. I set the ttl of all 6 jobs to 3600 to give myself enough time to fetch all the results, using the API:

POST https://<host>:<mPort>/services/search/jobs/{search_id}/control

Step 4. I fetched the results of all 6 jobs using the API:

GET https://<host>:<mPort>/services/search/jobs/{search_id}/results

Step 5. I deleted all 6 jobs using the API:

DELETE https://<host>:<mPort>/services/search/jobs/{search_id}

Step 6. I submitted another 6 concurrent jobs and found that most of them got stuck in "QUEUED" forever.

I don't know why the first 6 concurrent jobs worked fine but the second 6 got stuck in "QUEUED".
Does the DELETE API not actually work?
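
For readers who want to reproduce the sequence, the per-job calls in Steps 1 to 5 can be sketched in Python roughly as follows. This is a minimal sketch, not the poster's actual client: the helper names and the example host are hypothetical, authentication is left out, and the requests are only built, not sent. The `action=setttl` and `ttl` form fields are what the control endpoint expects for Step 3.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; substitute your own https://<host>:<mPort>.
BASE_URL = "https://splunk.example.com:8089"


def create_job_request(base_url, search):
    """Build the Step 1 request that submits one search job."""
    body = urlencode({"search": search}).encode()
    return Request(f"{base_url}/services/search/jobs", data=body, method="POST")


def job_lifecycle_requests(base_url, search_id, ttl=3600):
    """Build (but do not send) the Step 2-5 requests for one job."""
    job_url = f"{base_url}/services/search/jobs/{search_id}"
    # Step 3 sends form-encoded parameters to the control endpoint.
    ttl_body = urlencode({"action": "setttl", "ttl": ttl}).encode()
    return [
        Request(job_url, method="GET"),                                # Step 2: poll status
        Request(f"{job_url}/control", data=ttl_body, method="POST"),   # Step 3: set ttl
        Request(f"{job_url}/results", method="GET"),                   # Step 4: fetch results
        Request(job_url, method="DELETE"),                             # Step 5: delete job
    ]
```

Each `Request` could then be sent with `urllib.request.urlopen` once an `Authorization` header has been added; how you authenticate depends on your deployment.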

Roy_9 · ‎06-28-2022

Hello,

This can happen if your user has exceeded its disk space quota for search jobs. If that's the case, the error is shown in the Job Inspector. Deleting old jobs under Activity->Jobs should clear the problem.

Alternatively, try increasing the disk space quota and see if that helps.

Thanks.

lucacaldiero · ‎06-28-2022

Hello,

Have you solved your issue or found any clues?

Thanks.

jamie00171 · ‎06-28-2022

Hi @lucacaldiero,

In the docs (https://docs.splunk.com/Documentation/Splunk/9.0.0/RESTREF/RESTsearch#search.2Fjobs.2F.7Bsearch_id.7...) the example response to a DELETE request is:

<response><messages><msg type='INFO'>Search job cancelled.</msg></messages></response>

So I suspect this REST call only cancels the job and does not remove the search artifacts from disk, in which case they would still count towards the user's disk quota. If you also set the ttl for the searches to an hour, you may simply be hitting the user's disk quota and therefore can't run any more searches until the artifacts are cleared.

You could confirm this by checking whether the artifacts of the executed searches in $SPLUNK_HOME/var/run/splunk/dispatch/ are removed after the DELETE request, and by checking the _audit index to see whether the disk quota for the user running the searches has been reached.
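
The dispatch-directory check could be scripted along these lines. This is only a sketch and rests on an assumption: that each job's artifacts live in a subdirectory of dispatch/ named after its search ID. The function name `leftover_artifacts` is made up for illustration.

```python
import os


def leftover_artifacts(splunk_home, search_ids):
    """Return the search IDs that still have a directory under dispatch/.

    Assumes each job's artifact directory is named after its search ID.
    """
    dispatch_dir = os.path.join(splunk_home, "var", "run", "splunk", "dispatch")
    return [
        sid for sid in search_ids
        if os.path.isdir(os.path.join(dispatch_dir, sid))
    ]
```

Run on the search head after the DELETE calls, for example `leftover_artifacts(os.environ["SPLUNK_HOME"], deleted_sids)` where `deleted_sids` is your own list of IDs. A non-empty result would suggest the artifacts are still on disk.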

Thanks,

Jamie

lucacaldiero · ‎06-28-2022

@jamie00171 @Roy_9, thanks for answering.
You are probably right that there's a lack of disk space. The real problem, though, is that the user gets a "Queued" message for, say, every dynamic drop-down input field in a dashboard. That makes him "crazy" and angry, because the searches are not actually "Queued": they simply cannot be executed, even if the user reloads the page.

Do you know a way to monitor per-user disk utilization? Is it possible to build some kind of dashboard that shows the percentage of disk quota a user has consumed over the last 5 minutes?

Thanks a lot and best regards.

Luca
