Splunk Search

Why are my jobs' statuses stuck in "QUEUED" forever?

Z_Jacob
Engager

My role has a search concurrency limit greater than 6. Here is what I did:

Step 1. I submitted 6 concurrent jobs using the API:

POST https://<host>:<mPort>/services/search/jobs

Step 2. I waited for the status of all 6 jobs to become "DONE" using the API:

GET https://<host>:<mPort>/services/search/jobs/{search_id}

Step 3. I set the ttl of all 6 jobs to 3600 using the API, to leave myself enough time to fetch all the results:

POST https://<host>:<mPort>/services/search/jobs/{search_id}/control

Step 4. I fetched the results of all 6 jobs using the API:

GET https://<host>:<mPort>/services/search/jobs/{search_id}/results

Step 5. I deleted all 6 jobs using the API:

DELETE https://<host>:<mPort>/services/search/jobs/{search_id}

Step 6. I submitted another 6 concurrent jobs and found that most of them got stuck in "QUEUED" forever ...

I don't understand why the first 6 concurrent jobs worked fine but the second 6 got stuck in "QUEUED".
Does the DELETE API not actually work?
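For reference, the six steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions, not the original poster's code: `http_call` and `run_job` are hypothetical helper names, `output_mode=json` is used so the responses can be parsed as JSON, and authentication and certificate handling are left to the caller. The polling loop has a timeout so that a job sitting in "QUEUED" is reported instead of being waited on forever.

```python
import json
import time
import urllib.parse
import urllib.request


def http_call(method, url, data=None, headers=None):
    """Send one request; return parsed JSON, or {"raw": text} if not JSON."""
    body = urllib.parse.urlencode(data).encode() if data else None
    req = urllib.request.Request(url, data=body, method=method,
                                 headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        text = resp.read().decode()
    try:
        return json.loads(text)
    except ValueError:
        return {"raw": text}


def run_job(call, base, search, poll_seconds=1.0, timeout=300.0):
    """Create a job, wait for DONE, extend its ttl, fetch results, delete it.

    `call(method, url, data=None)` is any transport with http_call's shape.
    `base` is the management URL, e.g. "https://<host>:<mPort>".
    """
    jobs = f"{base}/services/search/jobs"
    # Step 1: submit the search.
    sid = call("POST", jobs, {"search": search, "output_mode": "json"})["sid"]
    job = f"{jobs}/{urllib.parse.quote(sid, safe='')}"

    # Step 2: poll the dispatch state, but give up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while True:
        status = call("GET", f"{job}?output_mode=json")
        state = status["entry"][0]["content"]["dispatchState"]
        if state == "DONE":
            break
        if state == "FAILED" or time.monotonic() > deadline:
            call("DELETE", job)
            raise RuntimeError(f"job {sid} ended in state {state}")
        time.sleep(poll_seconds)

    # Step 3: extend the ttl so the results stay available.
    call("POST", f"{job}/control",
         {"action": "setttl", "ttl": 3600, "output_mode": "json"})
    # Step 4: fetch the results.
    results = call("GET", f"{job}/results?output_mode=json").get("results", [])
    # Step 5: delete the job.
    call("DELETE", job)
    return results
```

To run it against a real instance, wrap `http_call` so it adds whatever authorization header your deployment expects, then pass that wrapper as `call`.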


Roy_9
Motivator

Hello,

This can happen if your user has gone over the disk space quota for saved searches. If that's the case, the error is shown in the Job Inspector. Delete saved searches under Activity->Jobs to clear the problem.

Alternatively, try increasing the disk space quota and see if that helps.

 

Thanks.


lucacaldiero
Path Finder

Hello,

Have you solved this issue or found any clues?

Thanks. 


jamie00171
Communicator

Hi @lucacaldiero ,

In the docs (https://docs.splunk.com/Documentation/Splunk/9.0.0/RESTREF/RESTsearch#search.2Fjobs.2F.7Bsearch_id.7...) the example response to a DELETE request is:

 

<response><messages><msg type='INFO'>Search job cancelled.</msg></messages></response>

 

So I suspect the search artifacts are not removed from disk by this REST call and therefore still count towards the user's disk quota. If you also set the ttl of the searches to an hour, you may simply be hitting the user's disk quota, and then you can't run any more searches until the search artifacts are cleared.

 

You could confirm this by checking whether the artifacts of the executed searches under $SPLUNK_HOME/var/run/splunk/dispatch/ are removed after the DELETE request, and by checking the _audit index to see whether the disk quota for the user executing the searches has been reached.
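A minimal sketch of the first check, assuming you have shell access to the search head and that each job's artifacts live in a subdirectory of the dispatch directory named after its search ID. The function names are hypothetical, and the raw file sizes it reports may not match exactly how Splunk accounts for the quota.

```python
import os


def dispatch_usage(dispatch_dir):
    """Return {artifact_dir_name: total_bytes} for each job directory."""
    usage = {}
    for entry in os.scandir(dispatch_dir):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for root, _dirs, files in os.walk(entry.path):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    # The file may vanish while Splunk reaps the artifact.
                    pass
        usage[entry.name] = total
    return usage


def is_artifact_present(dispatch_dir, search_id):
    """True if a directory for this search ID still exists on disk."""
    return os.path.isdir(os.path.join(dispatch_dir, search_id))
```

Calling `is_artifact_present` with the dispatch path and a search ID right after the DELETE request shows whether the artifact is still on disk.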

 

Thanks, 

Jamie

 

 

 


lucacaldiero
Path Finder

@jamie00171 @Roy_9, thanks for answering.
You are probably right that disk space is running out, but the actual issue is that the user gets a "Queued" message, for example, on every dynamic drop-down input in a dashboard. That drives him crazy, because the searches are not really queued: they simply cannot be executed, even if the user reloads the page.

Do you know a way to monitor per-user disk utilization? Is it possible to build some kind of dashboard that shows the percentage of the disk quota used by a user over the last 5 minutes?


Thanks a lot and best regards.

Luca
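One possible starting point for the per-user monitoring asked about above, offered as a sketch rather than a confirmed answer: the JSON listing from GET /services/search/jobs?output_mode=json can be aggregated by job owner. The code assumes each entry carries an `author` field and a `content.diskUsage` value in bytes, and that the role's quota is known in megabytes; `disk_usage_by_author` and `quota_percent` are hypothetical names.

```python
from collections import defaultdict


def disk_usage_by_author(jobs_listing):
    """Sum content.diskUsage (assumed bytes) per entry author."""
    totals = defaultdict(int)
    for entry in jobs_listing.get("entry", []):
        author = entry.get("author", "unknown")
        totals[author] += int(entry.get("content", {}).get("diskUsage", 0))
    return dict(totals)


def quota_percent(used_bytes, quota_mb):
    """Percentage of a quota given in megabytes that is currently used."""
    return 100.0 * used_bytes / (quota_mb * 1024 * 1024)
```

Polling the listing on a schedule and feeding the totals into `quota_percent` would give the percentage to chart.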
