Disk Quota Limits, Search API Endpoint Differences and Parameters

stranjer

I'm looking for clarity and a deeper understanding so I can properly solve a recurring issue.

We have a script performing searches using the API. Currently, the flow works like this:

  • Start the search with POST /services/search/jobs, passing the search as a parameter, and get back the search ID (sid).
  • Loop on GET /services/search/jobs/{sid}, checking the search job status until it is done.
  • Retrieve the results with GET /services/search/jobs/{sid}/results.
  • Once the results have been retrieved, POST to /services/search/jobs/{sid}/control to cancel the search and delete the result cache.

The script is designed to not try to run more than a few searches at a time, and to wait until earlier searches have been canceled to start the next search.
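For reference, the flow above looks roughly like the following Python sketch. It is a simplified illustration, not our actual script: BASE_URL, TOKEN, http_call and run_search are placeholder names, bearer-token authentication and output_mode=json are assumptions, and certificate handling for a self-signed management port is left out.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://localhost:8089"  # assumption: default management port
TOKEN = ""                           # assumption: a Splunk authentication token


def http_call(method, path, params=None):
    """Send one REST request and return the decoded JSON body ({} if empty)."""
    query = urllib.parse.urlencode(dict(params or {}, output_mode="json"))
    url = BASE_URL + path
    body = None
    if method == "GET":
        url += "?" + query       # GET parameters go in the query string
    else:
        body = query.encode()    # POST parameters go in the form body
    req = urllib.request.Request(url, data=body, method=method)
    req.add_header("Authorization", "Bearer " + TOKEN)
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def run_search(query, call=http_call, poll_interval=2.0):
    """Create a job, poll until it is done, fetch results, then cancel it."""
    sid = call("POST", "/services/search/jobs", {"search": query})["sid"]
    try:
        while True:
            status = call("GET", f"/services/search/jobs/{sid}")
            if status["entry"][0]["content"]["isDone"]:
                break
            time.sleep(poll_interval)
        # count=0 asks for all available results rather than the default page.
        results = call("GET", f"/services/search/jobs/{sid}/results", {"count": 0})
        return results.get("results", [])
    finally:
        # Cancel even if polling or retrieval failed, so the job is not left behind.
        call("POST", f"/services/search/jobs/{sid}/control", {"action": "cancel"})
```

Putting the cancel in a finally block means a failed poll or download still triggers cleanup, which matters when the goal is to keep disk usage down.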

However, we still sometimes hit the search disk quota limit. We've raised this limit a few times, well past the default, which has reduced how often it happens, but the issue still comes up. We do NOT want to set the quota to unlimited, nor keep raising it every time it gets hit.

There are a few questions I haven't been able to find documentation on while trying to work out a solution:

  • Is there normally a delay after a search has been successfully cancelled before the results cache is removed?
    • Or a delay before the disk usage quota is updated to reflect the freed space?
  • Would doing a DELETE /services/search/jobs/{sid} free up the space more quickly?
  • Would switching to using the /services/search/jobs/export endpoint help?
    • If the results are streaming, do they also still persist on disk?
    • The Python & Java SDK docs say export searches '...return results in a stream, rather than as a search job that is saved on the server.' But I'm not sure that means the result cache isn't saved.
  • Does setting a low 'timeout' value in the search/jobs parameters clear the disk space after that much time has passed?
  • With the 'auto_cancel' parameter, what counts as 'inactivity'?
    • Checking the status of the SID?
    • Retrieving results?
    • If a search is accidentally set to 'search index=* ' over all time, does auto_cancel stop it before completion? (I assume so, but wanted confirmation.)

The documentation is unclear on what some phrases mean (like 'inactivity', or 'rather than as a search job that is saved on the server' in the SDK docs), and other parts are likely simplifications of concepts I need to understand in more depth (cancelling/deleting jobs, clearing disk space).

Trying to avoid just putting band-aids on a bullet wound, but need more details to determine the right treatment.
