We have a web application that is making REST API calls to Splunk to run searches to retrieve results. We expect a lot of users to run very broad searches that will return thousands if not millions of results. As human beings, the users will probably only look at the first few pages (like how most of us use Google). These big searches not only take time to fully finish, but the completed jobs can take up many GBs in the dispatch directory.
Is there a way for the REST API to pause a search job so that it only retrieves X number of results first? Is there a way to then resume the search if the user wants more? The goal is to only fill the dispatch directory with as much data as the user is able to page through on a web app UI so that we don't get the "out of disk space" error which causes newly submitted searches to get queued.
@emiliavanderwerf, first off, as a Splunk administrator you should restrict the max disk quota, max concurrent searches, and time range selection for the roles that this external app uses.
Secondly, you can configure the app to pull only a limited number of records by default, using | head 10 or | head 100 in your default saved search. Only when the user requests details would you run your current saved search with more/all rows. This approach implies duplicating your saved search as Summary and Details. Give access to the Details saved searches only to specific users.
Check out the count option at http://docs.splunk.com/Documentation/Splunk/7.1.1/RESTREF/RESTprolog#Pagination_and_filtering_parame...
|rest /services/search/jobs/%s/results output_mode=raw count=600,000 offset=6
A combination of count and offset should serve you better: count limits how many results come back per request, and offset sets where the page starts.
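To make the count/offset idea concrete, here is a minimal sketch of how a web app might build the results URL for one page. The base URL (a local instance on port 8089), the helper name results_page_url, and the page size are assumptions for illustration, not something taken from your setup.

```python
from urllib.parse import urlencode

# Assumption: Splunk management port on a local instance.
BASE_URL = "https://localhost:8089"


def results_page_url(sid, page, page_size=100, base_url=BASE_URL):
    """Build the results URL for one zero-based page of a search job."""
    params = {
        "output_mode": "json",
        "count": page_size,           # how many results to return
        "offset": page * page_size,   # index of the first result on this page
    }
    return f"{base_url}/services/search/jobs/{sid}/results?{urlencode(params)}"


# Third page (page index 2) of a hypothetical job ID:
print(results_page_url("1234.5", page=2))
```

This only limits what is returned per request; it does not change how much the search itself writes to disk.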
Thanks for your help. My understanding of count is that even though only count entries will be returned, the search runs to completion in the background and (if it is a very broad search) will still take up many GB in the dispatch directory. Using count does not pause the search once count results have been retrieved, nor does it resume the search if the user requests more results with a larger count.
The main problem I'm trying to avoid is that broad searches take up a large amount of space in the dispatch directory. Do you have more suggestions please?
Tough problem. Consider the auto_cancel and auto_finalize_ec parameters when you submit the search/jobs requests.
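As a rough sketch of what that submission could look like, the helper below assembles the form parameters for a POST to search/jobs. As I understand the REST reference, auto_finalize_ec finalizes the search once at least that many events have been processed, and auto_cancel cancels the job after that many seconds of inactivity. The helper name job_request_params and the default values are made up for illustration; tune them to your page size and session length.

```python
from urllib.parse import urlencode


def job_request_params(search, max_events=1000, idle_seconds=300):
    """Form parameters for POST /services/search/jobs (illustrative defaults)."""
    return {
        "search": search,
        # Finalize the search once at least this many events are processed.
        "auto_finalize_ec": max_events,
        # Cancel the job after this many seconds of inactivity.
        "auto_cancel": idle_seconds,
    }


params = job_request_params("search index=main error", max_events=500)
print(urlencode(params))  # form-encoded body to send with the POST
```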
You can also send a search/jobs/{search_id}/control request with action=cancel if the user doesn't request more results, or maybe even between result batches.
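For the cancel call, a minimal sketch follows. It assumes the control endpoint is POSTed to with an action form parameter (which is how I read the REST reference), a local instance on port 8089, and a hypothetical helper name cancel_request; it only builds the URL and body, so plug in whatever HTTP client and authentication your app already uses.

```python
from urllib.parse import urlencode

# Assumption: Splunk management port on a local instance.
BASE_URL = "https://localhost:8089"


def cancel_request(sid, base_url=BASE_URL):
    """Return (url, form_body) for a POST that cancels the given search job."""
    url = f"{base_url}/services/search/jobs/{sid}/control"
    body = urlencode({"action": "cancel"})
    return url, body


url, body = cancel_request("1234.5")
print(url)
print(body)
```

Cancelling removes the job, so the web app should only do this once it is sure the user will not page further.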