We came across the following issue in production: after starting an export job and receiving a few hundred thousand events, the export stopped returning events, yet the connection stayed open and no exception was thrown (the API call blocked waiting for more events). In short, the program hung while waiting to read more events from the ResultsReader (we're using the Java API).
Unfortunately, this case is quite difficult to reproduce, but we still want to prevent it from happening again. Is there a way to forcibly time out the connection when no more events are received via the ResultsReader? There is a "setAutoCancel" call in the JobExportArgs API (http://docs.splunk.com/DocumentationStatic/JavaSDK/1.4.0/index.html?com/splunk/JobExportArgs.html), but it is somewhat confusing, because the docs say that an export search does not create a new job on the Splunk server:
Export: An export search runs immediately, does not create a job for the search, and starts streaming results immediately. This search is useful for exporting large amounts of data from Splunk Enterprise.
Right now, one course of action is to implement a timeout mechanism on our side, and forcefully restart the search query. Is there a better way?
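A client-side timeout of the kind described above could be sketched roughly as follows. This is a minimal illustration using only the JDK; `ReadWatchdog` and `callWithTimeout` are hypothetical names, not part of the Splunk SDK, and the blocking call passed in would be something like `reader.getNextEvent()`. Note that interrupting a thread does not unblock a plain socket read, so after a timeout the caller would still need to close the underlying stream before restarting the search.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a blocking read on a worker thread and gives up
// if no result arrives within the allowed time.
final class ReadWatchdog implements AutoCloseable {
    // Daemon thread so a read that never returns cannot keep the JVM alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "export-reader");
        t.setDaemon(true);
        return t;
    });

    <T> T callWithTimeout(Callable<T> blockingRead, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> pending = worker.submit(blockingRead);
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: the interrupt stops interruptible calls, but a
            // blocked socket read needs its stream closed by the caller.
            pending.cancel(true);
            throw e;
        }
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        try (ReadWatchdog watchdog = new ReadWatchdog()) {
            // A read that returns promptly passes straight through.
            String first = watchdog.callWithTimeout(() -> "event-1", 1000);
            System.out.println(first);
            try {
                // Stand-in for a read that stalls with the connection still open.
                watchdog.callWithTimeout(() -> { Thread.sleep(60_000); return "never"; }, 200);
            } catch (TimeoutException e) {
                System.out.println("read stalled; close the stream and restart the export");
            }
        }
    }
}
```

The single worker thread keeps reads sequential, which matches how a results reader is normally consumed. If a read is truly stuck on the socket, the worker stays occupied until the stream is closed, so the watchdog should be discarded along with the connection.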
@conklirb This thread is almost four years old. If the solution offered does not help you, then you should post a new question describing your problem.
The auto_cancel request parameter should solve your use case. By default it is 0, but you can set it to n so that the export is cancelled after n seconds of inactivity.
This parameter is available for export searches as per the REST API docs.
I tested with a small value for auto_cancel (a couple of seconds), but we still see similar behavior. The search returns some results, but then we block forever on the socket read call (reader.getNextEvent()) and never time out.
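For comparison, this is how a read timeout behaves at the plain JDK socket level when the peer keeps the connection open but sends nothing. It is only a sketch of the general mechanism: `SilentServerDemo` and `readTimesOut` are hypothetical names, and nothing in this thread establishes whether or how the Splunk Java SDK lets you set such a timeout on the export connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: a peer that holds the connection open without sending data.
final class SilentServerDemo {
    /** Returns true if the read gave up because the peer stayed silent. */
    static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        // The server socket never calls accept() or writes; the OS completes the
        // connection through the listen backlog, so the client sees an open,
        // silent connection.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // Without this call, read() below would block indefinitely.
            client.setSoTimeout(soTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // data or end of stream arrived
            } catch (SocketTimeoutException e) {
                return true;  // no bytes within the timeout
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

With a socket timeout in place, a stalled read surfaces as a `SocketTimeoutException` (an `IOException`) instead of blocking forever, which the caller can catch and treat as a signal to restart the export.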
Thanks Damien, we're going to use this parameter from now on.
Any idea though why this may happen in production (just hanging without fetching any results, after already having fetched some subset of records)?
Also, once this timeout is reached, I presume we're going to get some sort of timeout exception in the code, correct?