I am looking for a few parameters to make my real-time (RT) search work better. Currently, I am running the search from Java with the following:
final String mySearch = "search index=mydata";
rtJobArgs = new JobArgs();
rtJobArgs.setExecutionMode(JobArgs.ExecutionMode.NORMAL);
rtJobArgs.setSearchMode(JobArgs.SearchMode.REALTIME);
rtJobArgs.setEarliestTime("rt");
rtJobArgs.setLatestTime("rt");
rtJobArgs.setStatusBuckets(0);
final Job job = service.search(mySearch, rtJobArgs);
while (!job.isReady()) {
    SplunkDataUtils.sleep(500);
}
final JobResultsPreviewArgs previewArgs = new JobResultsPreviewArgs();
previewArgs.setCount(500); // Retrieve up to 500 preview results at a time
previewArgs.put("field_list", "_raw,host,source");
while (true) { // dummy loop for this example
    final InputStream stream = job.getResultsPreview(previewArgs);
    parseResults(stream); // loops through the stream
    stream.close();
    sleep(250);
}
job.cancel();
This search keeps returning the same records each time through the loop (plus any new data injected). What I need is for the window to slide with time in real time, rather than continually returning records from the same start time. I assume setEarliestTime("rt") is evaluated once, at the time rtJobArgs is created. How can I reset the earliest time on each loop iteration?
The setCount(500) limit would be a non-issue if the RT window slid, but as it stands it fills up quickly with old data.
Another question is setStatusBuckets: do I need to use these for my search?
Thanks.
I'm looking into a similar solution.
What I have learned so far based on trial and error and some inputs from the SDK documentation is below.
If you find you are not seeing events in your results or seeing the same events every time, you're likely exceeding the setCount(500) limit.
If you set that to '0' you'll get all events and will have to manage the size yourself.
Sliding window is achieved by setting the earliest time like so:
rtJobArgs.setEarliestTime("rt-15sec");
I believe your sliding window can be anything supported by search time modifiers.
A 15-second real-time window returns 15 seconds' worth of data from the time you start the search, then "resets" every 15 seconds to cover the next 15 seconds of real-time data. Each call to read the previews returns whatever is in the current segment.
Kind of looks like this:
|-----15-----15-----15....
Each segment will be returned in its entirety each time you call read on the previews until the job slides into the next segment.
Your app will have to manage what it needs to read each time the previews are returned. For example:
start real-time search
while(true)
    read results
    process results where _time is > last event time
    store _time of last event read
    sleep x number of seconds
end while
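As a rough sketch of the "process results where _time is > last event time" step in the loop above, something like the following could work. It does not touch the Splunk SDK at all: PreviewDeduper, Event and filterNew are hypothetical names made up for this example, and it assumes you have already parsed each preview into events carrying _time as epoch milliseconds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Splunk SDK.
class PreviewDeduper {
    /** Minimal stand-in for a parsed event: _time in epoch millis plus _raw. */
    public static final class Event {
        public final long timeMillis;
        public final String raw;

        public Event(long timeMillis, String raw) {
            this.timeMillis = timeMillis;
            this.raw = raw;
        }
    }

    // Highest _time handed out so far; starts below any real timestamp.
    private long lastSeenMillis = Long.MIN_VALUE;

    /** Returns only events newer than the stored _time, then advances it. */
    public List<Event> filterNew(List<Event> preview) {
        List<Event> fresh = new ArrayList<>();
        long newest = lastSeenMillis;
        for (Event e : preview) {
            // Strict comparison: anything at or before the mark is treated as already read.
            if (e.timeMillis > lastSeenMillis) {
                fresh.add(e);
                newest = Math.max(newest, e.timeMillis);
            }
        }
        lastSeenMillis = newest;
        return fresh;
    }

    public static void main(String[] args) {
        PreviewDeduper deduper = new PreviewDeduper();
        List<Event> first = new ArrayList<>();
        first.add(new Event(1000L, "a"));
        first.add(new Event(2000L, "b"));
        System.out.println(deduper.filterNew(first).size());  // 2

        // The next preview repeats the old events and adds one new one.
        List<Event> second = new ArrayList<>(first);
        second.add(new Event(3000L, "c"));
        System.out.println(deduper.filterNew(second).size()); // 1
    }
}
```

Note the trade-off in the strict comparison: an event that shows up later with the same _time as the stored value, or with an older _time, is skipped. If that matters for your data, you would need to track something more than the last timestamp.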
I haven't tried with large volumes of data but this has worked for me so far. I have yet to see how/if I miss events in this way.
Good luck!
Good effort - I'll try this.
Have you noticed any missing or duplicate events with this approach?