How to create 24/7 real-time search using the Java SDK?

ajaskey
Engager

I am looking for a few parameters to make my real-time (RT) search work better. Currently, my Java search is limited to the following:

final String mySearch = "search index=mydata";
final JobArgs rtJobArgs = new JobArgs();
rtJobArgs.setExecutionMode(JobArgs.ExecutionMode.NORMAL);
rtJobArgs.setSearchMode(JobArgs.SearchMode.REALTIME);
rtJobArgs.setEarliestTime("rt");
rtJobArgs.setLatestTime("rt");
rtJobArgs.setStatusBuckets(0);
final Job job = service.search(mySearch, rtJobArgs);
// Wait until the job is ready to serve previews
while (!job.isReady()) {
    SplunkDataUtils.sleep(500);
}
final JobResultsPreviewArgs previewArgs = new JobResultsPreviewArgs();
previewArgs.setCount(500); // Retrieve up to 500 preview results at a time
previewArgs.put("field_list", "_raw,host,source");
while (true) { // dummy loop for this example
    final InputStream stream = job.getResultsPreview(previewArgs);
    parseResults(stream); // loops through the stream
    stream.close();
    sleep(250);
}
job.cancel(); // cancel the job once the loop exits

This search keeps returning the same records on every pass through the loop (plus any newly indexed data). What I need is for the window to slide in real time rather than repeatedly returning records from the same start time. I assume setEarliestTime("rt") is evaluated once, at the time rtJobArgs is created. How can I reset the earliest time on each loop iteration?

A related issue: setCount(500) would not be a problem if the RT window slid, but as it stands the result set fills up quickly with old data.
Another question concerns setStatusBuckets: do I need to set it for my search?

Thanks.

rbarajas
Explorer

I'm looking into a similar solution.

Here is what I have learned so far, based on trial and error and some input from the SDK documentation.

If you find you are not seeing events in your results, or are seeing the same events every time, you're likely exceeding the setCount(500) limit.

If you call setCount(0) instead, you'll get all events and will have to manage the size yourself.

Sliding window is achieved by setting the earliest time like so:

rtJobArgs.setEarliestTime("rt-15sec");

I believe your sliding window can be anything supported by search time modifiers.
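
As a rough illustration of how such window expressions are put together, here is a minimal sketch that builds the "rt-<amount><unit>" string. RealTimeWindow and its earliest method are hypothetical names of my own, not part of the Splunk SDK, and the unit list is deliberately limited to the short abbreviations (the "sec" form used above is also accepted by Splunk).

```java
import java.util.Set;

/** Hypothetical helper (not part of the Splunk SDK): builds real-time time modifiers. */
final class RealTimeWindow {

    // A deliberately small subset of Splunk's relative-time units (short forms only).
    private static final Set<String> UNITS = Set.of("s", "m", "h", "d", "w", "mon", "q", "y");

    private RealTimeWindow() {
    }

    /** Builds the earliest-time modifier for a window, e.g. (15, "s") gives "rt-15s". */
    static String earliest(int amount, String unit) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive: " + amount);
        }
        if (!UNITS.contains(unit)) {
            throw new IllegalArgumentException("unsupported unit: " + unit);
        }
        return "rt-" + amount + unit;
    }
}
```

It would be used as rtJobArgs.setEarliestTime(RealTimeWindow.earliest(15, "s")), with setLatestTime("rt") left unchanged.
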
With a 15-second real-time window, the job returns 15 seconds' worth of data from the time you start it, then "resets" the window every 15 seconds, so each time you read the previews you get the current 15 seconds' worth of real-time data.

Kind of looks like this:

|-----15-----15-----15....

Each segment will be returned in its entirety each time you call read on the previews until the job slides into the next segment.
Your app will have to manage what it needs to read each time the previews are returned. For example:

start real-time search
while(true)
   read results
   process results where _time is > last event time 
   store _time of last event read
   sleep x number of seconds
end while

I haven't tried with large volumes of data but this has worked for me so far. I have yet to see how/if I miss events in this way.
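
For anyone who wants the loop above in Java, here is a minimal sketch of just the "process results where _time is > last event time" step. PreviewDeduplicator and Event are hypothetical names of my own, not SDK classes, and I'm assuming each result's _time has already been parsed into epoch milliseconds. It also remembers the raw text of events at the newest timestamp, so events sharing that timestamp are not dropped or repeated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of the Splunk SDK): drops events seen in earlier previews. */
final class PreviewDeduplicator {

    /** Minimal event holder; the values are assumed to come from the parsed preview stream. */
    static final class Event {
        final long timeMillis; // _time converted to epoch milliseconds (assumption)
        final String raw;      // the _raw field

        Event(long timeMillis, String raw) {
            this.timeMillis = timeMillis;
            this.raw = raw;
        }
    }

    private long lastTime = Long.MIN_VALUE;                     // newest _time processed so far
    private final Set<String> seenAtLastTime = new HashSet<>(); // raw text of events at lastTime

    /** Returns only the events not returned by earlier calls, oldest first. */
    List<Event> filterNew(List<Event> batch) {
        List<Event> sorted = new ArrayList<>(batch);
        sorted.sort(Comparator.comparingLong((Event e) -> e.timeMillis));
        List<Event> fresh = new ArrayList<>();
        for (Event e : sorted) {
            if (e.timeMillis < lastTime) {
                continue; // older than the newest processed time: treated as already handled
            }
            if (e.timeMillis == lastTime) {
                if (!seenAtLastTime.add(e.raw)) {
                    continue; // same timestamp and same text: already handled
                }
            } else {
                lastTime = e.timeMillis; // newer timestamp: advance the mark
                seenAtLastTime.clear();
                seenAtLastTime.add(e.raw);
            }
            fresh.add(e);
        }
        return fresh;
    }
}
```

Two limitations of this sketch: an event that shows up late with a _time older than the stored mark is skipped, and two genuinely distinct events with identical _time and _raw are collapsed into one.
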

Good luck!

nigelbrown
New Member

Good effort - I'll try this.

Have you noticed missing/duplicates with this approach?
