How can my custom search command tell when a search is done?

hulahoop
Splunk Employee

I have a custom search command which uses the streaming API to retrieve query results. Here's a snippet:

  import csv
  import sys
  # resultsFile is a file object opened elsewhere in the script
  results = csv.DictReader(sys.stdin)
  for r in results:
    resultsFile.write(r['_raw'] + '\n')

Pretty basic.

The problem is that I want to operate on the full set of results once streaming has completed (perform a POST on everything). But how can the script tell that Splunk is done streaming events?

gkanapathy
Splunk Employee

Nothing in the search command protocol tells your script when Splunk is done sending data.

A common misconception about streaming commands is that if you define a command as streaming, Splunk will invoke it once, stream input events to that invocation, and asynchronously receive a stream of output from it. This is not the case. In fact, in current implementations the script is called multiple times, each time with a chunk of input events, and Splunk expects each invocation to produce correct output for that chunk of input.

Specifying that a search command is streaming tells Splunk that it is okay to do this. That is, the script will produce correct results if input is given to it incrementally in chunks of any size, it will produce results corresponding to each increment, and if input terminates at any time, the results produced up to that point are complete and accurate. It is therefore okay to call it multiple times with incremental chunks of input. This boils down to saying that your command can work on a single event in isolation, without the context of prior or subsequent events.
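To make "works on a single event in isolation" concrete, here is a minimal sketch of what one invocation of a streaming-safe script might look like. It assumes the input arrives as plain CSV with a header row, as in the question's snippet; any header block Splunk may send ahead of the CSV is not handled. The function name `process_chunk` and the output field `raw_length` are made up for illustration.

```python
import csv
import io


def process_chunk(instream, outstream):
    """Handle one invocation: read a chunk of CSV events, emit one row per event."""
    reader = csv.DictReader(instream)
    fieldnames = list(reader.fieldnames or [])
    if "raw_length" not in fieldnames:
        fieldnames.append("raw_length")  # hypothetical output field
    writer = csv.DictWriter(outstream, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in reader:
        # Each row is handled on its own: nothing is remembered between
        # events, so the chunk size and the number of invocations do not matter.
        row["raw_length"] = len(row.get("_raw") or "")
        writer.writerow(row)
        count += 1
    return count


# Simulate a single chunk handed to the script by Splunk.
chunk = io.StringIO("_raw,host\nerror one,web1\nok,web2\n")
out = io.StringIO()
handled = process_chunk(chunk, out)
out_lines = out.getvalue().splitlines()
```

In a real script the call would be `process_chunk(sys.stdin, sys.stdout)`. Because no state survives between rows, the output is the same whether Splunk delivers the events in one chunk or in many.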

Given this, Splunk does not expect to need to let your script know when it is done with sending data, since for the purposes of collecting the output of your script, it should not matter.

Unfortunately, non-streaming commands are limited to a single invocation of the script, with a limit of 50,000 events. There are certainly cases where you'd want non-streaming commands that can handle more than 50k events, and cases where you'd want your script to be able to receive the entire input as a single stream and produce results asynchronously (as some internal Splunk commands can do), but I believe the current custom search command interfaces don't allow this.


Update

Thinking about it a bit more, a solution might be to make your command into a streaming "preop" command. Then create a non-streaming command that requires and uses your streaming preop command, and have your final POST take place in the non-streaming command once it has read all of its input. You would then only call the non-streaming command, which would in turn call your preop.
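The non-streaming half of that idea could look roughly like the sketch below. Since a non-streaming command is invoked once, reaching end-of-file on its input is the "done" signal. This again assumes plain CSV input with a `_raw` column; `collect_and_send`, `send_batch`, and the JSON payload shape are hypothetical stand-ins for whatever actually performs your POST.

```python
import csv
import io
import json


def collect_and_send(instream, send_batch):
    """Read every event until EOF, then hand the full set to send_batch once."""
    reader = csv.DictReader(instream)
    raws = [row["_raw"] for row in reader if row.get("_raw")]
    # Getting here means the input hit EOF: this single invocation has
    # received everything it is going to get, so the one-time POST goes here.
    payload = json.dumps({"events": raws})  # payload shape is an assumption
    send_batch(payload)
    return len(raws)


# Stand-in for the real POST: just record what would have been sent.
sent = []
total = collect_and_send(io.StringIO("_raw\nfirst\nsecond\n"), sent.append)
```

In the real command you would pass `sys.stdin` and a function that performs the HTTP request. The 50,000-event limit on non-streaming commands mentioned above would still apply.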

Please be aware of the effect that non-streaming commands have on map-reduce/distributed queries. The non-streaming command and anything in the search pipeline after it are not map-reduced, but run only on the search head. (The streaming preop would be distributed, as long as everything before it is streaming.)

hulahoop
Splunk Employee

Thank you, Gerald. I thought to try the same thing this morning and will test and post results here. We are using the streaming command precisely because Intersplunk limits the amount of data returned.

Ayn
Legend

Have a look at commands.conf; I imagine the streaming configuration parameter in particular will be of interest to you.

streaming = [true|false]
* Specify whether the command is streamable.
* Defaults to false.

Is there a reason for not using the Intersplunk methods for getting results? I honestly don't know the specifics of the different methods too well, but I imagine your search command would play nicer with Splunk if it used the Intersplunk methods.

gkanapathy
Splunk Employee
Splunk Employee

Intersplunk doesn't do anything that you can't do with csv.
