Note: the only reason you get an "invalid" document is that you are using the /export endpoint. This is not an endpoint you should generally be using unless you have a very specific requirement, namely that you need to export a large amount of data out of Splunk.
Your observation that the response as a whole is not a valid JSON document is correct. However, this is by design, and it is equivalent to what we do with XML output. Let me try to explain our rationale.
The idea behind the /export endpoint is that it is a streaming endpoint: it sends out results as soon as they are available. However, this is complicated for searches that never "end", such as real-time searches. It is also complicated for searches where we want to give you previews of the results (the two are really the same problem, but that's a detail).
In order to support the streaming nature of the endpoint, we need to give you the data in a format that is very easy to parse in a continuous, streaming fashion. That is why we decided to emit each row as an individual JSON object, with rows guaranteed to be separated by newlines, and to mark the row that closes each "preview". For example, here is a sample search invocation and the resulting output:
$ curl -k -u admin:changeme https://localhost:8089/services/search/jobs/export --data-urlencode search="search index=_internal | stats count by sourcetype" -d output_mode=json -d earliest="rt-5m" -d latest="rt"
{"preview":true,"offset":0,"result":{"sourcetype":"eventgen-2","count":"58509"}}
{"preview":true,"offset":1,"result":{"sourcetype":"splunk_web_service","count":"119"}}
{"preview":true,"offset":2,"result":{"sourcetype":"splunkd","count":"4153"}}
{"preview":true,"offset":3,"result":{"sourcetype":"splunkd_access","count":"12"}}
{"preview":true,"offset":4,"lastrow":true,"result":{"sourcetype":"splunkd_stderr","count":"2"}}
{"preview":true,"offset":0,"result":{"sourcetype":"eventgen-2","count":"60886"}}
{"preview":true,"offset":1,"result":{"sourcetype":"splunk_web_service","count":"119"}}
{"preview":true,"offset":2,"result":{"sourcetype":"splunkd","count":"4280"}}
{"preview":true,"offset":3,"result":{"sourcetype":"splunkd_access","count":"12"}}
{"preview":true,"offset":4,"lastrow":true,"result":{"sourcetype":"splunkd_stderr","count":"2"}}
{"preview":true,"offset":0,"result":{"sourcetype":"eventgen-2","count":"63342"}}
{"preview":true,"offset":1,"result":{"sourcetype":"splunk_web_service","count":"119"}}
{"preview":true,"offset":2,"result":{"sourcetype":"splunkd","count":"4404"}}
{"preview":true,"offset":3,"result":{"sourcetype":"splunkd_access","count":"12"}}
{"preview":true,"offset":4,"lastrow":true,"result":{"sourcetype":"splunkd_stderr","count":"2"}}
Now, as you can see, we actually have three previews here, each with five rows (the reason these are all previews is that this is a real-time search, which only ever produces previews). We know where each preview ends by looking at the "lastrow" field on each individual line.
So, as an example, let's say I wanted to collect each preview as a single array of objects. In pseudo-code, that would look like this:
// Run the search
splunkResultStream = splunkAPI.exportSearch(....);

// Have a place to buffer the entire preview
currentPreview = [];

// OK, let's go
while (true) {
    // Each line of the stream is one complete JSON object
    event = parseJSON(splunkResultStream.readUntilNewline());

    // Always append to the preview
    currentPreview.append(event);

    // If it is the last row, we can actually do something!
    if (event.lastrow) {
        doSomethingWithEntirePreview(currentPreview);

        // And now we start a new preview
        currentPreview = [];
    }
}
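To make that concrete, here is a minimal runnable sketch of the same loop in Python. I'm assuming the requests library here (any HTTP client that can stream a response line by line will do), and the host, credentials, search, and the handle_preview callback are placeholders you would swap for your own:

import json
import requests

def handle_preview(rows):
    # Placeholder: replace with whatever you want to do with a full preview
    print("got a preview with %d rows" % len(rows))

response = requests.post(
    "https://localhost:8089/services/search/jobs/export",
    auth=("admin", "changeme"),
    data={
        "search": "search index=_internal | stats count by sourcetype",
        "output_mode": "json",
        "earliest": "rt-5m",
        "latest": "rt",
    },
    verify=False,  # equivalent to curl's -k; don't do this in production
    stream=True,   # read the response incrementally instead of buffering it all
)

current_preview = []
for line in response.iter_lines():
    if not line:
        continue  # skip any blank keep-alive lines
    event = json.loads(line)
    # Always append to the preview
    current_preview.append(event)
    # "lastrow" marks the end of a preview, so flush the buffer
    if event.get("lastrow"):
        handle_preview(current_preview)
        current_preview = []

The important details are stream=True, so the client does not try to buffer a potentially endless response, and iter_lines(), which gives you exactly the one-object-per-line framing described above.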
The above is pretty straightforward, but you might ask: "why can't we just have a single JSON object for each preview?" That's a reasonable question, and our answer is that these objects are of unbounded size, and JSON is not easily parsed in a streaming fashion, so we want to give you an easy way to parse it while retaining flexibility. For example, in many use cases you do not need to buffer an entire preview at all and only care about each row individually, which can be very performant.
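For instance, reusing the hypothetical response stream from the sketch above, processing row by row with no preview buffer looks like this:

import json

# Process rows one at a time: no buffering, constant memory
for line in response.iter_lines():
    if not line:
        continue  # skip any blank keep-alive lines
    event = json.loads(line)
    row = event.get("result")  # lines without a "result" are ignored here
    if row:
        print(row["sourcetype"], row["count"])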
By the way, if this format looks familiar to you, that is on purpose. Many streaming APIs use this newline-delimited JSON format; the most popular is probably the Twitter Streaming API.
Hopefully this explains why the format is the way it is. If you have any more questions or need some help in dealing with it, please let us know.