Getting Data In

REST API - JSON Invalid format

Hi,
I was trying to get data out of Splunk via the curl REST API with the following command:

curl -k -u myusername:mypassword -d search="search index%3Dmain sourcetype%3Dmain-st | dedup Name | table Name, _geo" -d earliest_time=-24h@h -d output_mode="json" https://myurl:8089/servicesNS/admin/maps/search/jobs/export

The result was,

{"preview":true,"offset":0,"result":{"Name":"mamycita","_geo":"7.13,120.193"}}
{"preview":true,"offset":0,"result":{"Name":"gogocita","_geo":"7.13,120.193"}}

Upon checking at http://jsonlint.com/, the result from Splunk was not valid JSON.

Is there anything we need to do to get valid JSON data via the REST API, or did I make a mistake in the command?

Thanks


New Member

We would need to remove the header and footer of the JSON file received via the REST API so that it is structured as valid JSON. Use a SEDCMD regex in props.conf. I am also looking for help on how to generate the regex.

0 Karma

New Member

I found this to work a little more consistently -

def fixSplunkJSON(strSplunk):
    # Add a comma after each result object, strip the trailing
    # newline and comma, then wrap the whole response in a JSON array.
    strJSON = strSplunk.replace('}}', '}},')
    strJSON = strJSON.rstrip('\n').rstrip(',')
    return '[' + strJSON + ']'
0 Karma

Contributor

I'm struggling with this; can you shed some more light on it?
I am outputting an API result, and on the command line it looks like this:

{"preview":false,"offset":13,"result":{"Time":"17:55 Aug 13 2019","client-note":"Test event"}}
{"preview":false,"offset":14,"result":{"Time":"22:55 Aug 12 2019","client-note":"testing Android"}}
{"preview":false,"offset":15,"result":{"Time":"21:05 Aug 12 2019","client-note":"testing Hello"}}

This API is going to be executed externally from a mobile device, and the result needs to be valid JSON, i.e. comma-separated rather than newline-delimited.

Can you explain how I could apply your function to this output so it is formatted correctly?

Cheers

0 Karma

New Member

This is also an issue; incomplete JSON output is not ideal for an API call. I ended up having to write some code to work around it so we could consume the results. I have posted the Python 3 code here so you can use it if you hit the same problem; it is a script that wraps around this gap in what Splunk offers for a simple requirement.

import json

class GetDetailsForEmail:
    def __init__(self):
        self.load = self.setup_details('file.json')

    def setup_details(self, filename):
        with open(filename) as f:
            data = f.read()
        # Comma-separate every result object except the last,
        # close the array after the final one, and open it at the start.
        count = data.count("}}") - 1
        data = data.replace("}}", "}},", count)
        data = data.replace("}}\n", "}}]\n")
        data = data.replace('{', '[{', 1)
        return json.loads(data)
0 Karma

New Member

I also agree with this; it is broken if it's not a proper JSON payload. It needs a simple array around it, with the JSON rows comma-delimited.

I ended up fixing this in my Python; I have attached the fix below:

#!/bin/python3.6

import json

class GetDetails:
    def __init__(self):
        self.load = self.setup_details('file.json')

    def setup_details(self, filename):
        with open(filename) as f:
            data = f.read()
        # Comma-separate every result object except the last,
        # close the array after the final one, and open it at the start.
        count = data.count("}}") - 1
        data = data.replace("}}", "}},", count)
        data = data.replace("}}\n", "}}]\n")
        data = data.replace('{', '[{', 1)
        return json.loads(data)

Then you can use it in another function to run through the actual results. Not ideal of Splunk to produce a bad JSON format.

0 Karma

Engager

This is disappointing. At the very least, you should have the option of specifying whether to stream results as they become available or to return a single, valid JSON document. If the endpoint doesn't return a valid document, then the endpoint is broken, plain and simple.

Additionally, the engineering team at some point seems to have reused this same functionality to provide json exporting via the web interface. This means that after performing a search and then attempting to download, the downloaded file (via HTTP, web client, no streaming) is broken. Sure, I can reparse it and fix it, but then I'm only being reminded that the tool is broken in the first place.

Splunk Employee

Note: the only reason you get an "invalid" document is because you are using the /export endpoint. This is not an endpoint you should generally be using unless you have a very specific set of requirements (namely, you need to export a large amount of data out of Splunk).

Your observation that the response as a whole is not a valid JSON document is correct. However, this is by design, and equivalent to what we do with XML output. Let me try and explain our rationale.

The idea behind the /export endpoint is that it is a streaming endpoint, which means it will send out results as soon as they are available. However, this is complicated for searches that never "end", like real-time searches, for example. It is also complicated for searches for which we want to give you previews of the results (the above two are really the same problem, but that's a detail).

In order to support the streaming nature of the endpoint, we need to be able to give you the data in a format that is very easy to parse in a continuous, streaming fashion. That is why we decided to emit each single row as an individual JSON object, with rows guaranteed to be separated by a newline, and to indicate whenever a "preview" is closed. For example, here is a sample search invocation, and the resulting output:

$ curl -k -u admin:changeme https://localhost:8089/services/search/jobs/export --data-urlencode search="search index=_internal | stats count by sourcetype" -d output_mode=json -d earliest="rt-5m" -d latest="rt"

{"preview":true,"offset":0,"result":{"sourcetype":"eventgen-2","count":"58509"}}
{"preview":true,"offset":1,"result":{"sourcetype":"splunk_web_service","count":"119"}}
{"preview":true,"offset":2,"result":{"sourcetype":"splunkd","count":"4153"}}
{"preview":true,"offset":3,"result":{"sourcetype":"splunkd_access","count":"12"}}
{"preview":true,"offset":4,"lastrow":true,"result":{"sourcetype":"splunkd_stderr","count":"2"}}
{"preview":true,"offset":0,"result":{"sourcetype":"eventgen-2","count":"60886"}}
{"preview":true,"offset":1,"result":{"sourcetype":"splunk_web_service","count":"119"}}
{"preview":true,"offset":2,"result":{"sourcetype":"splunkd","count":"4280"}}
{"preview":true,"offset":3,"result":{"sourcetype":"splunkd_access","count":"12"}}
{"preview":true,"offset":4,"lastrow":true,"result":{"sourcetype":"splunkd_stderr","count":"2"}}
{"preview":true,"offset":0,"result":{"sourcetype":"eventgen-2","count":"63342"}}
{"preview":true,"offset":1,"result":{"sourcetype":"splunk_web_service","count":"119"}}
{"preview":true,"offset":2,"result":{"sourcetype":"splunkd","count":"4404"}}
{"preview":true,"offset":3,"result":{"sourcetype":"splunkd_access","count":"12"}}
{"preview":true,"offset":4,"lastrow":true,"result":{"sourcetype":"splunkd_stderr","count":"2"}}

Now, as you can see, we actually have three previews in here, each with 4 rows (the reason these are all previews is that this is a real-time search, which only has "previews"). We know when each preview ends by looking at the "lastrow" field in each individual line.

So, as an example, let's say I wanted to get each preview as a single array of objects; in pseudo-code, it would look like this:

// Run the search
splunkResultStream = splunkAPI.exportSearch(....);

// Have a place to buffer the entire preview
currentPreview = []

// OK, let's go
while(true) {
    event = splunkResultStream.readUntilNewline();

    // Always append to the preview        
    currentPreview.append(event);

    // If it is the last row, we can actually do something!
    if (event.lastrow) {
        doSomethingWithEntirePreview(currentPreview)

        // And now we start a new preview
        currentPreview = []
    }
}
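In Python, the same loop might look like this (a sketch, assuming the export response is already available line by line; the sample lines mirror the format shown above):

```python
import json

def iter_previews(lines):
    """Group newline-delimited export results into complete previews.

    Yields one list of result rows each time a line carrying
    "lastrow": true closes the current preview.
    """
    current_preview = []
    for line in lines:
        event = json.loads(line)
        # Always append to the current preview
        current_preview.append(event["result"])
        # When the last row arrives, hand over the whole preview
        if event.get("lastrow"):
            yield current_preview
            current_preview = []

# Example with two rows forming one preview:
sample = [
    '{"preview":true,"offset":0,"result":{"sourcetype":"splunkd","count":"4153"}}',
    '{"preview":true,"offset":1,"lastrow":true,"result":{"sourcetype":"splunkd_access","count":"12"}}',
]
previews = list(iter_previews(sample))
```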

The above is pretty straightforward, but you might ask "why can't we just have a single JSON object for each preview?" That's a reasonable question, and our answer is that these objects are of unbounded size, and JSON is not easily parsed in a streaming fashion, so we want to give you a format that is easy to parse while also providing flexibility. For example, there are many use cases where you do not need to buffer an entire preview and only care about each row individually, and this can be very performant.

By the way, if this format looks familiar to you, that is on purpose. Many streaming-based APIs use it, the most popular being the Twitter Streaming API.

Hopefully this explains why the format is the way it is. If you have any more questions or need some help in dealing with it, please let us know.

Path Finder

My 2 cents: if you intentionally output invalid JSON, why not avoid JSON-like output altogether? People would then be less confused and could work around it accordingly.

Splunk Employee

If you have 10-15 rows and none of them are previews, then you can simply split on newlines (\n), parse each string as a JSON object, and merge them together. If some of them are previews, do the same, but ignore anything that has the preview property set to true.

Note that since you have so few rows, you can simply buffer the entire response as a single string.
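As a sketch of that approach (the sample lines are borrowed from the format shown earlier in the thread):

```python
import json

# A small, finished search response: one JSON object per line.
response_text = (
    '{"preview":false,"offset":0,"result":{"Time":"17:55 Aug 13 2019","client-note":"Test event"}}\n'
    '{"preview":false,"offset":1,"result":{"Time":"22:55 Aug 12 2019","client-note":"testing Android"}}\n'
)

# Split on newlines and parse each non-empty line as JSON.
parsed = [json.loads(line) for line in response_text.splitlines() if line.strip()]

# Keep only final (non-preview) rows, and pull out the result objects.
rows = [p["result"] for p in parsed if not p.get("preview")]
```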

Explorer

So what's the best way to parse the result set? Right now I only have about 10-15 rows returned. Are there different methods depending on how large the result set is?

thanks

Splunk Employee

You also have the choice of using json_rows or json_cols as the output mode. The output for these is formatted in rows and columns respectively, and it arrives as one big block of valid JSON.

You can also look at some example usage of these output modes in the Splunk JavaScript SDK - Examples section.

SplunkTrust

Each line of the preview is valid JSON, you just need to split it up.

It's not a valid JSON document just because "each line is valid JSON". Fundamentally, no JSON parser can parse this response, which undermines the whole point of returning JSON: that it is easy to parse. Having to pre-process a JSON response before parsing defeats the purpose.

I opened a case with Splunk support and they've indicated that they have reproduced the issue and that it is indeed returning invalid JSON. A fix should hopefully be forthcoming.

Contributor

Good point. I suspect nothing has been changed so far.

0 Karma