Riding the coattail of Re: Why is the null value in a JSON event not being parsed properly as NULL?, I constructed two events:
{"field1": "value1", "field2": null, "field3": 1234, "field4": true}
{"field1": "value1", "field2": null, "field3": 1234, "field4": false}
Because they differ only in field4, one would expect Splunk to differentiate only field4. However, if I give one to makeresults format=json, and one to spath, Splunk treats JSON null differently. Like this:
| makeresults format=json data="[{\"field1\":\"value1\",\"field2\": null,\"field3\":1234, \"field4\":true}]"
| append
[makeresults
| eval _raw = "{\"field1\":\"value1\",\"field2\": null,\"field3\":1234, \"field4\":false}"
| spath]
Surprise!
Turning JSON true/false into string literals is understandable, because Splunk doesn't pass its internal boolean type to the next pipe. I am guessing that Splunk also has some good reason, as @somesoni2 implied in the answer, not to use Splunk's null (i.e., absence of value) to represent JSON null. But with makeresults format=json, Splunk seems to be of a different mind. Is this interesting?
I can speculate on some reasons why "omitting" keys with null values can be disadvantageous for a tool like Splunk. I can also enumerate some disadvantages of not differentiating a null value from an intentional, or even incidental, string value of "null". I'd like to gain a deeper understanding of the balancing act.
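To make the trade-off concrete, here is a minimal Python sketch of the two policies. It is only a model of the behaviour observed in this thread, not Splunk's implementation; the function names flatten_stringify and flatten_drop are made up for illustration, and it assumes flat (non-nested) JSON objects.

```python
import json


def _to_text(value):
    """Render a non-null JSON scalar as text, the way a field value would be displayed."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def flatten_stringify(raw):
    """Hypothetical model of the spath result: JSON null becomes the string "null"."""
    return {k: "null" if v is None else _to_text(v) for k, v in json.loads(raw).items()}


def flatten_drop(raw):
    """Hypothetical model of the makeresults format=json result: null keys are omitted."""
    return {k: _to_text(v) for k, v in json.loads(raw).items() if v is not None}


json_null = '{"field2": null}'
string_null = '{"field2": "null"}'
missing_key = '{}'

# Stringifying cannot tell JSON null apart from the literal string "null".
print(flatten_stringify(json_null) == flatten_stringify(string_null))  # True

# Dropping cannot tell JSON null apart from a key that was never sent.
print(flatten_drop(json_null) == flatten_drop(missing_key))  # True
```

Either policy loses one distinction. Which loss hurts more depends on whether your data is more likely to contain the string "null" or to omit keys.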
Background: I had the same question as the OP regarding JSON source ingestion, after years of blaming the data source for misrepresenting null. makeresults is a slightly different animal, so I am posting it here. Should format=json behave the same as spath? Or the other way around?
INDEXED_EXTRACTIONS and KV_MODE both interpret null values as the string "null".
Would I prefer all Splunk JSON extraction methods treat null values as null? Yes! But it would break more than a decade of prior art.
If I know my source data does not use "null" in string values, i.e., the following is forbidden:
{"foo": "null"}
I treat the string "null" as null:
index=main foo=* foo!=null
index=main
| eval foo=nullif(foo, "null")
Having used Splunk before native support for JSON field extractions was added, I'm fine with the workarounds given the convenience.
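For anyone who wants to reason about the nullif workaround outside Splunk, here is a rough Python sketch of the same idea. The helper names nullif and clean_event are illustrative only, and the sketch assumes, as above, that the string "null" never appears as a legitimate value in the source data.

```python
def nullif(value, sentinel):
    """Return None when value equals sentinel, otherwise return value unchanged."""
    return None if value == sentinel else value


def clean_event(fields, sentinel="null"):
    """Drop every field whose extracted value is the sentinel string."""
    cleaned = {}
    for name, value in fields.items():
        result = nullif(value, sentinel)
        if result is not None:
            cleaned[name] = result
    return cleaned


# Fields as they might look after an extraction that turns JSON null into "null".
event = {"field1": "value1", "field2": "null", "field3": "1234"}
print(clean_event(event))  # {'field1': 'value1', 'field3': '1234'}
```

The caveat is the same as in the search: if a source ever sends {"foo": "null"} on purpose, that value is silently dropped too.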
See https://ideas.splunk.com/ideas/EID-I-2311 for another complication!
Perhaps a new Splunk Idea advocating for a "strict" extraction mode is warranted? E.g.:
# props.conf
[foo]
INDEXED_EXTRACTIONS = JSON
# default JSON_MODE = relaxed
JSON_MODE = strict
[bar]
KV_MODE = json
JSON_MODE = strict
| spath input=_raw output=foo path="foo" mode=strict ``` default mode=relaxed ```
| eval foo=json_extract_strict(_raw, "foo")
and the opposite for the makeresults command:
| makeresults format=json mode=relaxed data="{\"foo\": null}" ``` default mode=strict ```
As a concession to existing code, it could be a new, internal search directive accessible through the noop command:
| noop json_mode=strict
Well... neither solution is really compatible with Splunk, since Splunk has no way of expressing a null value. An empty string is just a string which happens to be of zero length, so it's not a null. A null value in Splunk means that the field is not present, which is not the same as a JSON field with null content.
It's just one more quirk of how Splunk handles JSON: since it doesn't work with structured data, it does a lot of duct-taping and hand-waving to make _something_ happen. As we know, there are at least three separate and differently behaving JSON extraction modes (indexed extractions, KV_MODE, and spath), not including the JSON eval functions, so it's just another one of those things...
It's the Splunk context that's important, though, and a null JSON value would be equivalent to a null/undefined Splunk field.
If one is using Splunk as an immutable persistence layer for serialized JSON objects, then the difference is important, but in that case, you're likely just working with _raw and not using Splunk field extractions or SPL. For the relatively brief time Splunk bundled Node.js, this might have been interesting, but Python is king apparently.
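If you do need the distinction and are working from the raw JSON text anyway, a small sketch like the following shows the three states a key can be in. It is plain Python over a raw string, not anything Splunk-specific, and the function name classify is hypothetical.

```python
import json

# Sentinel object so that a missing key can be told apart from a JSON null.
_MISSING = object()


def classify(raw, key):
    """Report whether key is absent, JSON null, or a real value in a raw JSON object."""
    value = json.loads(raw).get(key, _MISSING)
    if value is _MISSING:
        return "absent"
    if value is None:
        return "json null"
    return f"value ({type(value).__name__})"


print(classify('{}', "foo"))               # absent
print(classify('{"foo": null}', "foo"))    # json null
print(classify('{"foo": "null"}', "foo"))  # value (str)
```

Once the object is flattened into fields, at least two of these three states end up looking the same, which is the whole point of this thread.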
Thanks for the useful information
So, it seems that when using format=json with makeresults you get the true 'Splunk null', which means it's truly null. This means when you do:
| makeresults format=json data="[{\"field1\":\"value1\",\"field2\": null,\"field3\":1234, \"field4\":true}]"
You actually don't get a field2, because it is null! This is the same as doing | eval myField=null(): you would not see a field. The reason we see field2 in the output is that, for some reason, the second part (eval _raw/spath) treats 'null' as a string.
Personally, I think the correct output here is from the makeresults format=json, take a look at this which determines the 'type' of the field:
| makeresults format=json data="[{\"field1\":\"value1\",\"field2\": null,\"field3\":1234, \"field4\":true}]"
| append
[ makeresults
| eval _raw = "{\"field1\":\"value1\",\"field2\":null,\"field3\":4567, \"field4\":true}"
| spath]
| append
[ windbag
| head 1
| fields _raw
| eval _raw = json("{\"field1\":\"value1\",\"field2\":null,\"field3\":7890, \"field4\":true}")
| spath]
| append
[| makeresults
| eval _raw = json("{\"field1\":\"value1\",\"field2\":null,\"field3\":0123, \"field4\":true}")
| fromjson _raw]
| foreach field*
[| eval <<FIELD>>_type=typeof(<<FIELD>>)]
When using spath, the 'null' is displayed as a String when it should be Invalid (Invalid = not a specific type). I think spath is incorrect here? It should be null/Invalid, not a string!
Interestingly, fromjson works the same way as makeresults format=json in that it's a true null value, which adds weight to my theory of spath being wrong?!