I have the following data coming into Splunk in JSON format and extracted at index-time:
{
administrativeState: Unlocked
apGroupId: 7954abec-de25-4758-bb5e-0d87be2f0254
approvedTime: 1435500193785
clientCount: 0
configState: newConfig
connectionState: Disconnect
countryCode: XX
cpId: 15e2811e-a24f-4570-bd58-8f18c7e6ed91
description:
dpId:
externalIp: xxx.xx.xx.xx
externalPort: 55540
ip: xxx.xx.xx.xx
ipType: Static
ipv6:
ipv6Type: null
isCriticalAP: false
lastSeenTime: 1454078438291
latitude: null
location: XXXXXXXXX
longitude: null
mac: F0:B0:52:3B:F2:10
meshHop: 0
meshRole: null
model: T300
name: XXXXXXXXX
provisionMethod: Discovered
provisionStage: null
registrationState: Approved
serial: 491484101277
uptime: 81630
version: 3.1.1.0.349
wifi24Channel: 2
wifi50Channel: 104
zoneId: d41a3397-b859-42cd-9332-e308a48069b2
}
Note: The latitude and logitude fields in this case are null
and Splunk's syntax highlighter shows it like null or boolean would be in any editor.
I can search for longitude=null
no problems, however, when it comes to enriching the event with other data from a lookup and trying to coalesce the field with another like so:
eval merged_latitude=coalesce(latitude,zone_lat,0)
Then it appears to be assuming that null
is actually not null and using the null
value rather than attempting to look at the next field or even the fail safe 0
value.
In order to get the null
to be correctly seen as NULL
I have to insert the following into my search:
eval latitude=if(latitude="null",FALSE,latitude)
This seems perhaps the "wrong" way to fix the problem, is there not a better approach to this?
Splunk sees "null" as a valid string value, hence all the issues. (and actually there is no notation that can be used to denote null values other then value not present at all). So to fix this, either you can replace all null with blank (no value) in the raw data before indexing (works only for future data) OR handle the same in search time.
For option 1, see these
https://answers.splunk.com/answers/210096/how-to-configure-sedcmd-in-propsconf.html
http://docs.splunk.com/Documentation/Splunk/6.2.6/Data/Anonymizedatausingconfigurationfiles
For option 2, you already have one option listed. Others options could be like this
your base search | replace "null" with "" in field1, field2, field3
your base search | rex mode=sed "s/null//g"
Splunk sees "null" as a valid string value, hence all the issues. (and actually there is no notation that can be used to denote null values other then value not present at all). So to fix this, either you can replace all null with blank (no value) in the raw data before indexing (works only for future data) OR handle the same in search time.
For option 1, see these
https://answers.splunk.com/answers/210096/how-to-configure-sedcmd-in-propsconf.html
http://docs.splunk.com/Documentation/Splunk/6.2.6/Data/Anonymizedatausingconfigurationfiles
For option 2, you already have one option listed. Others options could be like this
your base search | replace "null" with "" in field1, field2, field3
your base search | rex mode=sed "s/null//g"
Thanks for this answer. I just ran into the same issue (in Splunk Enterprise 6.4).
I am sending JSON-format logs to Splunk via the HTTP Event Collector (EC). Some field values are explicitly set to null
:
some_performance_indicator_field: null
One reason for setting a field value to null
, rather than omitting the field, is to indicate that the version of the performance monitoring system that generated this data does report that performance indicator (from the underlying transaction processing system being monitored), but that, in this instance, there was no value to report. As opposed to a (typically earlier) system that does not report that indicator, and so that field does not occur in its output. A null
value tells you something about the performance indicator, whereas, if the field is absent, the performance indicator might have had any value; it just wasn't reported.
So (with apologies to readers who are well aware of this), there can be a significant semantic difference in JSON between a field being set to the explicit value null
and that field being omitted.
It appears (and my own ad hoc testing confirms this answer) that Splunk interprets the JSON null
field value as the string value "null"
.
That behavior is consistent with the following definition in the Splunk documentation topic "The search processing language syntax":
Null field
A null field is not present on a particular result or event.
(Although I would prefer it if the documentation explicitly covered the case of the JSON null
value being interpreted as the string value "null"
, and any similar issues with other input data formats. Perhaps it does; just not in that topic.)
I presume (I haven't tested this) that this means that, for the purposes of searching in Splunk, the following JSON KV pair:
myfield: null
is identical to:
myfield: "null"
This is still an issue for me in Splunk 7.3.0.
Given the following input JSON Lines:
{"code":"test_null_value","time":"2019-11-07T08:00:00+08:00","test":null}
{"code":"test_null_value","time":"2019-11-07T08:00:01+08:00","test":"null"}
{"code":"test_null_value","time":"2019-11-07T08:00:02+08:00","test":"present"}
(For the purposes of this comment, you can ignore the code
and time
fields.)
the following search:
sourcetype="test_null_value" test="null"
returns two results: the event with "null"
and the event with null
.
Curiously, Splunk Web syntax-highlights the null
in "test":null
in orange, whereas the string value null
in "test":"null"
is green.
The following search:
sourcetype="test_null_value" | where isnull(test)
returns no results, as expected: the (JavaScript) value null
does not correspond to Splunk's "not present" concept of NULL.
I don't know of any way in a Splunk search to distinguish between the string value "null"
and the value null
. Splunk seems to treat them as identical string values. It seems that orange syntax highlighting in the Search app is just for show; skin deep.
You can work around the "null" vs. null issue in some cases. You can use spath to get a bit more specific with nested JSON, but it isn't as simple as simple search criteria.
index=_internal
| head 1
| fields _time, _raw
| eval _raw = "{\"nullable_field\": null, \"nullable_list_0\": [ null ], \"nullable_list_1\": [ \"null\", null ] }"
| spath
| regex "\"nullable_field\"\s*:\s*null\b" `comment("single values are easier to identify than lists; could be converted to a macro")`
| eval includes_null_as_string = if(searchmatch("\"null\""), "true", "false") `comment("events with string that is not null")`
Thanks - at least your suggestions are much more efficient than mine
So it turns out that the first command the value is still treated as non NULL 😞 and the rex command doesn't even appear to be doing anything at all 😞