I am indexing JSON files. Each file contains an array of around 1,000 JSON objects (with nested arrays/objects), and I need to extract each object as a single event (see the sample JSON source and props.conf below).
When I use the "Add Data" button in the UI to index the file, it looks like all the events are indexed: a search for all events returns the first JSON object. However, KV_MODE=json seems to stumble on the initial [ and fails to extract the fields for that event. If I search on one of the fields (index=foo coach="matt"), the event is not returned, but if I search for just the value (index=foo matt), it is.
How do I modify my props.conf to correctly handle the first object in the array?
[
{
"team" : "spirit",
"coach": "matt",
"regDate": "2016-07-31T12:23:34Z",
"players": [
{
"name":"Marissa",
"positions": ["2B", "P", "C", "RF"]
},
{
"name":"Sierra",
"positions": ["SS","LF"]
}
]
},
{
"team" : "chill",
"coach": "bob",
"regDate": "2016-08-01T12:15:19Z",
"players": [
{
"name":"Rhi",
"positions": ["3B", "CF","1B"]
}
]
}
]
This is my props.conf:
[json_linebreaker]
JSON_TRIM_BRACES_IN_ARRAY_NAMES=true
KV_MODE=json
LINE_BREAKER=\s{4}\},(,[\n\r])\s{4}\{(.*)
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y-%m-%dT%H:%M:%S%Z
TIME_PREFIX=regDate\"\s*:\s*\"
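To illustrate the symptom described above, here is a minimal Python sketch. It assumes that field extraction with KV_MODE=json needs each event to parse as a complete JSON document on its own; that assumption simply mirrors the behavior reported in the question and is not taken from Splunk documentation. The helper name parses_as_json is made up for this example.

```python
import json


def parses_as_json(text):
    """Return True if text is a complete JSON document on its own."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# First event as it ends up in the index: the array's opening bracket
# is still attached to the first object.
first_event = (
    '[\n{\n"team": "spirit",\n"coach": "matt",\n'
    '"regDate": "2016-07-31T12:23:34Z"\n}'
)

# The same event with the leading bracket (and newline) removed.
trimmed_event = first_event.lstrip("[\n")

print(parses_as_json(first_event))
print(parses_as_json(trimmed_event))
```

If the assumption holds, only the trimmed event is usable for field extraction, which would explain why coach="matt" finds nothing while the bare term matt still matches the raw text.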
Finally got this working by using a PREAMBLE_REGEX to discard the opening array bracket. Posting the props.conf here for completeness (in case someone else has this issue).
[json_linebreaker]
JSON_TRIM_BRACES_IN_ARRAY_NAMES=true
KV_MODE=json
PREAMBLE_REGEX=^\s{0,2}\[
LINE_BREAKER=\s{4}},(,[\n\r])\s{4}({.)
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y-%m-%dT%H:%M:%S%Z
TIME_PREFIX=regDate\"\s*:\s*\"
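For anyone unsure how the timestamp settings fit together, the following Python sketch imitates them outside of Splunk. It assumes the prefix pattern regDate\"\s*:\s*\" from the question's props.conf and a lookahead of 30 characters after the prefix; the function name extract_timestamp is hypothetical, and the trailing Z is treated as a literal UTC marker rather than parsed with %Z.

```python
import re
from datetime import datetime, timezone

# Python equivalents of TIME_PREFIX and MAX_TIMESTAMP_LOOKAHEAD.
TIME_PREFIX = re.compile(r'regDate"\s*:\s*"')
MAX_TIMESTAMP_LOOKAHEAD = 30


def extract_timestamp(event):
    """Find the regDate value that follows the prefix, or return None."""
    prefix = TIME_PREFIX.search(event)
    if prefix is None:
        return None
    # Only look at a limited window right after the prefix.
    window = event[prefix.end():prefix.end() + MAX_TIMESTAMP_LOOKAHEAD]
    stamp = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", window)
    if stamp is None:
        return None
    parsed = datetime.strptime(stamp.group(0), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


sample_event = '{"team" : "spirit", "coach": "matt", "regDate": "2016-07-31T12:23:34Z"}'
print(extract_timestamp(sample_event))
```

This is only a way to sanity-check the prefix regex against sample events before loading them; Splunk's own timestamp processor may behave differently in edge cases.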
Of course, I only have a small sample of your data, but this seems to be working. The main challenge, as you mentioned, is the line breaking. Assuming that the first key of each JSON object is always the same (in your case, "team"), this regex should work:
LINE_BREAKER = (,*\s+){\s+"team"
Once the events are breaking properly, the only thing left is to strip the opening and closing square brackets with SEDCMD. The finished props.conf looks like this:
[answers]
LINE_BREAKER = (,*\s+){\s+"team"
TIME_PREFIX = regDate":\s"
MAX_TIMESTAMP_LOOKAHEAD = 30
NO_BINARY_CHECK = true
disabled = false
KV_MODE = json
SEDCMD-remove_opening = s/^\[//g
SEDCMD-remove_closing = s/\]$//g
JSON_TRIM_BRACES_IN_ARRAY_NAMES = true
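The combined effect of the line breaker and the two SEDCMD rules can be tried out with a rough Python sketch. It rests on several assumptions: the text matched by the first capture group is discarded and marks the event boundary, the sample is indented the way a pretty-printed file would be, whitespace is trimmed before the bracket cleanup (a simplification), and events that end up empty are dropped. The function names break_events and clean are made up for this example, and the { is escaped because Python's re module is used here.

```python
import json
import re

RAW = """[
    {
        "team": "spirit",
        "coach": "matt",
        "regDate": "2016-07-31T12:23:34Z"
    },
    {
        "team": "chill",
        "coach": "bob",
        "regDate": "2016-08-01T12:15:19Z"
    }
]
"""

# Same pattern as the LINE_BREAKER above, with the brace escaped for Python.
LINE_BREAKER = re.compile(r'(,*\s+)\{\s+"team"')


def break_events(raw, breaker):
    """Split raw text at each match, discarding the first capture group."""
    events, start = [], 0
    for match in breaker.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return events


def clean(event):
    """Imitate the two SEDCMD rules: drop a leading [ and a trailing ]."""
    event = event.strip()  # simplification: trim whitespace first
    event = re.sub(r"^\[", "", event)
    event = re.sub(r"\]$", "", event)
    return event.strip()


cleaned = [clean(e) for e in break_events(RAW, LINE_BREAKER)]
# Assumption: an event that is empty after cleanup is not kept.
events = [json.loads(e) for e in cleaned if e]

for event in events:
    print(event["team"], event["coach"])
```

In this sketch the lone opening bracket becomes its own piece, which the cleanup step empties out, and each remaining piece parses as a standalone JSON object.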
I had a similar issue, but my JSON objects were wrapped in yet another JSON array. The same solution worked there too. As long as you can line break on the first field of the object, you should be fine.
[
"Records": [
{
"team" : "spirit",
"coach": "matt",
"regDate": "2016-07-31T12:23:34Z"
},
{
"team" : "chill",
"coach": "bob",
"regDate": "2016-08-01T12:15:19Z"
}
]
I also spoke with someone from Splunk. They realize that JSON arrays are a common data structure nowadays, and they have an internal Jira task tracking this as a feature request.
I hope it helps!
Thank you so much. This helped a ton !!
The events are breaking correctly; it's just that pesky initial square bracket. I changed SHOULD_LINEMERGE to false, and it didn't seem to change anything.
I've been playing with the regex all day today. The most recent incantation is:
LINE_BREAKER=(^[[\n\r]+)|\s{4}},(,[\n\r])\s{4}{(.*)
My thinking was if I could break the [ into its own event, then I could throw away that event using a transform. However, it is still keeping the [ with the first object and now is splitting the event at random spots.
Are your events breaking correctly? If you have set LINE_BREAKER, then SHOULD_LINEMERGE should be set to false, not true. For some reason, setting this through the UI does not work: Splunk just reverts it to true and adds a BREAK_ONLY_BEFORE setting alongside the line breaker. This could be causing part of the problem that you are seeing ...