Need help with linebreaker for array of json objects

Contributor

I am indexing json files. Each file contains an array of around 1,000 json objects (with nested arrays/objects). I need to extract each object as a single event. (See sample json source and props.conf below).

I use the "Add Data" button in the UI to index the file, and it looks like all the events are picked up: a search for all events shows the first JSON object. However, KV_MODE=json seems to stumble on the initial [ and fails to extract the fields for that event. If I search for one of its fields (index=foo coach="matt"), the event is not returned, but if I search for just the value (index=foo matt), it is.

How do I modify my props.conf to correctly handle the first object in the array?

[
    {    
        "team" : "spirit",        
        "coach": "matt",
        "regDate": "2016-07-31T12:23:34Z",
        "players": [
          {
            "name":"Marissa",
            "positions": ["2B", "P", "C", "RF"]
          },
          {
            "name":"Sierra",
            "positions": ["SS","LF"]
          }
        ]
    },
    {    
        "team" : "chill",        
        "coach": "bob",
        "regDate": "2016-08-01T12:15:19Z",
        "players": [
          {
            "name":"Rhi",
            "positions": ["3B", "CF","1B"]
          }
        ]
    }
]

This is my props.conf:

 [json_linebreaker]
 JSON_TRIM_BRACES_IN_ARRAY_NAMES=true
 KV_MODE=json
 LINE_BREAKER=\s{4}\},(,[\n\r])\s{4}\{(.*)
 MAX_TIMESTAMP_LOOKAHEAD=30
 NO_BINARY_CHECK=true
 SHOULD_LINEMERGE=true
 TIME_FORMAT=%Y-%m-%dT%H:%M:%S%Z
 TIME_PREFIX=regDate\"\s*:\s*\"
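
The symptom can be reproduced outside Splunk. The sketch below is only an illustration with Python's standard json module, which is an assumption on my part: Splunk's own JSON extraction may behave differently in detail. It shows that the first event, as broken above, is not valid JSON while it still carries the leading [.

```python
import json

# First event as it would look if the opening "[" of the file stays attached.
first_event = '''[
    {
        "team" : "spirit",
        "coach": "matt",
        "regDate": "2016-07-31T12:23:34Z",
        "players": [
          {"name": "Marissa", "positions": ["2B", "P", "C", "RF"]},
          {"name": "Sierra", "positions": ["SS", "LF"]}
        ]
    }'''


def parses_as_json(text):
    """Return True if text is a complete, valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


with_bracket = parses_as_json(first_event)
# Drop the leading bracket and surrounding whitespace, keep the object.
without_bracket = parses_as_json(first_event.lstrip("[ \n"))
print(with_bracket, without_bracket)  # False True
```

A raw-text search for matt still finds the event because the text is indexed either way; only the field extraction depends on the event being parseable.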

Re: Need help with linebreaker for array of json objects

Contributor

Are your events breaking correctly? If you have set LINE_BREAKER, then SHOULD_LINEMERGE should be set to false, not true. For some reason, setting this through the UI does not work: Splunk reverts it to true and adds a BREAK_ONLY_BEFORE setting alongside the line breaker. This could be causing part of the problem you are seeing.


Re: Need help with linebreaker for array of json objects

Contributor

The events are breaking correctly; the only problem is that pesky initial square bracket. I changed SHOULD_LINEMERGE to false and it didn't seem to change anything.


Re: Need help with linebreaker for array of json objects

Contributor

I've been playing with the regex all day today. The most recent incantation is:

LINE_BREAKER=(^[[\n\r]+)|\s{4}},(,[\n\r])\s{4}{(.*)

My thinking was if I could break the [ into its own event, then I could throw away that event using a transform. However, it is still keeping the [ with the first object and now is splitting the event at random spots.


Re: Need help with linebreaker for array of json objects

Contributor

Finally got this working by using a PREAMBLE_REGEX to discard the opening array bracket. Posting the props.conf here for completeness (in case someone else has this issue).

[json_linebreaker]
JSON_TRIM_BRACES_IN_ARRAY_NAMES=true
KV_MODE=json
PREAMBLE_REGEX=^\s{0,2}\[
LINE_BREAKER=\s{4}},(,[\n\r])\s{4}({.*)
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y-%m-%dT%H:%M:%S%Z
TIME_PREFIX=regDate\"\s*:\s*\"
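
For anyone checking the timestamp settings, here is a rough Python sketch of how I understand TIME_PREFIX and MAX_TIMESTAMP_LOOKAHEAD to work together: find the prefix, then look for the timestamp only within the next 30 characters. The function name extract_timestamp is hypothetical, and this is not Splunk's actual implementation. Python's strptime also handles the trailing zone differently, so the sketch matches a literal Z instead of using %Z.

```python
import re
from datetime import datetime, timezone

# TIME_PREFIX value from the first post's props.conf.
TIME_PREFIX = re.compile(r'regDate\"\s*:\s*\"')
MAX_TIMESTAMP_LOOKAHEAD = 30
# Shape of the timestamps in the sample data, e.g. 2016-07-31T12:23:34Z.
TIMESTAMP = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z')


def extract_timestamp(event):
    """Return the regDate timestamp as an aware datetime, or None."""
    prefix = TIME_PREFIX.search(event)
    if prefix is None:
        return None
    # Only the characters right after the prefix are considered.
    window = event[prefix.end():prefix.end() + MAX_TIMESTAMP_LOOKAHEAD]
    stamp = TIMESTAMP.match(window)
    if stamp is None:
        return None
    parsed = datetime.strptime(stamp.group(0), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


event = '{ "team" : "spirit", "coach": "matt", "regDate": "2016-07-31T12:23:34Z" }'
print(extract_timestamp(event))  # 2016-07-31 12:23:34+00:00
```

Note that the prefix needs \s* on both sides of the colon: the sample data has no space between "regDate" and the colon, so a prefix requiring whitespace there would never match.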


Re: Need help with linebreaker for array of json objects

Path Finder

Of course, I only have a small sample of your data, but this seems to be working. The main challenge is the line breaking, as you mentioned. Assuming that the first element of each JSON object is always the same (in your case, each object starts with "team"), this regex should work:

LINE_BREAKER = (,*\s+){\s+"team"

Once the events are breaking properly, the only thing left is to clean up the opening and closing square brackets with SEDCMD. The finished props.conf looks like this:

[answers]
LINE_BREAKER = (,*\s+){\s+"team"
TIME_PREFIX = regDate":\s"
MAX_TIMESTAMP_LOOKAHEAD = 30
NO_BINARY_CHECK = true
disabled = false
KV_MODE = json
SEDCMD-remove_opening = s/^\[//g
SEDCMD-remove_closing = s/\]$//g
JSON_TRIM_BRACES_IN_ARRAY_NAMES = true
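
As a rough check of this approach outside Splunk, the Python sketch below imitates the two steps: break the raw text wherever LINE_BREAKER matches, discarding the text of the first capture group, then apply the two SEDCMD substitutions. The helper names break_events and apply_sedcmd are hypothetical, the { is escaped for Python's regex engine, and the sample has its missing comma after "bob" filled in. Treat it as a simulation under those assumptions, not as how Splunk is implemented.

```python
import json
import re

RAW = '''[
    {
        "team" : "spirit",
        "coach": "matt",
        "regDate": "2016-07-31T12:23:34Z",
        "players": [
          {"name": "Marissa", "positions": ["2B", "P", "C", "RF"]},
          {"name": "Sierra", "positions": ["SS", "LF"]}
        ]
    },
    {
        "team" : "chill",
        "coach": "bob",
        "regDate": "2016-08-01T12:15:19Z",
        "players": [
          {"name": "Rhi", "positions": ["3B", "CF", "1B"]}
        ]
    }
]
'''

# Same pattern as LINE_BREAKER above; group 1 is the discarded delimiter.
LINE_BREAKER = re.compile(r'(,*\s+)\{\s+"team"')


def break_events(raw):
    """Split raw text at each match, dropping the first capture group."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return events


def apply_sedcmd(event):
    """Imitate s/^\\[//g and s/\\]$//g on a single event."""
    event = re.sub(r'^\[', '', event)
    return re.sub(r'\]$', '', event)


events = [apply_sedcmd(e) for e in break_events(RAW)]
events = [e for e in events if e.strip()]  # the lone "[" becomes empty
coaches = [json.loads(e)["coach"] for e in events]
print(coaches)  # ['matt', 'bob']
```

Each remaining event parses as a standalone JSON object, which is what KV_MODE = json needs in order to extract fields such as coach.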

I had a similar issue, but my JSON objects were nested one level deeper, inside a "Records" array. The same solution worked there too: as long as you can line-break on the first field of the object, you should be fine.

{
  "Records": [
    {
        "team" : "spirit",
        "coach": "matt",
        "regDate": "2016-07-31T12:23:34Z"
    },
    {
        "team" : "chill",
        "coach": "bob",
        "regDate": "2016-08-01T12:15:19Z"
    }
  ]
}

I also spoke with someone from Splunk: they recognize that a JSON array is a common data structure nowadays, and they have an internal Jira task tracking it as a feature request.

I hope it helps!


Re: Need help with linebreaker for array of json objects

Splunk Employee

Thank you so much. This helped a ton!
