Getting Data In

What is the best way to handle json data with nested arrays?

lyndac
Contributor

I am having some trouble working with JSON events. I use Splunk Enterprise 6.4.1. I'm using KV_MODE=json in my props.conf file. For regular fields and top level arrays, it's working great. However I have an array of objects (which contain arrays...see example data below). It looks like by using KV_MODE=json, I lose the relationship between the objects/fields.

For example, using the event below, I can tell how many players the team has at each position, but I can't tell how many players of each position are at a given school. My search is:
index=foo sourcetype=json_linebreaker players{}.school="resevoir" | stats count by players{}.positions

If even just one player on a team has school = resevoir, then ALL the players from that team are included in the count, so the results (assuming this is the only team) are:

2B 2
P 2
C 2
RF 2
SS 1
LF 1
3B 1
CF 1
1B 1

The results I want is to only see the positions of the players who go to resevoir:

3B 1
CF 1
1B 1
P 1
C 1

I've tried using spath (even tho the docs say that if you use KV_MODE=json you don't need spath)
index=foo sourcetype=json_linebreaker players{}.school="resevoir" | spath players{} output=x | fields - _raw | fields x | mvexpand x | spath input=x | search school="resevoir" | stats count by positions{}

This should return the results that I'm looking for, but for some events (say 10%), the first spath command generates an empty field named x. When I look at the raw event, there is a players{} array, but for some reason the fields within it are not extracted.

So, what is the best way to index/search this data? I get an average of 7 million events a day and have about 50 users currently. (yes the sample data is made up, but it has the same structure as the data I am actually ingesting). I'd like to make it as easy as possible for the users to write searches, but it also needs to be as performant as possible.

props.conf

[json_linebreaker]
JSON_TRIM_BRACES_IN_ARRAY_NAMES=true
KV_MODE=json
LINE_BREAKER=\s{4}\},(,[\n\r])\s{4}\{(.*)
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y-%m-%dT%H:%M:%S%Z
TIME_PREFIX=regDate\"\x*:\s*\"

Sample JSON input:

{
  "team" : "spirit",
  "coach": "matt",
  "division": "14U",
  "tournaments": ["Save Second Base", "Autism", "Fear the Fastpitch"],
  "teamType": "fastpitch",
  "assistants": ["Jay", "Dwayne", "Bob", "Alyssa"],
  "players": [
    {
      "name":"Marissa",
      "number": 27,
      "positions": ["2B", "P", "C", "RF"],
      "lineup_spot":2,
      "pitches": ["fastball", "changeup", "curve", "drop", "rise"],
      "school": "Chesapeake Middle"
    },
    {
      "name":"Sierra",
      "number": 5,
      "positions": ["SS","LF"],
      "school": "Elkridge",
      "lineup_spot":7,
    },
    {
      "name":"Rhi",
      "number": 12,
      "positions": ["3B", "CF","1B"],
      "lineup_spot":1,
      "school":"resevoir"
    },
    {
      "name":"Kasey",
      "number": 10,
      "positions": ["P", "C"],
      "pitches": ["fastball", "changeup"],
      "lineup_spot":9,
      "school": "resevoir"
    },
    {
      "name":"Stella",
      "number": 46,
      "positions": ["RF", "2B"],
      "lineup_spot":10,
      "school":"bollman",
      "lineup_spot":10,
    }
  ]
}
0 Karma
1 Solution

lyndac
Contributor

I answered my own question! Looks like the issue is in the json that I'm feeding into splunk. Sometimes in the array of players, the last field in the object has a trailing comma.

{
       "name":"Stella",
       "number": 46,
       "positions": ["RF", "2B"],
       "lineup_spot":10,
       "school":"bollman",
       "lineup_spot":10,
     }

if I remove that comma and re-index, everything works great. Including the spath command that was previously not returning results. It seems that just one object in the array with an erroneous comma will cause splunk to ignore all the array elements. Thankfully, I can control the json that is input (sorta), so I can fix it rather easily.

View solution in original post

lyndac
Contributor

I answered my own question! Looks like the issue is in the json that I'm feeding into splunk. Sometimes in the array of players, the last field in the object has a trailing comma.

{
       "name":"Stella",
       "number": 46,
       "positions": ["RF", "2B"],
       "lineup_spot":10,
       "school":"bollman",
       "lineup_spot":10,
     }

if I remove that comma and re-index, everything works great. Including the spath command that was previously not returning results. It seems that just one object in the array with an erroneous comma will cause splunk to ignore all the array elements. Thankfully, I can control the json that is input (sorta), so I can fix it rather easily.

Get Updates on the Splunk Community!

Harnessing Splunk’s Federated Search for Amazon S3

Managing your data effectively often means balancing performance, costs, and compliance. Splunk’s Federated ...

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...