Splunk Search

How do I get Splunk to extract nested JSON arrays properly?

lyndac
Contributor

I am indexing JSON data. I need to be able to do stats based "by patches" and "by admin". I can't get spath or mvexpand to extract the nested arrays properly. Can anyone help me to figure this out?

My props.conf looks like this:

[ my_json ]
INDEXED_EXTRACTIONS = json
KV_MODE=none
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=true
TIMESTAMP_FIELDS=upTime
TIME_FORMAT=%Y-%m-%d %H:%M:%S

The data is similar to this:

{
  "upTime": "2015-02-08 16:43:23",
  "type": "thetype",
   "id": "123454829",
   "systems": [
      {
          "hostname": "host1",
          "admins": [ "jdoe","lcod", "pamtie"],
          "os": "linux",
          "patches": ["1.2", "2.3", "4.3", "5.4"]
      },
      {
          "hostname": "host2",
          "admins": [ "barry", "patty"],
          "os": "linux",
          "patches": ["2.3", "5.4"]
      }
  ]
}
1 Solution

lyndac
Contributor

Thanks for your help. I was finally able to accomplish this using spath & mvexpand. In case someone else needs this in the future, my search is now:

index=foo | spath path=systems{} output=x | fields - _raw | fields upTime, type, id, x | mvexpand x | spath input=x | rename admins{} as admins | mvexpand admins | stats count as Count by type, admins

View solution in original post

lyndac
Contributor

Thanks for your help. I was finally able to accomplish this using spath & mvexpand. In case someone else needs this in the future, my search is now:

index=foo | spath path=systems{} output=x | fields - _raw | fields upTime, type, id, x | mvexpand x | spath input=x | rename admins{} as admins | mvexpand admins | stats count as Count by type, admins

somesoni2
SplunkTrust
SplunkTrust

Your sample data doesn't look like in proper json format (some keys are not in double quotes) and that should be the reason spath (in search) and INDEXED_EXTRACTIONS = json in props.conf is not recognizing array. Check the syntax here http://pro.jsonlint.com/

0 Karma

bmacias84
Champion

This kind of data is a pain to work with because it requires the uses of mv commands. to extract what you want you need first zip the data you want to pull out.

…| rename systems{}.hostname as hostname, rename systems{}.patches as patches | eval a mvzip(hostname,patches, "|") | fields _time a | mvexpand a | rex field=a "(?<hostname>[^\|]+)\|(?<patches>[^\|]+)"

If you need to expand patches just append mvexpand patches to the end.

I use this method to to extract multilevel deep fields with multiple values. It does a the limitation of only able to extract two multi valued fields from the data and get very slow with large data sets.

Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...