How to format nested data using key-value structure

New Member

The Splunk best practices document recommends:

Use clear key-value pairs

key1=value1, key2=value2, key3=value3 . . .
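Here is a minimal sketch of producing lines in that format from application code. It assumes Python on the logging side. The `to_kv` helper is hypothetical, and escaping embedded double quotes with a backslash is an assumption about how you want them handled:

```python
def to_kv(fields):
    """Render a flat dict as key="value" pairs joined by ", ".

    Values are converted with str(); embedded double quotes are
    backslash-escaped (an assumed escaping convention).
    """
    parts = []
    for key, value in fields.items():
        text = str(value).replace('"', '\\"')
        parts.append(f'{key}="{text}"')
    return ", ".join(parts)


print(to_kv({"stars": 4.5, "city": "Las Vegas"}))
# stars="4.5", city="Las Vegas"
```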

This makes sense for simple data that can be represented in key-value format, but what about nested data structures? For example, what's the best way of representing the following log data using key-value format?

{
  "categories": [
    "Restaurants",
    "American (New)",
    "Southern"
  ],
  "attributes": {
    "BusinessParking": {
      "street": false,
      "garage": true
    },
    "WheelchairAccessible": true,
    "GoodForKids": false
  },
  "stars": 4.5,
  "city": "Las Vegas",
  "name": "Yardbird Southern Table & Bar"
}

I can represent the attributes and top-level keys using dotted notation:

attributes.BusinessParking.street="false",
attributes.BusinessParking.garage="true",
attributes.WheelchairAccessible="true",
attributes.GoodForKids="false",
stars="4.5",
city="Las Vegas",
name="Yardbird Southern Table & Bar"

I'm not sure whether this is optimal, though.
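The dotted notation can be generated mechanically rather than by hand. This is a minimal sketch that assumes the record is available as a Python dict. `flatten` is a hypothetical helper, and it deliberately skips arrays because they need their own encoding:

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. attributes.BusinessParking.street.

    Arrays are skipped here; how to encode them is a separate decision.
    Booleans become lowercase "true"/"false" to match the log line above.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        elif isinstance(value, list):
            continue  # arrays need their own encoding
        elif isinstance(value, bool):
            flat[name] = "true" if value else "false"
        else:
            flat[name] = value
    return flat


record = json.loads("""{
  "categories": ["Restaurants", "American (New)", "Southern"],
  "attributes": {
    "BusinessParking": {"street": false, "garage": true},
    "WheelchairAccessible": true,
    "GoodForKids": false
  },
  "stars": 4.5,
  "city": "Las Vegas",
  "name": "Yardbird Southern Table & Bar"
}""")

line = ", ".join(f'{k}="{v}"' for k, v in flatten(record).items())
print(line)
```

Because `flatten` recurses, it handles any nesting depth. The only format decision it bakes in is rendering booleans as lowercase strings.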

However, my main question is: how should I represent the categories array?

I need to be able to search the above data and return all records that have more than N categories. How should the data be structured so that this query runs as efficiently as possible?
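One possible encoding for the array is to join its elements into a single delimited value and also log the array length as its own numeric field. This is an assumption for illustration, not a documented Splunk convention. The field names `categories` and `categories_count` and the `|` delimiter are hypothetical:

```python
DELIM = "|"  # assumed delimiter; must never appear inside a category name


def encode_array(name, values, delim=DELIM):
    """Encode a list as one delimited field plus a precomputed length field.

    The field names <name> and <name>_count are a hypothetical convention.
    """
    if any(delim in v for v in values):
        raise ValueError(f"value contains delimiter {delim!r}")
    return {name: delim.join(values), f"{name}_count": len(values)}


def has_more_than(fields, name, n):
    """The filter from the question: more than n elements in the array."""
    return int(fields[f"{name}_count"]) > n


cats = ["Restaurants", "American (New)", "Southern"]
fields = encode_array("categories", cats)
print(fields)
# {'categories': 'Restaurants|American (New)|Southern', 'categories_count': 3}
print(has_more_than(fields, "categories", 2))  # True
```

With the count logged as its own field, the "more than N categories" filter becomes a plain numeric comparison on one field instead of reconstructing the array. The cost is redundancy: the count must stay in sync with the list, and the delimiter must never occur inside a value.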

I'm asking because we currently store our logs in JSON format. I can already run the query described above against the JSON data using spath, but some people in my organization believe spath is very slow and that key-value is much faster, so they want to switch our logging format from JSON to key-value. I'd like to compare the two log structures, JSON and key-value, to understand which one is more efficient to query (if there is any difference at all). At the moment, though, I can't even work out how best to structure the key-value logs so that array data can be queried.

New Member

@adamcohen - what did you end up doing?
I am in the same situation as you. If Splunk recommends key-value pairs (which I also prefer to JSON), why doesn't it recommend a way to represent searchable arrays?

SplunkTrust

If your data is in JSON, keep it that way and just set KV_MODE = json for your sourcetype in props.conf.

New Member

Thanks for the response, @starcher. However, I'm not trying to solve this for JSON-formatted logs; I already know how to do that, and it works well. The problem is how to solve it for key-value formatted logs, because my organization wants a clear comparison of JSON versus key-value logging. That's why I'm trying to figure out the best way to store a nested data structure in key-value format: so I can run the same queries against both JSON and key-value data, see how the two formats differ, and summarise the advantages and disadvantages of each approach.

For example, say I want to return all restaurants that have more than 15 categories. I can use the following query on JSON-formatted data:

source="business.json" | spath categories{} | where mvcount('categories{}') > 15

The above query requires using spath, which can be slow. In order to compare this to key-value, I need to first understand how to store the nested data (including the categories array) in key-value format, so I can then construct a query.
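To make the comparison concrete outside Splunk, here is a sketch of what each query has to do per event. The JSON side parses the whole document to reach the array, roughly what `spath categories{}` plus `mvcount` compute. The key-value side only needs one field. The sketch assumes the hypothetical `categories` / `categories_count` encoding with a `|` delimiter, and that values never contain a double quote. It says nothing about Splunk's actual search performance, which would need to be measured on your own data:

```python
import json
import re

# Matches key="value" pairs; assumes values never contain a double quote.
KV_PAIR = re.compile(r'([\w.]+)="([^"]*)"')


def count_from_json(line):
    """Roughly what spath categories{} + mvcount('categories{}') compute."""
    return len(json.loads(line).get("categories", []))


def count_from_kv(line, delim="|"):
    """Count categories in a key="value" line using the hypothetical encoding."""
    fields = dict(KV_PAIR.findall(line))
    if "categories_count" in fields:
        # Precomputed at logging time: no splitting needed.
        return int(fields["categories_count"])
    value = fields.get("categories", "")
    return len(value.split(delim)) if value else 0


def more_than(lines, counter, n):
    """Keep only the events with more than n categories."""
    return [line for line in lines if counter(line) > n]


json_line = '{"categories": ["Restaurants", "American (New)", "Southern"], "city": "Las Vegas"}'
kv_line = 'categories="Restaurants|American (New)|Southern", categories_count="3", city="Las Vegas"'

print(count_from_json(json_line), count_from_kv(kv_line))  # 3 3
print(more_than([kv_line], count_from_kv, 15))  # []
```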
