
How to format nested data using key-value structure

adamcohen
New Member

The Splunk best practices document recommends:

Use clear key-value pairs

key1=value1, key2=value2, key3=value3 . . .

This makes sense for simple data that can be represented in key-value format, but what about nested data structures? For example, what's the best way of representing the following log data using key-value format?

{
  "categories": [
    "Restaurants",
    "American (New)",
    "Southern"
  ],
  "attributes": {
    "BusinessParking": {
      "street": false,
      "garage": true
    },
    "WheelchairAccessible": true,
    "GoodForKids": false
  },
  "stars": 4.5,
  "city": "Las Vegas",
  "name": "Yardbird Southern Table & Bar"
}

I can represent the attributes and top-level keys using dotted notation:

attributes.BusinessParking.street="false",
attributes.BusinessParking.garage="true",
attributes.WheelchairAccessible="true",
attributes.GoodForKids="false",
stars="4.5",
city="Las Vegas",
name="Yardbird Southern Table & Bar"

I'm not sure this is optimal, though.
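A minimal Python sketch of that dotted-notation flattening, assuming each log record is available as a Python dict before it is written out. The helper names flatten, format_value and to_kv_line are hypothetical. Booleans come out as "true"/"false" to match the pairs above, embedded backslashes and double quotes are backslash-escaped (whether your Splunk extraction honours that escaping is an assumption worth checking), and lists are deliberately skipped, since how to encode them is the open question that follows.

```python
import json


def flatten(obj, prefix=""):
    """Yield (dotted_key, value) pairs for every scalar leaf of nested dicts.

    Lists are skipped on purpose: arrays need their own encoding.
    """
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)  # recurse into nested objects
        elif isinstance(value, list):
            continue  # arrays are left out of this sketch
        else:
            yield path, value


def format_value(value):
    """Quote a scalar; json.dumps renders True/False as true/false."""
    text = value if isinstance(value, str) else json.dumps(value)
    # Escape backslashes first, then double quotes, so the value stays quoted
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_kv_line(record):
    """Render a nested record as key="value" pairs separated by ", "."""
    return ", ".join(f"{k}={format_value(v)}" for k, v in flatten(record))


record = {
    "categories": ["Restaurants", "American (New)", "Southern"],
    "attributes": {
        "BusinessParking": {"street": False, "garage": True},
        "WheelchairAccessible": True,
        "GoodForKids": False,
    },
    "stars": 4.5,
    "city": "Las Vegas",
    "name": "Yardbird Southern Table & Bar",
}

print(to_kv_line(record))
```

This prints the dotted attribute pairs followed by stars, city and name on a single comma-separated line, with categories missing because the sketch has no encoding for arrays yet.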

However, my main question is: how should I represent the categories array?

I need to be able to search the above data and return all records that have more than N categories. How should my data be structured to support such a query as efficiently as possible?
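One possible key-value encoding for the array, sketched in Python under the assumption that you control the logging code: write the elements as a single delimited value and also write an explicit element count. The helper encode_list, the categories_count field name and the | delimiter are all hypothetical choices, and the sketch assumes no category contains the delimiter or a double quote (it raises instead of escaping).

```python
def encode_list(name, items, delim="|"):
    """Encode a list of strings as name="a|b|c" plus a name_count=<n> pair.

    Hypothetical helper: refuses items containing the delimiter or a double
    quote rather than trying to escape them.
    """
    for item in items:
        if delim in item or '"' in item:
            raise ValueError(f"cannot encode item {item!r} safely")
    joined = delim.join(items)
    return f'{name}="{joined}", {name}_count={len(items)}'


categories = ["Restaurants", "American (New)", "Southern"]
print(encode_list("categories", categories))
# categories="Restaurants|American (New)|Southern", categories_count=3
```

With a count field, the "more than N categories" search becomes a plain numeric comparison such as ... | where categories_count > 15, with no need to split the array at search time. If the individual categories are needed, makemv delim="|" categories turns the delimited string into a multivalue field that mvcount can count. Whether either is actually faster than spath on the JSON events is exactly what the comparison would need to measure.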

I'm asking because we currently store our logs in JSON format, and I can run the above query on JSON data using spath. However, some people in my organization believe that spath is very slow and that key-value is much faster, and they want to change our logging format from JSON to key-value. I'd like to compare both log structures, JSON and key-value, to understand which is more efficient to query (if, in fact, there is any difference at all). At the moment, I can't even figure out how best to structure the key-value logs so that I can query array data.


cesarbmx
Engager

@adamcohen - what did you end up doing?
I am in the same situation as you. If Splunk recommends key-value pairs (which I also prefer over JSON), why doesn't it recommend a way to represent searchable arrays?


starcher
Influencer

If your data is in JSON, keep it that way and just set KV_MODE = json for your sourcetype in props.conf.


adamcohen
New Member

Thanks for the response, @starcher. However, I'm not trying to solve this for JSON-formatted logs; I already know how to do that, and it works well. The problem is how to solve it for key-value formatted logs, since my organization wants a clear comparison of JSON-formatted logs versus key-value. That's why I'm trying to figure out the best way to store a nested data structure in key-value format, so I can run the same queries against both JSON and key-value data, see how the two formats differ, and summarise the advantages and disadvantages of each approach.

For example, if I want to return all restaurants that have more than 15 categories, I can use the following query on JSON-formatted data:

source="business.json" | spath categories{} | where mvcount('categories{}') > 15

The above query requires spath, which can be slow. To compare it with key-value, I first need to understand how to store the nested data (including the categories array) in key-value format, so that I can then construct an equivalent query.
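When comparing the two formats, it may help to compute the expected answer outside Splunk first, so you can check that the JSON search and its key-value counterpart return the same records. A minimal Python sketch, assuming business.json holds one JSON object per line; the function name names_with_many_categories is hypothetical. Like mvcount in the where clause, a record with no categories field never matches.

```python
import json


def names_with_many_categories(lines, threshold):
    """Return names of records whose categories array has more than `threshold` items.

    Intended to mirror: spath categories{} | where mvcount('categories{}') > threshold
    """
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        categories = record.get("categories") or []  # missing field counts as empty
        if len(categories) > threshold:
            matches.append(record.get("name"))
    return matches


# Small in-memory sample instead of the real file
sample = [
    json.dumps({"name": "A", "categories": ["x"] * 16}),
    json.dumps({"name": "B", "categories": ["Restaurants", "Southern"]}),
    json.dumps({"name": "C"}),
]
print(names_with_many_categories(sample, 15))  # ['A']
```

Against the real data, an open file handle can be passed directly, e.g. with open("business.json", encoding="utf-8") as f: names_with_many_categories(f, 15), and the length of the result compared with the event counts of both Splunk searches.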
