The Splunk best practices document recommends:
Use clear key-value pairs
key1=value1, key2=value2, key3=value3 . . .
This makes sense for simple data that can be represented in key-value format, but what about nested data structures? For example, what's the best way of representing the following log data using key-value format?
{
  "categories": [
    "Restaurants",
    "American (New)",
    "Southern"
  ],
  "attributes": {
    "BusinessParking": {
      "street": false,
      "garage": true
    },
    "WheelchairAccessible": true,
    "GoodForKids": false
  },
  "stars": 4.5,
  "city": "Las Vegas",
  "name": "Yardbird Southern Table & Bar"
}
I can represent the attributes and top level keys using dotted-notation:
attributes.BusinessParking.street="false",
attributes.BusinessParking.garage="true",
attributes.WheelchairAccessible="true",
attributes.GoodForKids="false",
stars="4.5",
city="Las Vegas",
name="Yardbird Southern Table & Bar"
I'm not sure whether this is optimal, though.
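To make the dotted-notation idea concrete, here is a minimal Python sketch of how the nested objects could be flattened into key-value pairs. The helper names flatten and to_kv are hypothetical (they are not part of Splunk or any logging library), and the sketch deliberately skips arrays, since how to represent those is exactly what I don't know yet. It also assumes string values contain no double quotes.

```python
import json


def flatten(obj, prefix=""):
    """Yield (dotted_key, value) pairs for nested dicts.

    Lists are skipped on purpose: how to represent them in
    key-value form is the open question.
    """
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        elif isinstance(value, list):
            continue  # arrays intentionally left out of this sketch
        else:
            yield path, value


def to_kv(record):
    """Render a record as a single key="value" log line."""
    parts = []
    for key, value in flatten(record):
        # json.dumps renders booleans as true/false and numbers as-is;
        # strings are used verbatim (assumes no embedded double quotes).
        text = value if isinstance(value, str) else json.dumps(value)
        parts.append(f'{key}="{text}"')
    return ", ".join(parts)


record = {
    "categories": ["Restaurants", "American (New)", "Southern"],
    "attributes": {
        "BusinessParking": {"street": False, "garage": True},
        "WheelchairAccessible": True,
        "GoodForKids": False,
    },
    "stars": 4.5,
    "city": "Las Vegas",
    "name": "Yardbird Southern Table & Bar",
}

print(to_kv(record))
```

This produces the same pairs as listed above, joined onto one line, with the categories array omitted.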
However, my main question is: how should I represent the categories array?
I need to be able to search the above data and return all records that have more than N categories, so how should my data be structured to make such a query as efficient as possible?
I'm asking because we currently store our logs in JSON, and I can run the above query against JSON data using spath. However, some people in my organization believe that spath is very slow and that key-value extraction is much faster, so they want to change our logging format from JSON to key-value. I'd like to compare the two formats to understand which is more efficient to query (if there is any difference at all), but at the moment I can't even work out how to structure key-value logs so that array data can be queried.