<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to format nested data using key-value structure in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293214#M55819</link>
    <description>&lt;P&gt;Thanks for the response, @starcher. However, I'm not trying to solve this for JSON-formatted logs; I already know how to do that, and it works well. The problem is how to solve it for key-value formatted logs, because my organization wants a clear comparison of JSON versus key-value logging. That is why I'm trying to figure out the best way to store a nested data structure in key-value format: I want to run the same queries against both formats, see how they differ, and summarise the advantages and disadvantages of each approach.&lt;/P&gt;

&lt;P&gt;For example, to return all restaurants that have more than 15 categories, I can use the following query on JSON-formatted data:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="business.json" | spath categories{} | where mvcount('categories{}') &amp;gt; 15
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The above query requires spath, which can be slow. To compare this with key-value, I first need to understand how to store the nested data (including the categories array) in key-value format so that I can construct an equivalent query.&lt;/P&gt;</description>
    <pubDate>Mon, 02 Apr 2018 23:25:05 GMT</pubDate>
    <dc:creator>adamcohen</dc:creator>
    <dc:date>2018-04-02T23:25:05Z</dc:date>
    <item>
      <title>How to format nested data using key-value structure</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293212#M55817</link>
      <description>&lt;P&gt;The Splunk best practices document recommends:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;Use clear key-value pairs

key1=value1, key2=value2, key3=value3 . . .
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This makes sense for simple data that can be represented in key-value format, but what about nested data structures?  For example, what's the best way of representing the following log data using key-value format?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{
  "categories": [
    "Restaurants",
    "American (New)",
    "Southern"
  ],
  "attributes": {
    "BusinessParking": {
      "street": false,
      "garage": true
    },
    "WheelchairAccessible": true,
    "GoodForKids": false,
  },
  "stars": 4.5,
  "city": "Las Vegas",
  "name": "Yardbird Southern Table &amp;amp; Bar",
}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I can represent the attributes and top-level keys using dotted notation:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;attributes.BusinessParking.street="false",
attributes.BusinessParking.garage="true",
attributes.WheelchairAccessible="true",
attributes.GoodForKids="false",
stars="4.5",
city="Las Vegas",
name="Yardbird Southern Table &amp;amp; Bar",
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'm not sure whether this is optimal, though.&lt;/P&gt;

&lt;P&gt;However, my main question is: &lt;STRONG&gt;how should I represent the categories array?&lt;/STRONG&gt;  &lt;/P&gt;

&lt;P&gt;I need to be able to search the above data and return all records that have more than N categories, so how should my data be structured to make such a query as efficient as possible?&lt;/P&gt;
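
&lt;P&gt;As a rough illustration of the layout being asked about, here is a minimal Python sketch that flattens a record like the one above into the dotted notation. The array handling is only an assumption made for the example, not a Splunk recommendation: the list is written as one pipe-delimited field plus a hypothetical categories_count field, so that a "more than N categories" filter could compare a plain number. The helper names (flatten, format_value, to_key_value) are invented for this sketch, and the record is trimmed to a few fields.&lt;/P&gt;

```python
def flatten(value, prefix=""):
    """Yield (key, value) pairs, joining nested object keys with dots."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        # Hypothetical convention: one pipe-delimited field plus an explicit count.
        yield prefix, "|".join(str(item) for item in value)
        yield f"{prefix}_count", len(value)
    else:
        yield prefix, value


def format_value(value):
    """Render a scalar as a quoted string, with lower-case booleans."""
    if isinstance(value, bool):
        text = "true" if value else "false"
    else:
        text = str(value)
    # Escape embedded double quotes so the value stays inside its quotes.
    escaped = text.replace('"', '\\"')
    return f'"{escaped}"'


def to_key_value(record):
    """Format one record as a single comma-separated key=value line."""
    return ", ".join(f"{key}={format_value(val)}" for key, val in flatten(record))


# Trimmed version of the example record from the question.
record = {
    "categories": ["Restaurants", "American (New)", "Southern"],
    "attributes": {
        "BusinessParking": {"street": False, "garage": True},
        "WheelchairAccessible": True,
        "GoodForKids": False,
    },
    "stars": 4.5,
    "city": "Las Vegas",
}

line = to_key_value(record)
print(line)
```

&lt;P&gt;Whether a precomputed count like this actually searches faster than spath on JSON is exactly what would still need to be measured.&lt;/P&gt;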

&lt;P&gt;I'm asking because we currently store our logs in JSON format, and I can indeed perform the above query on JSON data with spath. However, some people in my organization believe that spath is very slow and that key-value is much faster, and they want to change our logging format from JSON to key-value. I'd like to compare both log structures to understand which format is more efficient for querying (if, in fact, there is any difference at all). At the moment, I can't even figure out how best to structure the key-value logs so that I can query array data.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 06:39:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293212#M55817</guid>
      <dc:creator>adamcohen</dc:creator>
      <dc:date>2018-03-27T06:39:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to format nested data using key-value structure</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293213#M55818</link>
      <description>&lt;P&gt;If your data is in JSON, keep it that way and just set KV_MODE = json on your sourcetype.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 23:12:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293213#M55818</guid>
      <dc:creator>starcher</dc:creator>
      <dc:date>2018-03-28T23:12:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to format nested data using key-value structure</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293214#M55819</link>
      <description>&lt;P&gt;Thanks for the response, @starcher. However, I'm not trying to solve this for JSON-formatted logs; I already know how to do that, and it works well. The problem is how to solve it for key-value formatted logs, because my organization wants a clear comparison of JSON versus key-value logging. That is why I'm trying to figure out the best way to store a nested data structure in key-value format: I want to run the same queries against both formats, see how they differ, and summarise the advantages and disadvantages of each approach.&lt;/P&gt;

&lt;P&gt;For example, to return all restaurants that have more than 15 categories, I can use the following query on JSON-formatted data:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;source="business.json" | spath categories{} | where mvcount('categories{}') &amp;gt; 15
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The above query requires spath, which can be slow. To compare this with key-value, I first need to understand how to store the nested data (including the categories array) in key-value format so that I can construct an equivalent query.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2018 23:25:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293214#M55819</guid>
      <dc:creator>adamcohen</dc:creator>
      <dc:date>2018-04-02T23:25:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to format nested data using key-value structure</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293215#M55820</link>
      <description>&lt;P&gt;@adamcohen - what did you end up doing? &lt;BR /&gt;
I am in the same situation as you. If Splunk recommends key-value pairs (which I also prefer over JSON), why doesn't it recommend a way to represent searchable arrays?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jan 2019 23:24:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-format-nested-data-using-key-value-structure/m-p/293215#M55820</guid>
      <dc:creator>cesarbmx</dc:creator>
      <dc:date>2019-01-09T23:24:12Z</dc:date>
    </item>
  </channel>
</rss>

