<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Search query using spath and mvexpand on multi-value nested JSON doesn't scale in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Search-query-using-spath-and-mvexpand-on-multi-value-nested-JSON/m-p/326818#M97375</link>
    <description>&lt;P&gt;Hi Splunk Experts,&lt;/P&gt;

&lt;P&gt;I am sending events to Splunk Enterprise in the following nested JSON format:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{
     compliance:     Compliance Unknown,    
     ctupdate:   hostinfo,  
     host_properties:   [
            {
             name:   _times,    
              since:     1508907165,    
             value:  last_onsite    
            },  
            {
              name:  compliance_state,  
               since:    1508268020,    
               value:    N/A    
            },  
            {
              name:  engine_seen_packet,    
               since:    1508907165,    
               value:    Yes    
            },  
            {
               name:     guest_corporate_state, 
               since:    1508268020,    
               value:    N/A    
            },  
            {
             name:   linux_operating_system,    
             since:  1508907165,    
             value:  2.0.2  
            },  
            {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue1
            },  
                {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue2
            },
                {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue3
            },
            {
             name:   ssh_open_port, 
             since:  1508959259,    
             value:  0  
            },
            {
             name:   va_netfunc,    
             since:  1508959247,    
             value:  Linux Desktop/Server   
            }   
    ]   
     ip:     192.168.1.17,  
     tenant_id:  acd1034578ef
}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I want to create a dashboard that would go over all such events and display a pie-chart with online values seen so far. To achieve this, I've been using the following query:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;`ct_hostinfo` `get_sourcetypes`
| spath output=prop_name path=host_properties{}.name
| spath output=prop_val path=host_properties{}.value
| eval prop_key_val=mvzip(prop_name, prop_val, "---")
| mvexpand prop_key_val
| eval prop_key_val=split(prop_key_val, "---")
| eval prop_name=mvindex(prop_key_val, 0)
| eval prop_val=mvindex(prop_key_val, 1)
| search prop_name=online
| stats count by prop_val
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The above query does its work but &lt;STRONG&gt;it doesn't scale.&lt;/STRONG&gt; If I let my dashboard collect data for over ~5k events, I start seeing the following error:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;command.mvexpand: output will be truncated at 1300 results due to excessive memory usage. Memory threshold of 500MB as configured in limits.conf / [mvexpand] / max_mem_usage_mb has been reached.
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I would like to know if I can somehow scale the above search query so that it can handle greater number of events. Would I need to defined field extractions etc. in transforms.conf/props.conf? I don't know much about field extraction configs.&lt;/P&gt;

&lt;P&gt;I would be very grateful, if someone can suggest me a better query or field extractions.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Thu, 26 Oct 2017 02:23:26 GMT</pubDate>
    <dc:creator>sharad06</dc:creator>
    <dc:date>2017-10-26T02:23:26Z</dc:date>
    <item>
      <title>Search query using spath and mvexpand on multi-value nested JSON doesn't scale</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Search-query-using-spath-and-mvexpand-on-multi-value-nested-JSON/m-p/326818#M97375</link>
      <description>&lt;P&gt;Hi Splunk Experts,&lt;/P&gt;

&lt;P&gt;I am sending events to Splunk Enterprise in the following nested JSON format:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{
     compliance:     Compliance Unknown,    
     ctupdate:   hostinfo,  
     host_properties:   [
            {
             name:   _times,    
              since:     1508907165,    
             value:  last_onsite    
            },  
            {
              name:  compliance_state,  
               since:    1508268020,    
               value:    N/A    
            },  
            {
              name:  engine_seen_packet,    
               since:    1508907165,    
               value:    Yes    
            },  
            {
               name:     guest_corporate_state, 
               since:    1508268020,    
               value:    N/A    
            },  
            {
             name:   linux_operating_system,    
             since:  1508907165,    
             value:  2.0.2  
            },  
            {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue1
            },  
                {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue2
            },
                {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue3
            },
            {
             name:   ssh_open_port, 
             since:  1508959259,    
             value:  0  
            },
            {
             name:   va_netfunc,    
             since:  1508959247,    
             value:  Linux Desktop/Server   
            }   
    ]   
     ip:     192.168.1.17,  
     tenant_id:  acd1034578ef
}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I want to create a dashboard that would go over all such events and display a pie-chart with online values seen so far. To achieve this, I've been using the following query:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;`ct_hostinfo` `get_sourcetypes`
| spath output=prop_name path=host_properties{}.name
| spath output=prop_val path=host_properties{}.value
| eval prop_key_val=mvzip(prop_name, prop_val, "---")
| mvexpand prop_key_val
| eval prop_key_val=split(prop_key_val, "---")
| eval prop_name=mvindex(prop_key_val, 0)
| eval prop_val=mvindex(prop_key_val, 1)
| search prop_name=online
| stats count by prop_val
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The above query does its work but &lt;STRONG&gt;it doesn't scale.&lt;/STRONG&gt; If I let my dashboard collect data for over ~5k events, I start seeing the following error:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;command.mvexpand: output will be truncated at 1300 results due to excessive memory usage. Memory threshold of 500MB as configured in limits.conf / [mvexpand] / max_mem_usage_mb has been reached.
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I would like to know if I can somehow scale the above search query so that it can handle greater number of events. Would I need to defined field extractions etc. in transforms.conf/props.conf? I don't know much about field extraction configs.&lt;/P&gt;

&lt;P&gt;I would be very grateful, if someone can suggest me a better query or field extractions.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Oct 2017 02:23:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Search-query-using-spath-and-mvexpand-on-multi-value-nested-JSON/m-p/326818#M97375</guid>
      <dc:creator>sharad06</dc:creator>
      <dc:date>2017-10-26T02:23:26Z</dc:date>
    </item>
    <item>
      <title>Re: Search query using spath and mvexpand on multi-value nested JSON doesn't scale</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Search-query-using-spath-and-mvexpand-on-multi-value-nested-JSON/m-p/326819#M97376</link>
      <description>&lt;P&gt;You are carrying along a ton of fields when you &lt;CODE&gt;mvexpand&lt;/CODE&gt; your records.  That calculates to over 300K per record.  You only need &lt;CODE&gt;prop_key_val&lt;/CODE&gt; at that point, right?&lt;/P&gt;

&lt;P&gt;Two &lt;CODE&gt;fields&lt;/CODE&gt; commands, immediately before the &lt;CODE&gt;mvexpand&lt;/CODE&gt;, should put you up to millions of records possible.  &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | eval prop_key_val=mvzip(prop_name, prop_val, "---")
 | fields prop_key_val
 | fields - _*
 | mvexpand prop_key_val
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;HR /&gt;

&lt;P&gt;Looking deeper, you only want ONE of the values for &lt;CODE&gt;prop_name=online&lt;/CODE&gt;, so you don't need the &lt;CODE&gt;mvexpand&lt;/CODE&gt; at all.  You need &lt;CODE&gt;mvfilter&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;Try this against your existing search for a small time number of records.  It should get the same results.  If not, then adjust the &lt;CODE&gt;mvfilter&lt;/CODE&gt; test until it does.  (Adjust case sensitivity, &lt;CODE&gt;like()&lt;/CODE&gt; vs &lt;CODE&gt;match()&lt;/CODE&gt;, and so on)&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;`ct_hostinfo` `get_sourcetypes`
 | spath output=prop_name path=host_properties{}.name
 | spath output=prop_val path=host_properties{}.value
 | eval prop_key_val=mvzip(prop_name, prop_val, "---")
 | eval prop_key_val=mvfilter( like(prop_key_val,"online%") 
 | mvexpand prop_key_val
 | eval prop_val=mvindex(split(prop_key_val, "---"), 1)
 | stats count by prop_val
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 30 Oct 2017 01:51:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Search-query-using-spath-and-mvexpand-on-multi-value-nested-JSON/m-p/326819#M97376</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-10-30T01:51:24Z</dc:date>
    </item>
    <item>
      <title>Re: Search query using spath and mvexpand on multi-value nested JSON doesn't scale</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Search-query-using-spath-and-mvexpand-on-multi-value-nested-JSON/m-p/326820#M97377</link>
      <description>&lt;P&gt;Hi Dal,&lt;/P&gt;

&lt;P&gt;I tried your suggestion to remove internal fields (fields - _*). But it doesn't work &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;It's always returning 'No results found'.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Nov 2017 20:05:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Search-query-using-spath-and-mvexpand-on-multi-value-nested-JSON/m-p/326820#M97377</guid>
      <dc:creator>sharad06</dc:creator>
      <dc:date>2017-11-07T20:05:39Z</dc:date>
    </item>
  </channel>
</rss>

