Splunk Search

Search query using spath and mvexpand on multi-value nested JSON doesn't scale

sharad06
Explorer

Hi Splunk Experts,

I am sending events to Splunk Enterprise in the following nested JSON format:

{
     compliance:     Compliance Unknown,    
     ctupdate:   hostinfo,  
     host_properties:   [
            {
             name:   _times,    
              since:     1508907165,    
             value:  last_onsite    
            },  
            {
              name:  compliance_state,  
               since:    1508268020,    
               value:    N/A    
            },  
            {
              name:  engine_seen_packet,    
               since:    1508907165,    
               value:    Yes    
            },  
            {
               name:     guest_corporate_state, 
               since:    1508268020,    
               value:    N/A    
            },  
            {
             name:   linux_operating_system,    
             since:  1508907165,    
             value:  2.0.2  
            },  
            {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue1
            },  
                {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue2
            },
                {
             name:   online,    
             since:  1508907165,    
             value:  online_multivalue3
            },
            {
             name:   ssh_open_port, 
             since:  1508959259,    
             value:  0  
            },
            {
             name:   va_netfunc,    
             since:  1508959247,    
             value:  Linux Desktop/Server   
            }   
    ]   
     ip:     192.168.1.17,  
     tenant_id:  acd1034578ef
}

I want to create a dashboard that would go over all such events and display a pie-chart with online values seen so far. To achieve this, I've been using the following query:

`ct_hostinfo` `get_sourcetypes`
| spath output=prop_name path=host_properties{}.name
| spath output=prop_val path=host_properties{}.value
| eval prop_key_val=mvzip(prop_name, prop_val, "---")
| mvexpand prop_key_val
| eval prop_key_val=split(prop_key_val, "---")
| eval prop_name=mvindex(prop_key_val, 0)
| eval prop_val=mvindex(prop_key_val, 1)
| search prop_name=online
| stats count by prop_val

The above query does its work but it doesn't scale. If I let my dashboard collect data for over ~5k events, I start seeing the following error:

command.mvexpand: output will be truncated at 1300 results due to excessive memory usage. Memory threshold of 500MB as configured in limits.conf / [mvexpand] / max_mem_usage_mb has been reached.

I would like to know if I can somehow scale the above search query so that it can handle greater number of events. Would I need to defined field extractions etc. in transforms.conf/props.conf? I don't know much about field extraction configs.

I would be very grateful, if someone can suggest me a better query or field extractions.

Thanks.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

You are carrying along a ton of fields when you mvexpand your records. That calculates to over 300K per record. You only need prop_key_val at that point, right?

Two fields commands, immediately before the mvexpand, should put you up to millions of records possible.

 | eval prop_key_val=mvzip(prop_name, prop_val, "---")
 | fields prop_key_val
 | fields - _*
 | mvexpand prop_key_val

Looking deeper, you only want ONE of the values for prop_name=online, so you don't need the mvexpand at all. You need mvfilter.

Try this against your existing search for a small time number of records. It should get the same results. If not, then adjust the mvfilter test until it does. (Adjust case sensitivity, like() vs match(), and so on)

`ct_hostinfo` `get_sourcetypes`
 | spath output=prop_name path=host_properties{}.name
 | spath output=prop_val path=host_properties{}.value
 | eval prop_key_val=mvzip(prop_name, prop_val, "---")
 | eval prop_key_val=mvfilter( like(prop_key_val,"online%") 
 | mvexpand prop_key_val
 | eval prop_val=mvindex(split(prop_key_val, "---"), 1)
 | stats count by prop_val

sharad06
Explorer

Hi Dal,

I tried your suggestion to remove internal fields (fields - _*). But it doesn't work 😞

It's always returning 'No results found'.

0 Karma
Get Updates on the Splunk Community!

Index This | How many sides does a circle have?

February 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

Registration for Splunk University is Now Open!

Are you ready for an adventure in learning?   Brace yourselves because Splunk University is back, and it's ...

Splunkbase | Splunk Dashboard Examples App for SimpleXML End of Life

The Splunk Dashboard Examples App for SimpleXML will reach end of support on Dec 19, 2024, after which no new ...