Splunk Search

Avoid multiple spath for a better performant query

monika0511
Explorer

I have a json with the following structure:

 

{
   "version":"v0.2",
   "prints":{
      "urls":[
         {
            "response_time":256,
            "uri":{
               "bool":false,
               "name":"abc"
            },
            "Time":{
               "total":52,
               "db":11
            }
         },
         {
            "response_time":578,
            "uri":{
               "bool":false,
               "name":"xyz"
            },
            "Time":{
               "total":78,
               "db":13
            }
         }
      ]
   }
}

 

I've to create a table with columns : _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time
here is my query that I'm trying to optimize by removing multiple spaths

 

basic search rv=*, av=*, wm=*, an=*, et=*
| spath input=data path=prints.urls{} output=urls
| spath input=urls path=response_time output=response_time
| spath input=urls path=uri.name output=uri_name
| spath input=urls path=Time.db output=db_time
| spath input=urls path=Time.total output=total_time
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

 

I can't help but think there would be a more optimized way to get the table.

Labels (3)
Tags (2)
0 Karma

kamlesh_vaghela
SplunkTrust
SplunkTrust

@monika0511 

 

You can try this also.

 

YOUR_SEARCH | spath prints.urls{} output=urls | stats count latest(_time) as Time by urls | rename urls as _raw,Time as _time | extract | rename Time.db as db_time, Time.total as total_time, uri.name as uri_name | table _time uri_name, response_time, db_time, total_time

 

Sample Search:

 

| makeresults 
| eval _raw="{\"version\":\"v0.2\",\"prints\":{\"urls\":[{\"response_time\":256,\"uri\":{\"bool\":false,\"name\":\"abc\"},\"Time\":{\"total\":52,\"db\":11}},{\"response_time\":578,\"uri\":{\"bool\":false,\"name\":\"xyz\"},\"Time\":{\"total\":78,\"db\":13}}]}}" 
| spath prints.urls{} output=urls | stats count latest(_time) as Time by urls | rename urls as _raw,Time as _time | extract | rename Time.db as db_time, Time.total as total_time, uri.name as uri_name | table _time uri_name, response_time, db_time, total_time

 

0 Karma

to4kawa
Ultra Champion

basic search rv=*, av=*, wm=*, an=*, et=*
| spath input=data path=prints.urls{} output=urls

| mvexpand urls 

| spath input=urls

| rename uri.name as uri_name, Time.db as time_db, Time.total as total_time

| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

 

try rename

 

0 Karma

monika0511
Explorer

@to4kawa renaming didn't work 😞 

0 Karma

to4kawa
Ultra Champion
| makeresults 
| eval _raw="{\"version\":\"v0.2\",\"prints\":{\"urls\":[{\"response_time\":256,\"uri\":{\"bool\":false,\"name\":\"abc\"},\"Time\":{\"total\":52,\"db\":11}},{\"response_time\":578,\"uri\":{\"bool\":false,\"name\":\"xyz\"},\"Time\":{\"total\":78,\"db\":13}}]}}" 
| spath prints.urls{} output=urls 
| mvexpand urls 
| spath input=urls 
| rename Time.db as db_time, Time.total as total_time, uri.name as uri_name

@monika0511 
this query works well. 

0 Karma

dmarling
Builder

Beware that mvexpand can really chew through memory on your search head if you have a large amount of events that it's being applied to. This can cause it to terminate early if your admins are enforcing memory limits on searches or can cause the PID to get killed by OOM Killer on a nix box.  If you are okay with a multivalued list type view in your table for each event it's unnecessary to mvexpand.  If you do have a requirement that each row be one url array, then just be careful on the amount of data it is being applied to.

Also even though the same amount of spaths are in my original solution, the performance is still better than invoking the spath search command the same amount of times because it is on a single pipe eval which means it's processed all at once.  Having multiple pipes with spath will cause it to stop and start at each pipe which will add an admittedly negligible performance reduction.

If this comment/answer was helpful, please up vote it. Thank you.
0 Karma

dmarling
Builder

It appears the board has ate my answer I wrote earlier to prove that multiple eval spaths is faster than using even 1 spath command.  I'll try again.

Using your original query with your sample here is your search performance on my local machine's splunk install:

 

| makeresults
| eval data="{
   \"version\":\"v0.2\",
   \"prints\":{
      \"urls\":[
         {
            \"response_time\":256,
            \"uri\":{
               \"bool\":false,
               \"name\":\"abc\"
            },
            \"Time\":{
               \"total\":52,
               \"db\":11
            }
         },
         {
            \"response_time\":578,
            \"uri\":{
               \"bool\":false,
               \"name\":\"xyz\"
            },
            \"Time\":{
               \"total\":78,
               \"db\":13
            }
         }
      ]
   }
}"
| spath input=data path=prints.urls{} output=urls
| spath input=urls path=response_time output=response_time
| spath input=urls path=uri.name output=uri_name
| spath input=urls path=Time.db output=db_time
| spath input=urls path=Time.total output=total_time
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

 

This search has completed and has returned 1 results by scanning 0 events in 0.119 seconds

The query I originally proposed:

 

| makeresults
| eval data="{
   \"version\":\"v0.2\",
   \"prints\":{
      \"urls\":[
         {
            \"response_time\":256,
            \"uri\":{
               \"bool\":false,
               \"name\":\"abc\"
            },
            \"Time\":{
               \"total\":52,
               \"db\":11
            }
         },
         {
            \"response_time\":578,
            \"uri\":{
               \"bool\":false,
               \"name\":\"xyz\"
            },
            \"Time\":{
               \"total\":78,
               \"db\":13
            }
         }
      ]
   }
}"
| eval response_time=spath(data, "prints.urls{}.response_time"),
uri_name=spath(data, "prints.urls{}.uri.name"),
db_time=spath(data, "prints.urls{}.Time.db"),
total_time=spath(data, "prints.urls{}.Time.total")
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

 

This search has completed and has returned 1 results by scanning 0 events in 0.099 seconds

The search proposed by @to4kawa 

 

| makeresults 
| eval _raw="{\"version\":\"v0.2\",\"prints\":{\"urls\":[{\"response_time\":256,\"uri\":{\"bool\":false,\"name\":\"abc\"},\"Time\":{\"total\":52,\"db\":11}},{\"response_time\":578,\"uri\":{\"bool\":false,\"name\":\"xyz\"},\"Time\":{\"total\":78,\"db\":13}}]}}" 
| spath prints.urls{} output=urls 
| mvexpand urls 
| spath input=urls 
| rename Time.db as db_time, Time.total as total_time, uri.name as uri_name

 

This search has completed and has returned 2 results by scanning 0 events in 0.244 seconds

And if you have a requirement that each url in the url array appears on it's own row in your table here's my modified version with the caveat I posed above about mvexpand being problematic on large data sets:

 

| makeresults
| eval data="{
   \"version\":\"v0.2\",
   \"prints\":{
      \"urls\":[
         {
            \"response_time\":256,
            \"uri\":{
               \"bool\":false,
               \"name\":\"abc\"
            },
            \"Time\":{
               \"total\":52,
               \"db\":11
            }
         },
         {
            \"response_time\":578,
            \"uri\":{
               \"bool\":false,
               \"name\":\"xyz\"
            },
            \"Time\":{
               \"total\":78,
               \"db\":13
            }
         }
      ]
   }
}"
| eval urls=spath(data,"prints.urls{}")
| mvexpand urls 
| eval response_time=spath(urls, "response_time"),
uri_name=spath(urls, "uri.name"),
db_time=spath(urls, "Time.db"),
total_time=spath(urls, "Time.total")
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

 

This search has completed and has returned 2 results by scanning 0 events in 0.113 seconds

Ultimately you can see that using a single pipe eval with the spath command on each field you want will produce a more performant query by about 17% to your original query and at 48% improvement compared to the one by @to4kawa.

If this comment/answer was helpful, please up vote it. Thank you.

dmarling
Builder

refer to below response as this was a duplicate of that.

If this comment/answer was helpful, please up vote it. Thank you.
0 Karma

dmarling
Builder

You can use spath in an eval command and you can chain all of the fields into a single eval with a comma separating each field.  This will make it more performant and it removes the need to do multiple spath commands:

basic search rv=*, av=*, wm=*, an=*, et=*
| eval response_time=spath(data, "prints.urls{}.response_time"),
uri_name=spath(data, "prints.urls{}.uri.name"),
db_time=spath(data, "prints.urls{}.Time.db"),
total_time=spath(data, "prints.urls{}.Time.total")
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

Here's a run anywhere example using your sample data to demonstrate that it will work.  Keep in mind it won' t have the rv, av, wm, an, and et fields due to that not being present in the sample example:

| makeresults
| eval data="{
   \"version\":\"v0.2\",
   \"prints\":{
      \"urls\":[
         {
            \"response_time\":256,
            \"uri\":{
               \"bool\":false,
               \"name\":\"abc\"
            },
            \"Time\":{
               \"total\":52,
               \"db\":11
            }
         },
         {
            \"response_time\":578,
            \"uri\":{
               \"bool\":false,
               \"name\":\"xyz\"
            },
            \"Time\":{
               \"total\":78,
               \"db\":13
            }
         }
      ]
   }
}"
| eval response_time=spath(data, "prints.urls{}.response_time"),
uri_name=spath(data, "prints.urls{}.uri.name"),
db_time=spath(data, "prints.urls{}.Time.db"),
total_time=spath(data, "prints.urls{}.Time.total")
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time
If this comment/answer was helpful, please up vote it. Thank you.
0 Karma

monika0511
Explorer

@dmarling it will perform the same as it's using spath equal number of times as the original query I posted. but I agree the 

 

spath(X,Y)

 

syntax is better. Thanks!

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...