Avoid multiple spath for a better performant query

monika0511 · ‎06-16-2020

I have a json with the following structure:

{
   "version":"v0.2",
   "prints":{
      "urls":[
         {
            "response_time":256,
            "uri":{
               "bool":false,
               "name":"abc"
            },
            "Time":{
               "total":52,
               "db":11
            }
         },
         {
            "response_time":578,
            "uri":{
               "bool":false,
               "name":"xyz"
            },
            "Time":{
               "total":78,
               "db":13
            }
         }
      ]
   }
}

I've to create a table with columns : _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time
here is my query that I'm trying to optimize by removing multiple spaths

basic search rv=*, av=*, wm=*, an=*, et=*
| spath input=data path=prints.urls{} output=urls
| spath input=urls path=response_time output=response_time
| spath input=urls path=uri.name output=uri_name
| spath input=urls path=Time.db output=db_time
| spath input=urls path=Time.total output=total_time
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

I can't help but think there would be a more optimized way to get the table.

kamlesh_vaghela · ‎06-17-2020

@monika0511

You can try this also.

YOUR_SEARCH | spath prints.urls{} output=urls | stats count latest(_time) as Time by urls | rename urls as _raw,Time as _time | extract | rename Time.db as db_time, Time.total as total_time, uri.name as uri_name | table _time uri_name, response_time, db_time, total_time

Sample Search:

| makeresults 
| eval _raw="{\"version\":\"v0.2\",\"prints\":{\"urls\":[{\"response_time\":256,\"uri\":{\"bool\":false,\"name\":\"abc\"},\"Time\":{\"total\":52,\"db\":11}},{\"response_time\":578,\"uri\":{\"bool\":false,\"name\":\"xyz\"},\"Time\":{\"total\":78,\"db\":13}}]}}" 
| spath prints.urls{} output=urls | stats count latest(_time) as Time by urls | rename urls as _raw,Time as _time | extract | rename Time.db as db_time, Time.total as total_time, uri.name as uri_name | table _time uri_name, response_time, db_time, total_time

to4kawa · ‎06-16-2020

basic search rv=*, av=*, wm=*, an=*, et=*
| spath input=data path=prints.urls{} output=urls

| mvexpand urls

| spath input=urls

| rename uri.name as uri_name, Time.db as time_db, Time.total as total_time

| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

try rename

monika0511 · ‎06-16-2020

@to4kawa renaming didn't work 😞

to4kawa · ‎06-17-2020

| makeresults 
| eval _raw="{\"version\":\"v0.2\",\"prints\":{\"urls\":[{\"response_time\":256,\"uri\":{\"bool\":false,\"name\":\"abc\"},\"Time\":{\"total\":52,\"db\":11}},{\"response_time\":578,\"uri\":{\"bool\":false,\"name\":\"xyz\"},\"Time\":{\"total\":78,\"db\":13}}]}}" 
| spath prints.urls{} output=urls 
| mvexpand urls 
| spath input=urls 
| rename Time.db as db_time, Time.total as total_time, uri.name as uri_name

@monika0511
this query works well.

dmarling · ‎06-17-2020

Beware that mvexpand can really chew through memory on your search head if you have a large amount of events that it's being applied to. This can cause it to terminate early if your admins are enforcing memory limits on searches or can cause the PID to get killed by OOM Killer on a nix box. If you are okay with a multivalued list type view in your table for each event it's unnecessary to mvexpand. If you do have a requirement that each row be one url array, then just be careful on the amount of data it is being applied to.

Also even though the same amount of spaths are in my original solution, the performance is still better than invoking the spath search command the same amount of times because it is on a single pipe eval which means it's processed all at once. Having multiple pipes with spath will cause it to stop and start at each pipe which will add an admittedly negligible performance reduction.

If this comment/answer was helpful, please up vote it. Thank you.

dmarling · ‎06-17-2020

It appears the board has ate my answer I wrote earlier to prove that multiple eval spaths is faster than using even 1 spath command. I'll try again.

Using your original query with your sample here is your search performance on my local machine's splunk install:

| makeresults
| eval data="{
   \"version\":\"v0.2\",
   \"prints\":{
      \"urls\":[
         {
            \"response_time\":256,
            \"uri\":{
               \"bool\":false,
               \"name\":\"abc\"
            },
            \"Time\":{
               \"total\":52,
               \"db\":11
            }
         },
         {
            \"response_time\":578,
            \"uri\":{
               \"bool\":false,
               \"name\":\"xyz\"
            },
            \"Time\":{
               \"total\":78,
               \"db\":13
            }
         }
      ]
   }
}"
| spath input=data path=prints.urls{} output=urls
| spath input=urls path=response_time output=response_time
| spath input=urls path=uri.name output=uri_name
| spath input=urls path=Time.db output=db_time
| spath input=urls path=Time.total output=total_time
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

This search has completed and has returned 1 results by scanning 0 events in 0.119 seconds

The query I originally proposed:

| makeresults
| eval data="{
   \"version\":\"v0.2\",
   \"prints\":{
      \"urls\":[
         {
            \"response_time\":256,
            \"uri\":{
               \"bool\":false,
               \"name\":\"abc\"
            },
            \"Time\":{
               \"total\":52,
               \"db\":11
            }
         },
         {
            \"response_time\":578,
            \"uri\":{
               \"bool\":false,
               \"name\":\"xyz\"
            },
            \"Time\":{
               \"total\":78,
               \"db\":13
            }
         }
      ]
   }
}"
| eval response_time=spath(data, "prints.urls{}.response_time"),
uri_name=spath(data, "prints.urls{}.uri.name"),
db_time=spath(data, "prints.urls{}.Time.db"),
total_time=spath(data, "prints.urls{}.Time.total")
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

This search has completed and has returned 1 results by scanning 0 events in 0.099 seconds

The search proposed by @to4kawa

| makeresults 
| eval _raw="{\"version\":\"v0.2\",\"prints\":{\"urls\":[{\"response_time\":256,\"uri\":{\"bool\":false,\"name\":\"abc\"},\"Time\":{\"total\":52,\"db\":11}},{\"response_time\":578,\"uri\":{\"bool\":false,\"name\":\"xyz\"},\"Time\":{\"total\":78,\"db\":13}}]}}" 
| spath prints.urls{} output=urls 
| mvexpand urls 
| spath input=urls 
| rename Time.db as db_time, Time.total as total_time, uri.name as uri_name

This search has completed and has returned 2 results by scanning 0 events in 0.244 seconds

And if you have a requirement that each url in the url array appears on it's own row in your table here's my modified version with the caveat I posed above about mvexpand being problematic on large data sets:

| makeresults
| eval data="{
   \"version\":\"v0.2\",
   \"prints\":{
      \"urls\":[
         {
            \"response_time\":256,
            \"uri\":{
               \"bool\":false,
               \"name\":\"abc\"
            },
            \"Time\":{
               \"total\":52,
               \"db\":11
            }
         },
         {
            \"response_time\":578,
            \"uri\":{
               \"bool\":false,
               \"name\":\"xyz\"
            },
            \"Time\":{
               \"total\":78,
               \"db\":13
            }
         }
      ]
   }
}"
| eval urls=spath(data,"prints.urls{}")
| mvexpand urls 
| eval response_time=spath(urls, "response_time"),
uri_name=spath(urls, "uri.name"),
db_time=spath(urls, "Time.db"),
total_time=spath(urls, "Time.total")
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

This search has completed and has returned 2 results by scanning 0 events in 0.113 seconds

Ultimately you can see that using a single pipe eval with the spath command on each field you want will produce a more performant query by about 17% to your original query and at 48% improvement compared to the one by @to4kawa.

If this comment/answer was helpful, please up vote it. Thank you.

dmarling · ‎06-17-2020

refer to below response as this was a duplicate of that.

If this comment/answer was helpful, please up vote it. Thank you.

dmarling · ‎06-16-2020

You can use spath in an eval command and you can chain all of the fields into a single eval with a comma separating each field. This will make it more performant and it removes the need to do multiple spath commands:

basic search rv=*, av=*, wm=*, an=*, et=*
| eval response_time=spath(data, "prints.urls{}.response_time"),
uri_name=spath(data, "prints.urls{}.uri.name"),
db_time=spath(data, "prints.urls{}.Time.db"),
total_time=spath(data, "prints.urls{}.Time.total")
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

Here's a run anywhere example using your sample data to demonstrate that it will work. Keep in mind it won' t have the rv, av, wm, an, and et fields due to that not being present in the sample example:

| makeresults
| eval data="{
   \"version\":\"v0.2\",
   \"prints\":{
      \"urls\":[
         {
            \"response_time\":256,
            \"uri\":{
               \"bool\":false,
               \"name\":\"abc\"
            },
            \"Time\":{
               \"total\":52,
               \"db\":11
            }
         },
         {
            \"response_time\":578,
            \"uri\":{
               \"bool\":false,
               \"name\":\"xyz\"
            },
            \"Time\":{
               \"total\":78,
               \"db\":13
            }
         }
      ]
   }
}"
| eval response_time=spath(data, "prints.urls{}.response_time"),
uri_name=spath(data, "prints.urls{}.uri.name"),
db_time=spath(data, "prints.urls{}.Time.db"),
total_time=spath(data, "prints.urls{}.Time.total")
| table _time, rv, av, wm, an, et, uri_name, response_time, db_time, total_time

If this comment/answer was helpful, please up vote it. Thank you.

monika0511 · ‎06-16-2020

@dmarling it will perform the same as it's using spath equal number of times as the original query I posted. but I agree the

spath(X,Y)

syntax is better. Thanks!

Avoid multiple spath for a better performant query

field extraction

fields

table

Developer Spotlight with Brett Adams

Index This | What can you do to make 55,555 equal 500?

Say goodbye to manually analyzing phishing and malware threats with Splunk Attack ...