Splunk Search

How to properly extract into individual fields with a complicated nested json array

randy_moore
Path Finder

Warning:  Long, detailed explanation ahead. 🙂   Summary version is that I have a nested json arrays and fields that I am having an issue with extracting properly into individual fields.  The chosen fields will change over time, based on external factors so I need to be able to extract and report on all of them, with the ability to identify the array index (i.e. {0}, or {1}, etc).   No solution that I have looked at or come up with is working for me, so I am turning to you smarter folks to help.

Detail:
I have a nested json arrays and fields that I am having an issue with extracting properly into individual fields.  
The end result is that I want to be able to place alerts or report on various fields that are deemed interesting.

These are "request" and "response" arrays in each transaction (think checking items in a shopping cart for various flag and indicators).   

The chosen fields will change over time, based on external factors so I need to be able to extract them from the array  and report on all of them at some point.

Here is a sample request and response

market basket json.jpg


As you can see the the request array is market_basket.request{} and the response is market_basket.response{}. Focusing on the response portion, the first response has an "02" field and a "dataset". The next response{1} has fields 02,03,04,05,08,etc etc., same with response{2} and response{3}

if I do a simple rename
| rename market_basket.response.* to Resp_*
the fields don't line up. The contents of "Resp_19" should be down 1 line as there was no field 19 in market_basket.response{0}.  See here: 
market basket rename.jpg

If I change the query to this

 

| spath path=market_basket.response{} output=Response
| spath input=Response
| table tran_id 0* 1* 2*  dataset

 

Then I only get the first row, the other 3 rows don't show up.

The only way that I have been able to get it to work is to address each indices and  field individually

 

| spath path=market_basket.response{0} output=Resp_0 
| spath path=market_basket.response{0}.dataset output=Resp_0_dataset 
| spath path=market_basket.response{0}.02 output=Resp_0_02 
| spath path=market_basket.response{1} output=Resp_1 
| spath path=market_basket.response{1}.dataset output=Resp_1_dataset 
| spath path=market_basket.response{1}.01 output=Resp_1_01 
| spath path=market_basket.response{1}.02 output=Resp_1_02 
| spath path=market_basket.response{1}.03 output=Resp_1_03 
| spath path=market_basket.response{1}.04 output=Resp_1_04 
| spath path=market_basket.response{1}.05 output=Resp_1_05 
| spath path=market_basket.response{1}.06 output=Resp_1_06 
| spath path=market_basket.response{1}.07 output=Resp_1_07 
| spath path=market_basket.response{1}.08 output=Resp_1_08 
| spath path=market_basket.response{1}.09 output=Resp_1_09 
| spath path=market_basket.response{1}.10 output=Resp_1_10 
| spath path=market_basket.response{1}.11 output=Resp_1_11 
| spath path=market_basket.response{1}.12 output=Resp_1_12 
| spath path=market_basket.response{1}.13 output=Resp_1_13 
| spath path=market_basket.response{1}.14 output=Resp_1_14 
| spath path=market_basket.response{1}.15 output=Resp_1_15 
| spath path=market_basket.response{1}.16 output=Resp_1_16 
| spath path=market_basket.response{1}.17 output=Resp_1_17 
| spath path=market_basket.response{1}.18 output=Resp_1_18 
| spath path=market_basket.response{1}.19 output=Resp_1_19 
| spath path=market_basket.response{1}.20 output=Resp_1_20 
| spath path=market_basket.response{1}.21 output=Resp_1_21 
...

 

But with up to 60 responses with 20 fields per transaction, that many spaths would be a non-starter. Especially considering that I need to factor in the request portions too at some point.

Finally, to give an example use case, I want to be able to check field 19 on the response  and if the flag starts with "NN" or "NY", then put out an alert: "Item".market_basket{whatever #}.02." has been not been cleared for sale". Flags are:".market_basket{whatever #].19

I know that was a lot of detail, but I wanted to make sure that I put down the different ways that I tried.
Any help would be much appreciated!

Labels (1)
Tags (3)

bowesmana
SplunkTrust
SplunkTrust

If it's true that the first response array element is always just 02 and dataset, then this would do the trick after your rename

| eval Resp_02=mvindex(Resp_02,1,99), Resp_dataset=mvindex(Resp_dataset,1,99)

i.e. just remove the first element from those two multivalue fields.

 

0 Karma

PickleRick
Ultra Champion

Well, splunk is not keeping track of fancy json structure and doesn't really care about indexes in arrays in json structures and addressing them in simple evals.

True, with spath it's possible to addres individually addressed subelements but that's how spath (which is a much "heavier" machinery than a simple eval) works. Furthermore, splunk itself isn't very much into complicated data structures. You can't do a two-dimensional array. Multivalued fiels is the most you can count on.

You could try to split the response into a multivalued field consisting of the "subjsons", do mvexpand to divide it into separate events and then do spath on that field.

Something like that:

| spath "market_basket.response{}" output=my_responses
| mvexpand my_responses
| spath input=my_responses
Tags (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

@randy_moore wrote:

If I change the query to this

| spath path=market_basket.response{} output=Response
| spath input=Response
| table tran_id 0* 1* 2* dataset

Then I only get the first row, the other 3 rows don't show up.

What @PickleRick  meant was

| spath path=market_basket.response{} output=Responses
| mvexpand Responses
| spath input=Responses

 In other words, just add mvexpand to your quoted query.  Every element in the response array will be in its own row.

Consider these points:

  • It is really unclear what the end goal is.  If you want to tabulate all key-value combinations including non-existence (null), PickleRick's method suffices.  You can then flatten field names with "| rename response.* to Resp_*" Such tabulation also allows you to  do many other things.
  • It is best to illustrate input data with text, not screenshot.  It is a lot easier for people to generate ideas with data that can be passed into Splunk.
0 Karma