Getting Data In

Not able to extract Nested JSON

surejsajeev
Explorer

Hi,

I have an event that is an entire JSON object. It looks something like this:

{
    Key1: {
        KEY2: VAL2,
        KEY3: VAL3,
        ...
    },
    KeyX: {
        KEY2: VAL2,
        KEY3: VAL3,
        ...
    },
    KeyY: {
        KEY2: VAL2,
        KEY3: VAL3,
        ...
    }
}

Here Key1, KeyX, and KeyY are unknown to me, meaning they can change all the time. I would get around 100 such sub-dictionaries. I just want the sub-dictionaries inside, as separate Splunk events.

{
    KEY2: VAL2,
    KEY3: VAL3,
    ...
}

I have tried a lot of different search queries using spath, but nothing seems to help. 

Could someone please help me with this? I would very much appreciate it.

0 Karma
1 Solution

ITWhisperer
SplunkTrust

There might be an easier way to do this, but you could extract all the fields with spath, then reconstruct the fields and values, removing the "unknown" part of the fieldname:

| makeresults 
| eval _raw="{
	\"KeyA\": {
		\"Key2\": 3,
		\"Key3\": 3
	},
	\"KeyE\": {
		\"Key2\": 5,
		\"Key3\": 4
	},
	\"KeyC\": {
		\"Key2\": 7,
		\"Key3\": 5
	},
	\"KeyH\": {
		\"Key2\": 9,
		\"Key3\": 6
	},
	\"KeyR\": {
		\"Key2\": 1,
		\"Key3\": 7
	}
}|{
	\"KeyA\": {
		\"Key2\": 1,
		\"Key3\": 3
	},
	\"KeyE\": {
		\"Key2\": 2,
		\"Key3\": 4
	},
	\"KeyC\": {
		\"Key2\": 3,
		\"Key3\": 5
	},
	\"KeyH\": {
		\"Key2\": 4,
		\"Key3\": 6
	},
	\"KeyR\": {
		\"Key2\": 5,
		\"Key3\": 7
	}
}"
| eval events=split(_raw,"|") 
| mvexpand events
| fields - _*
| rename events as _raw


| spath
| fields - _raw
| foreach *.*
    [| eval Group_<<MATCHSEG1>>=if(isnull(Group_<<MATCHSEG1>>),"<<MATCHSEG2>>:".'<<FIELD>>',mvappend(Group_<<MATCHSEG1>>,"<<MATCHSEG2>>:".'<<FIELD>>'))]
| fields *_*
| foreach *_*
    [| eval events=if(isnull(events),mvjoin(<<FIELD>>,"|"),mvappend(events,mvjoin(<<FIELD>>,"|")))]
| fields events
| mvexpand events
| streamstats count as row 
| eval events=split(events,"|")
| mvexpand events
| eval key=mvindex(split(events,":"),0)
| eval value=mvindex(split(events,":"),1)
| eval {key}=value
| fields - key value events
| stats values(*) as * by row
| fields - row

This assumes two-level nesting in the JSON, as shown in your example. More complex JSON formats might not work so well.

Another way (perhaps simpler) would be to replace all the "unknown" values with known values:

| spath
| foreach *.*
    [| eval unknown=if(isnull(unknown),"<<MATCHSEG1>>",mvdedup(mvappend(unknown,"<<MATCHSEG1>>")))]
| fields unknown
| mvexpand unknown
| eval _raw=replace(_raw,"\"".unknown."\"","\"known\"")
| spath path=known


surejsajeev
Explorer

Thank you very much @ITWhisperer for the solution. The query works perfectly. However, when I run it, I get this error message from Splunk:

"The search you ran returned a number of fields that exceeded the current indexed field extraction limit. To ensure that all fields are extracted for search, set limits.conf: [kv] / indexed_kv_limit to a number that is higher than the number of fields contained in the files that you index."

Could you advise on how I can resolve this issue, please? I am not sure of the number of fields that my query will generate. Is there a dynamic limit that I can set?

Your help is much appreciated.

0 Karma

ITWhisperer
SplunkTrust

Which of the two approaches gives you this error? Either way, you (or your Splunk admin) should increase the limit of fields available in the configuration.
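
For reference, the stanza that the error message points at would look something like this (the value 1000 is only an illustrative number - set it higher than the number of fields your events actually produce):

# $SPLUNK_HOME/etc/system/local/limits.conf
[kv]
indexed_kv_limit = 1000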

0 Karma

surejsajeev
Explorer

It was the second one, @ITWhisperer.

0 Karma

ITWhisperer
SplunkTrust

You could try removing the unknown field once it has been used:

| spath
| foreach *.*
    [| eval unknown=if(isnull(unknown),"<<MATCHSEG1>>",mvdedup(mvappend(unknown,"<<MATCHSEG1>>")))]
| fields unknown
| mvexpand unknown
| eval _raw=replace(_raw,"\"".unknown."\"","\"known\"")
| fields - unknown
| spath path=known

Does the job inspector inform you how far through the search it got before it ran out of fields?

0 Karma

surejsajeev
Explorer

@ITWhisperer 

These are the messages I see in the job inspector.

Forcefully terminated search process with sid=<SID> since its physical memory usage (34997.320000 MB) has exceeded the physical memory threshold specified in limits.conf/search_process_memory_usage_threshold (30000.000000 MB)

-----------------------------------------------------------

command.mvexpand: output will be truncated at 11700 results due to excessive memory usage. Memory threshold of 500MB as configured in limits.conf / [mvexpand] / max_mem_usage_mb has been reached.

-----------------------------------------------------------

Forcefully terminated search process with sid=<SID> since its physical memory usage (34997.320000 MB) has exceeded the physical memory threshold specified in limits.conf/search_process_memory_usage_threshold (30000.000000 MB)

-----------------------------------------------------------

This is the query that I am using:

| spath
| foreach *.*
[| eval unknown=if(isnull(unknown),"<<MATCHSEG1>>",mvdedup(mvappend(unknown,"<<MATCHSEG1>>")))]
| fields unknown
| mvexpand unknown
| eval _raw=replace(_raw,"\"".unknown."\"","\"known\"")
| fields - unknown
| spath path=known
| spath input=known
| table COLUMN1, COLUMN2, COLUMN3, ....COLUMN24
| where match(TIMESTAMP,".") AND like(COLUMN1, "%")

0 Karma

ITWhisperer
SplunkTrust

mvexpand has a memory limit - the first option might be more efficient in terms of memory usage, since it removes _raw as soon as it is no longer required, and similarly for other fields created along the way. If changing the configuration still isn't enough, have a look at my post on ways around mvexpand limits:

https://community.splunk.com/t5/Splunk-Search/mvexpand-limits/m-p/549178


0 Karma

surejsajeev
Explorer

Thank you @ITWhisperer

I tried the first option in my query and it did not work (which is why I used the second option). The error message from the first was this:

Failed to parse templatized search for field 'ABC_DEB_SJW_AK_SIE_122_EL-3602110-134:checksumawq:wwrikahs.AEUUR

This is my query:

base search | spath
| fields - _raw
| foreach *.*
[| eval Group_<<MATCHSEG1>>=if(isnull(Group_<<MATCHSEG1>>),"<<MATCHSEG2>>:".'<<FIELD>>',mvappend(Group_<<MATCHSEG1>>,"<<MATCHSEG2>>:".'<<FIELD>>'))]
| fields *_*
| foreach *_*
[| eval events=if(isnull(events),mvjoin(<<FIELD>>,"|"),mvappend(events,mvjoin(<<FIELD>>,"|")))]
| fields events
| mvexpand events
| streamstats count as row
| eval events=split(events,"|")
| mvexpand events
| eval key=mvindex(split(events,":"),0)
| eval value=mvindex(split(events,":"),1)
| eval {key}=value
| fields - key value events
| stats values(*) as * by row
| fields - row


Any idea why this is? Is it because of the "."?

0 Karma

ITWhisperer
SplunkTrust

Possibly the "." but more likely the "-". Try putting the <<FIELD>> names in single quotes.

0 Karma

surejsajeev
Explorer

@ITWhisperer

Changed the query to:

index=edennet-self-organising-app-s sourcetype="_json" | spath
| fields - _raw
| foreach *.*
[| eval Group_<<MATCHSEG1>>=if(isnull(Group_<<MATCHSEG1>>),"<<MATCHSEG2>>:".'<<FIELD>>',mvappend(Group_<<MATCHSEG1>>,"<<MATCHSEG2>>:".'<<FIELD>>'))]
| fields *_*
| foreach *_*
[| eval events=if(isnull(events),mvjoin('<<FIELD>>',"|"),mvappend(events,mvjoin('<<FIELD>>',"|")))]
| fields events
| mvexpand events
| streamstats count as row
| eval events=split(events,"|")
| mvexpand events
| eval key=mvindex(split(events,":"),0)
| eval value=mvindex(split(events,":"),1)
| eval {key}=value
| fields - key value events
| stats values(*) as * by row
| fields - row


but still got the same error. The error occurs for strings without the "-" too. The strings consist of characters such as ",", "=", "_" and ":".

0 Karma

ITWhisperer
SplunkTrust

Perhaps you could share a more realistic anonymised example of the JSON you are dealing with, as the initial made-up example does not appear to be representative enough. It often helps to have a more complete picture to work from.

0 Karma

surejsajeev
Explorer

Sure, @ITWhisperer.


This is what the JSON looks like:

{
"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source create": {
"COMMENT": "None",
"MODULE_DATA": "source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12",
"MOD_1": "source_inst",
"MOD_2": "name_1",
"MOD_3": "sub_inst",
"MOD_4": "execute",
"MOD_5": "None",
"MOD_6": "XER1",
"MOD_7": "None",
"MOD_8": "source create",
"MOD_9": "A1",
"MOD_10": "None",
"MOD_11": "None",
"MOD_12": "None",
"TIMESTAMP": "test_time"
},
"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source change:param1": {
"COMMENT": "None",
"MODULE_DATA": "source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12",
"MOD_1": "source_inst",
"MOD_2": "name_1",
"MOD_3": "sub_inst",
"MOD_4": "execute",
"MOD_5": "12",
"MOD_6": "XER1",
"MOD_7": "None",
"MOD_8": "source change",
"MOD_9": "A1",
"MOD_10": "param1",
"MOD_11": "None",
"MOD_12": "None",
"TIMESTAMP": "test_time"
},
"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source change:temo1aaa": {
"COMMENT": "None",
"MODULE_DATA": "source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12",
"MOD_1": "source_inst",
"MOD_2": "name_1",
"MOD_3": "sub_inst",
"MOD_4": "execute",
"MOD_5": "-1231321",
"MOD_6": "XER1",
"MOD_7": "None",
"MOD_8": "source change",
"MOD_9": "A1",
"MOD_10": "temo1aaa",
"MOD_11": "None",
"MOD_12": "None",
"TIMESTAMP": "test_time"
}
}

This entire dictionary is considered one single event in Splunk.

I want to parse this dictionary and extract each second-level dictionary, i.e. this part alone (see below), as an individual Splunk event, so that I can use spath on it and turn it into a table.

{
"COMMENT": "None",
"MODULE_DATA": "source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12",
"MOD_1": "source_inst",
"MOD_2": "name_1",
"MOD_3": "sub_inst",
"MOD_4": "execute",
"MOD_5": "None",
"MOD_6": "XER1",
"MOD_7": "None",
"MOD_8": "source create",
"MOD_9": "A1",
"MOD_10": "None",
"MOD_11": "None",
"MOD_12": "None",
"TIMESTAMP": "test_time"

0 Karma

ITWhisperer
SplunkTrust

You may still have problems with mvexpand, but how about this to deal with the unusual names:

| makeresults 
| eval _raw="{
\"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source create\": {
\"COMMENT\": \"None\",
\"MODULE_DATA\": \"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12\",
\"MOD_1\": \"source_inst\",
\"MOD_2\": \"name_1\",
\"MOD_3\": \"sub_inst\",
\"MOD_4\": \"execute\",
\"MOD_5\": \"None\",
\"MOD_6\": \"XER1\",
\"MOD_7\": \"None\",
\"MOD_8\": \"source create\",
\"MOD_9\": \"A1\",
\"MOD_10\": \"None\",
\"MOD_11\": \"None\",
\"MOD_12\": \"None\",
\"TIMESTAMP\": \"test_time\"
},
\"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source change:param1\": {
\"COMMENT\": \"None\",
\"MODULE_DATA\": \"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12\",
\"MOD_1\": \"source_inst\",
\"MOD_2\": \"name_1\",
\"MOD_3\": \"sub_inst\",
\"MOD_4\": \"execute\",
\"MOD_5\": \"12\",
\"MOD_6\": \"XER1\",
\"MOD_7\": \"None\",
\"MOD_8\": \"source change\",
\"MOD_9\": \"A1\",
\"MOD_10\": \"param1\",
\"MOD_11\": \"None\",
\"MOD_12\": \"None\",
\"TIMESTAMP\": \"test_time\"
},
\"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source change:temo1aaa\": {
\"COMMENT\": \"None\",
\"MODULE_DATA\": \"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12\",
\"MOD_1\": \"source_inst\",
\"MOD_2\": \"name_1\",
\"MOD_3\": \"sub_inst\",
\"MOD_4\": \"execute\",
\"MOD_5\": \"-1231321\",
\"MOD_6\": \"XER1\",
\"MOD_7\": \"None\",
\"MOD_8\": \"source change\",
\"MOD_9\": \"A1\",
\"MOD_10\": \"temo1aaa\",
\"MOD_11\": \"None\",
\"MOD_12\": \"None\",
\"TIMESTAMP\": \"test_time\"
}
}"
| spath
| transpose 0
| eval dict=if(column="_time" OR column="_raw",column,mvindex(split(column,"."),0,-2))
| dedup dict
| eventstats values(dict) as dict
| eval dict=mvfilter(dict!="_raw" AND dict!="_time")
| eval "row 1"=if(column="_time" OR column="_raw",'row 1',dict)
| eval column=if(column="_time" OR column="_raw",column,"dict")
| dedup column
| fields - dict
| transpose 0 header_field=column
| fields - column
| mvexpand dict
| eval _raw=replace(_raw,"\"".dict."\"","\"known\"")
| spath path=known
0 Karma

ITWhisperer
SplunkTrust

Here's another approach which only uses mvexpand to create one extra row; the other rows are created with makecontinuous and populated with filldown.

| spath
| foreach *.*
    [| eval dict=if(isnull(dict),"<<MATCHSEG1>>",mvappend(dict,"<<MATCHSEG1>>"))]
| fields dict
| eval dict=mvdedup(dict)
| eval rows=mvcount(dict) 
| eval rows=mvappend(rows,"1")
| mvexpand rows
| rename _time as time, _raw as raw
| makecontinuous rows
| filldown *
| eval raw=replace(raw,"\"".mvindex(dict,rows-1)."\"","\"known\"")
| spath input=raw path=known

If that still runs out of memory in mvexpand, there is another way to create the extra row.
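
For example - just one possible sketch, not necessarily the alternative meant here - appendpipe can append the single rows=1 row so that mvexpand is not needed at all:

| eval rows=mvcount(dict)
| appendpipe
    [| eval rows=1]
| rename _time as time, _raw as raw
| makecontinuous rows
| filldown *
| eval raw=replace(raw,"\"".mvindex(dict,rows-1)."\"","\"known\"")
| spath input=raw path=known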

0 Karma

surejsajeev
Explorer

Hi @ITWhisperer,

I tried both of the queries that you mentioned, and both of them showed the same errors as before. Is there any other available option?

I can increase the limit in the limits.conf file.

Currently, this is my limits.conf:

[thruput]
maxKBps = 0

[search]
enable_memory_tracker=true
search_process_memory_usage_threshold=4000

Would this solve my issue? The error message was: "since its physical memory usage (30891.121000 MB) has exceeded the physical memory threshold specified in limits.conf/search_process_memory_usage_threshold (30000.000000 MB)."

I am very thankful for all the help you have been giving :).

0 Karma

ITWhisperer
SplunkTrust

To be honest, I don't know - someone else does that stuff for me 🙂 - having said that, a quick glance through the docs turned up this stanza, which you might want to try too:

[mvexpand]

* This stanza allows for fine tuning of mvexpand search command.

max_mem_usage_mb = <non-negative integer>
* Overrides the default value for "max_mem_usage_mb".
* Limits the amount of RAM, in megabytes (MB), a batch of events or results will
  use in the memory of a search process.
* See definition in the [default] stanza for "max_mem_usage_mb"
  for more details.
* Default: 500
0 Karma