Getting Data In

Not able to extract Nested JSON

surejsajeev
Explorer

Hi,

I have an event that is an entire JSON object. It looks something like this:

{
    Key1: {
        KEY2: VAL2,
        KEY3: VAL3,
        ...
    },
    KeyX: {
        KEY2: VAL2,
        KEY3: VAL3,
        ...
    },
    KeyY: {
        KEY2: VAL2,
        KEY3: VAL3,
        ...
    }
}

Here Key1, KeyX, and KeyY are unknown to me, meaning they can change all the time. I would get around 100 such sub-dictionaries. I just want the sub-dictionaries inside, as separate Splunk events.

{
    KEY2: VAL2,
    KEY3: VAL3,
    ...
}

I have tried a lot of different search queries using spath, but nothing seems to help. 

Could someone please help me with this? I would very much appreciate it.

0 Karma
1 Solution

ITWhisperer
SplunkTrust

There might be an easier way to do this, but you could extract all the fields with spath, then reconstruct the fields and values, removing the "unknown" part of the fieldname:

| makeresults 
| eval _raw="{
	\"KeyA\": {
		\"Key2\": 3,
		\"Key3\": 3
	},
	\"KeyE\": {
		\"Key2\": 5,
		\"Key3\": 4
	},
	\"KeyC\": {
		\"Key2\": 7,
		\"Key3\": 5
	},
	\"KeyH\": {
		\"Key2\": 9,
		\"Key3\": 6
	},
	\"KeyR\": {
		\"Key2\": 1,
		\"Key3\": 7
	}
}|{
	\"KeyA\": {
		\"Key2\": 1,
		\"Key3\": 3
	},
	\"KeyE\": {
		\"Key2\": 2,
		\"Key3\": 4
	},
	\"KeyC\": {
		\"Key2\": 3,
		\"Key3\": 5
	},
	\"KeyH\": {
		\"Key2\": 4,
		\"Key3\": 6
	},
	\"KeyR\": {
		\"Key2\": 5,
		\"Key3\": 7
	}
}"
| eval events=split(_raw,"|") 
| mvexpand events
| fields - _*
| rename events as _raw


| spath
| fields - _raw
| foreach *.*
    [| eval Group_<<MATCHSEG1>>=if(isnull(Group_<<MATCHSEG1>>),"<<MATCHSEG2>>:".'<<FIELD>>',mvappend(Group_<<MATCHSEG1>>,"<<MATCHSEG2>>:".'<<FIELD>>'))]
| fields *_*
| foreach *_*
    [| eval events=if(isnull(events),mvjoin(<<FIELD>>,"|"),mvappend(events,mvjoin(<<FIELD>>,"|")))]
| fields events
| mvexpand events
| streamstats count as row 
| eval events=split(events,"|")
| mvexpand events
| eval key=mvindex(split(events,":"),0)
| eval value=mvindex(split(events,":"),1)
| eval {key}=value
| fields - key value events
| stats values(*) as * by row
| fields - row

This assumes two-level nesting in the JSON, as shown in your example. More complex JSON formats might not work so well.

Another way (perhaps simpler) would be to replace all the "unknown" values with known values:

| spath
| foreach *.*
    [| eval unknown=if(isnull(unknown),"<<MATCHSEG1>>",mvdedup(mvappend(unknown,"<<MATCHSEG1>>")))]
| fields unknown
| mvexpand unknown
| eval _raw=replace(_raw,"\"".unknown."\"","\"known\"")
| spath path=known


surejsajeev
Explorer

Thank you very much @ITWhisperer for the solution. The query works perfectly. However, when I run it, I get this error message from Splunk:

"The search you ran returned a number of fields that exceeded the current indexed field extraction limit. To ensure that all fields are extracted for search, set limits.conf: [kv] / indexed_kv_limit to a number that is higher than the number of fields contained in the files that you index."

Could you advise on how I can resolve this issue, please? I am not sure of the number of fields that my query will generate. Is there a dynamic limit that I can set?

Your help is much appreciated.

0 Karma

ITWhisperer
SplunkTrust

Which of the two approaches gives you this error? Either way, you (or your Splunk admin) should increase the limit of fields available in the configuration.
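
For reference, the stanza that the error message points at would look something like this (the value 1000 is only an illustrative number - set it higher than the number of fields your events actually produce):

# $SPLUNK_HOME/etc/system/local/limits.conf
[kv]
indexed_kv_limit = 1000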

0 Karma

surejsajeev
Explorer

It was the second one, @ITWhisperer.

0 Karma

ITWhisperer
SplunkTrust

You could try removing the unknown field once it has been used:

| spath
| foreach *.*
    [| eval unknown=if(isnull(unknown),"<<MATCHSEG1>>",mvdedup(mvappend(unknown,"<<MATCHSEG1>>")))]
| fields unknown
| mvexpand unknown
| eval _raw=replace(_raw,"\"".unknown."\"","\"known\"")
| fields - unknown
| spath path=known

Does the job inspector inform you how far through the search it got before it ran out of fields?

0 Karma

surejsajeev
Explorer

@ITWhisperer 

These are the messages I see in the job inspector.

Forcefully terminated search process with sid=<SID> since its physical memory usage (34997.320000 MB) has exceeded the physical memory threshold specified in limits.conf/search_process_memory_usage_threshold (30000.000000 MB)

-----------------------------------------------------------

command.mvexpand: output will be truncated at 11700 results due to excessive memory usage. Memory threshold of 500MB as configured in limits.conf / [mvexpand] / max_mem_usage_mb has been reached.

-----------------------------------------------------------

Forcefully terminated search process with sid=<SID> since its physical memory usage (34997.320000 MB) has exceeded the physical memory threshold specified in limits.conf/search_process_memory_usage_threshold (30000.000000 MB)

-----------------------------------------------------------

This is the query that I am using:

| spath
| foreach *.*
[| eval unknown=if(isnull(unknown),"<<MATCHSEG1>>",mvdedup(mvappend(unknown,"<<MATCHSEG1>>")))]
| fields unknown
| mvexpand unknown
| eval _raw=replace(_raw,"\"".unknown."\"","\"known\"")
| fields - unknown
| spath path=known
| spath input=known
| table COLUMN1, COLUMN2, COLUMN3, ....COLUMN24
| where match(TIMESTAMP,".") AND like(COLUMN1, "%")

0 Karma

ITWhisperer
SplunkTrust

mvexpand has a memory limit - the first option might be more efficient in terms of memory usage, since it removes _raw as soon as it is no longer required, and similarly for other fields created along the way. If changing the configuration still isn't enough, have a look at my post on ways around mvexpand limits:

https://community.splunk.com/t5/Splunk-Search/mvexpand-limits/m-p/549178


0 Karma

surejsajeev
Explorer

Thank you @ITWhisperer

I tried the first option in my query and it did not work (which is why I used the second option). The error message from the first was this:

Failed to parse templatized search for field 'ABC_DEB_SJW_AK_SIE_122_EL-3602110-134:checksumawq:wwrikahs.AEUUR

This is my query:

base search | spath
| fields - _raw
| foreach *.*
[| eval Group_<<MATCHSEG1>>=if(isnull(Group_<<MATCHSEG1>>),"<<MATCHSEG2>>:".'<<FIELD>>',mvappend(Group_<<MATCHSEG1>>,"<<MATCHSEG2>>:".'<<FIELD>>'))]
| fields *_*
| foreach *_*
[| eval events=if(isnull(events),mvjoin(<<FIELD>>,"|"),mvappend(events,mvjoin(<<FIELD>>,"|")))]
| fields events
| mvexpand events
| streamstats count as row
| eval events=split(events,"|")
| mvexpand events
| eval key=mvindex(split(events,":"),0)
| eval value=mvindex(split(events,":"),1)
| eval {key}=value
| fields - key value events
| stats values(*) as * by row
| fields - row


Any idea why this is? Is it because of the "."?

0 Karma

ITWhisperer
SplunkTrust

Possibly the "." but more likely the "-". Try putting the <<FIELD>> names in single quotes.

0 Karma

surejsajeev
Explorer

@ITWhisperer

Changed the query to:

index=edennet-self-organising-app-s sourcetype="_json" | spath
| fields - _raw
| foreach *.*
[| eval Group_<<MATCHSEG1>>=if(isnull(Group_<<MATCHSEG1>>),"<<MATCHSEG2>>:".'<<FIELD>>',mvappend(Group_<<MATCHSEG1>>,"<<MATCHSEG2>>:".'<<FIELD>>'))]
| fields *_*
| foreach *_*
[| eval events=if(isnull(events),mvjoin('<<FIELD>>',"|"),mvappend(events,mvjoin('<<FIELD>>',"|")))]
| fields events
| mvexpand events
| streamstats count as row
| eval events=split(events,"|")
| mvexpand events
| eval key=mvindex(split(events,":"),0)
| eval value=mvindex(split(events,":"),1)
| eval {key}=value
| fields - key value events
| stats values(*) as * by row
| fields - row


but still got the same error. The error occurs for strings without the "-" too. The strings consist of characters such as ",", "=", "_" and ":".

0 Karma

ITWhisperer
SplunkTrust

Perhaps you could share a more realistic anonymised example of the JSON you are dealing with, as the initial made-up example does not appear to be representative enough. It often helps to have a more complete picture to work from.

0 Karma

surejsajeev
Explorer

Sure, @ITWhisperer.


This is what the JSON looks like:

{
"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source create": {
"COMMENT": "None",
"MODULE_DATA": "source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12",
"MOD_1": "source_inst",
"MOD_2": "name_1",
"MOD_3": "sub_inst",
"MOD_4": "execute",
"MOD_5": "None",
"MOD_6": "XER1",
"MOD_7": "None",
"MOD_8": "source create",
"MOD_9": "A1",
"MOD_10": "None",
"MOD_11": "None",
"MOD_12": "None",
"TIMESTAMP": "test_time"
},
"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source change:param1": {
"COMMENT": "None",
"MODULE_DATA": "source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12",
"MOD_1": "source_inst",
"MOD_2": "name_1",
"MOD_3": "sub_inst",
"MOD_4": "execute",
"MOD_5": "12",
"MOD_6": "XER1",
"MOD_7": "None",
"MOD_8": "source change",
"MOD_9": "A1",
"MOD_10": "param1",
"MOD_11": "None",
"MOD_12": "None",
"TIMESTAMP": "test_time"
},
"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source change:temo1aaa": {
"COMMENT": "None",
"MODULE_DATA": "source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12",
"MOD_1": "source_inst",
"MOD_2": "name_1",
"MOD_3": "sub_inst",
"MOD_4": "execute",
"MOD_5": "-1231321",
"MOD_6": "XER1",
"MOD_7": "None",
"MOD_8": "source change",
"MOD_9": "A1",
"MOD_10": "temo1aaa",
"MOD_11": "None",
"MOD_12": "None",
"TIMESTAMP": "test_time"
}
}

This entire dictionary is considered one single event in Splunk.

I want to parse this dictionary and extract each second-level dictionary, i.e. this part alone (see below), as an individual Splunk event, so that I can use spath on it and turn it into a table.

{
"COMMENT": "None",
"MODULE_DATA": "source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12",
"MOD_1": "source_inst",
"MOD_2": "name_1",
"MOD_3": "sub_inst",
"MOD_4": "execute",
"MOD_5": "None",
"MOD_6": "XER1",
"MOD_7": "None",
"MOD_8": "source create",
"MOD_9": "A1",
"MOD_10": "None",
"MOD_11": "None",
"MOD_12": "None",
"TIMESTAMP": "test_time"

0 Karma

ITWhisperer
SplunkTrust

You may still have problems with mvexpand, but how about this to deal with the unusual names:

| makeresults 
| eval _raw="{
\"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source create\": {
\"COMMENT\": \"None\",
\"MODULE_DATA\": \"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12\",
\"MOD_1\": \"source_inst\",
\"MOD_2\": \"name_1\",
\"MOD_3\": \"sub_inst\",
\"MOD_4\": \"execute\",
\"MOD_5\": \"None\",
\"MOD_6\": \"XER1\",
\"MOD_7\": \"None\",
\"MOD_8\": \"source create\",
\"MOD_9\": \"A1\",
\"MOD_10\": \"None\",
\"MOD_11\": \"None\",
\"MOD_12\": \"None\",
\"TIMESTAMP\": \"test_time\"
},
\"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source change:param1\": {
\"COMMENT\": \"None\",
\"MODULE_DATA\": \"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12\",
\"MOD_1\": \"source_inst\",
\"MOD_2\": \"name_1\",
\"MOD_3\": \"sub_inst\",
\"MOD_4\": \"execute\",
\"MOD_5\": \"12\",
\"MOD_6\": \"XER1\",
\"MOD_7\": \"None\",
\"MOD_8\": \"source change\",
\"MOD_9\": \"A1\",
\"MOD_10\": \"param1\",
\"MOD_11\": \"None\",
\"MOD_12\": \"None\",
\"TIMESTAMP\": \"test_time\"
},
\"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12:source change:temo1aaa\": {
\"COMMENT\": \"None\",
\"MODULE_DATA\": \"source_dict=source_name,source_dict=location,context=location_code,manage=1,function=1,data1=1,data2=12\",
\"MOD_1\": \"source_inst\",
\"MOD_2\": \"name_1\",
\"MOD_3\": \"sub_inst\",
\"MOD_4\": \"execute\",
\"MOD_5\": \"-1231321\",
\"MOD_6\": \"XER1\",
\"MOD_7\": \"None\",
\"MOD_8\": \"source change\",
\"MOD_9\": \"A1\",
\"MOD_10\": \"temo1aaa\",
\"MOD_11\": \"None\",
\"MOD_12\": \"None\",
\"TIMESTAMP\": \"test_time\"
}
}"
| spath
| transpose 0
| eval dict=if(column="_time" OR column="_raw",column,mvindex(split(column,"."),0,-2))
| dedup dict
| eventstats values(dict) as dict
| eval dict=mvfilter(dict!="_raw" AND dict!="_time")
| eval "row 1"=if(column="_time" OR column="_raw",'row 1',dict)
| eval column=if(column="_time" OR column="_raw",column,"dict")
| dedup column
| fields - dict
| transpose 0 header_field=column
| fields - column
| mvexpand dict
| eval _raw=replace(_raw,"\"".dict."\"","\"known\"")
| spath path=known
0 Karma

ITWhisperer
SplunkTrust

Here's another approach which only uses mvexpand to create one extra row; the other rows are created with makecontinuous and populated with filldown.

| spath
| foreach *.*
    [| eval dict=if(isnull(dict),"<<MATCHSEG1>>",mvappend(dict,"<<MATCHSEG1>>"))]
| fields dict
| eval dict=mvdedup(dict)
| eval rows=mvcount(dict) 
| eval rows=mvappend(rows,"1")
| mvexpand rows
| rename _time as time, _raw as raw
| makecontinuous rows
| filldown *
| eval raw=replace(raw,"\"".mvindex(dict,rows-1)."\"","\"known\"")
| spath input=raw path=known

If that still runs out of memory in mvexpand, there is another way to create the extra row.
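
For example - just one possible sketch, not necessarily the alternative meant here - appendpipe can append the single rows=1 row so that mvexpand is not needed at all:

| eval rows=mvcount(dict)
| appendpipe
    [| eval rows=1]
| rename _time as time, _raw as raw
| makecontinuous rows
| filldown *
| eval raw=replace(raw,"\"".mvindex(dict,rows-1)."\"","\"known\"")
| spath input=raw path=known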

0 Karma

surejsajeev
Explorer

Hi @ITWhisperer,

I tried both of the queries that you mentioned, and both of them showed the same errors as before. Is there any other available option?

I can increase the limit in the limits.conf file.

Currently, this is my limits.conf:

[thruput]
maxKBps = 0

[search]
enable_memory_tracker=true
search_process_memory_usage_threshold=4000

Would this solve my issue? The error message was: "since its physical memory usage (30891.121000 MB) has exceeded the physical memory threshold specified in limits.conf/search_process_memory_usage_threshold (30000.000000 MB)."

I am very thankful for all the help you have been giving :).

0 Karma

ITWhisperer
SplunkTrust

To be honest, I don't know - someone else does that stuff for me 🙂 - having said that, a quick glance through the docs turned up this stanza, which you might want to try too:

[mvexpand]

* This stanza allows for fine tuning of mvexpand search command.

max_mem_usage_mb = <non-negative integer>
* Overrides the default value for "max_mem_usage_mb".
* Limits the amount of RAM, in megabytes (MB), a batch of events or results will
  use in the memory of a search process.
* See definition in the [default] stanza for "max_mem_usage_mb"
  for more details.
* Default: 500
0 Karma