Splunk Search

Extract fields from nested json structure with dynamic key

rkothari
Loves-to-Learn Everything

Hello,

I have nested JSON log messages like the one below being forwarded to Splunk:

 

{
        "timeStamp": "2021-03-11T07:45:49.780000+00:00",
        "status": "deactive",
        "deviceId": "uuid12345",
        "details": {
            "Device:Information": {
                "Type": "Apple",
                "Content": {
                    "uuid12345": {
                        "Name": "IOS",
                        "Version": "14.4"
                    }
                }
            }
        }
}

 

 

I'd like to generate a table like the one below from all such log messages:

deviceId    Name     Version
uuid12345   IOS      14.4
uuid12346   Android  8.1

 

I am aware that a table of fields can easily be created with the table command or with stats (to get counts by Name and Version). The problem with this log structure, however, is that the nested JSON path `details.Device:Information.Content` contains a key (`uuid12345` here) whose value is dynamic.

Therefore, a query like the one below doesn't work as I need it to: the wildcard seems to create one column per interpreted value, such as `details.Device:Information.Content.uuid12345.Name`, `details.Device:Information.Content.uuid12346.Name`, `details.Device:Information.Content.uuid12345.Version`, and `details.Device:Information.Content.uuid12346.Version`:

 

| table deviceId, details.Device:Information.Content.*.Name, details.Device:Information.Content.*.Version

 

Is it possible to extract this information into a table like the one described above? Could `Name` and `Version` be extracted as fields so that I don't have to use the full JSON path in the table or stats command?
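For what it's worth, the extraction logic itself is simple once the dynamic key is treated as data rather than as part of the path. A minimal Python sketch (illustrative only, not SPL), using the sample event from above:

```python
import json

# Sample event from the question. The key under "Content" repeats the
# deviceId and varies per event, so a static JSON path cannot reach it.
event = json.loads(
    '{"timeStamp": "2021-03-11T07:45:49.780000+00:00",'
    '"status": "deactive","deviceId": "uuid12345",'
    '"details": {"Device:Information": {"Type": "Apple",'
    '"Content": {"uuid12345": {"Name": "IOS", "Version": "14.4"}}}}}'
)

# Iterate the values of Content instead of naming the dynamic key;
# each event carries exactly one entry there.
(info,) = event["details"]["Device:Information"]["Content"].values()
row = (event["deviceId"], info["Name"], info["Version"])
```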

Thanks for your help in advance.

 


kamlesh_vaghela
SplunkTrust

@rkothari 

Can you please try this?

YOUR_SEARCH
| eval Name="",Version=""
| foreach details.Device:Information.Content.*.Name [| eval Name=if(isnotnull('<<FIELD>>'),'<<FIELD>>',Name)]
| foreach details.Device:Information.Content.*.Version [| eval Version=if(isnotnull('<<FIELD>>'),'<<FIELD>>',Version)]
| table deviceId Name Version

 

My Sample Code:

 

| makeresults 
| eval _raw="{\"timeStamp\": \"2021-03-11T07:45:49.780000+00:00\",\"status\": \"deactive\",\"deviceId\": \"uuid12345\",\"details\": {\"Device:Information\": {\"Type\": \"Apple\",\"Content\": {\"uuid12345\": {\"Name\": \"IOS\",\"Version\": \"14.4\"}}}}}" | append [| makeresults 
| eval _raw="{\"timeStamp\": \"2021-03-11T07:45:49.780000+00:00\",\"status\": \"deactive\",\"deviceId\": \"uuid123245\",\"details\": {\"Device:Information\": {\"Type\": \"Apple\",\"Content\": {\"uuid123245\": {\"Name\": \"Android\",\"Version\": \"14.4\"}}}}}"  ]
| kv 
| table deviceId "details.Device:Information.Content.*.Name" "details.Device:Information.Content.*.Version"
| eval Name="",Version=""
| foreach details.Device:Information.Content.*.Name [| eval Name=if(isnotnull('<<FIELD>>'),'<<FIELD>>',Name)]
| foreach details.Device:Information.Content.*.Version [| eval Version=if(isnotnull('<<FIELD>>'),'<<FIELD>>',Version)]
| table deviceId Name Version
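The two foreach loops amount to a first-non-null scan over the wildcard expansions: after `| kv`, each event populates exactly one of the `details.Device:Information.Content.<uuid>.Name` fields, and the loop copies whichever one is present into `Name`. Roughly, in Python terms (field names taken from the sample above; illustrative only):

```python
# Simulated event fields after | kv: only one wildcard expansion is
# non-null per event, because each event contains a single uuid key.
fields = {
    "details.Device:Information.Content.uuid12345.Name": "IOS",
    "details.Device:Information.Content.uuid123245.Name": None,
}

# Equivalent of:
# | foreach details.Device:Information.Content.*.Name
#     [| eval Name=if(isnotnull('<<FIELD>>'),'<<FIELD>>',Name)]
name = ""
for field, value in fields.items():
    if value is not None:
        name = value
```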

 

Thanks
KV
▄︻̷̿┻̿═━一

If any of my replies helps you solve the problem or gain knowledge, an upvote would be appreciated.


rkothari
Loves-to-Learn Everything

Thank you @kamlesh_vaghela for sharing this. Your solution gets me the required data in a table as expected; however, it doesn't consistently show values for the "Name" and "Version" columns in all rows.

I have about 2000 log events with the structure shown in my question, but every time I re-run your query I get blank "Name" and "Version" values for a varying number of rows. For example, all rows are populated with a "deviceId" value, but only 24 rows show actual values for "Name" and "Version"; the rest are blank. The next time I refresh the search, that number changes.

Could the foreach part of your solution be somehow contributing to this inconsistent behavior?

UPDATE: I wanted to add that for the records that show a blank value in columns "Name" and "Version", I have verified that their raw log events have a valid string value at paths details.Device:Information.Content.*.Name and details.Device:Information.Content.*.Version.

ANOTHER UPDATE: I can reproduce this behavior by feeding more than one log event into your proposed solution. Could you please try feeding it 2-4 log events? That may help you see the problem. I think solving this would make your proposal a complete answer. Thanks again for all your help.


yuanliu
SplunkTrust

@rkothari wrote:

...

ANOTHER UPDATE: I can reproduce this behavior by feeding more than one log event into your proposed solution. Could you please try feeding it 2-4 log events? That may help you see the problem. I think solving this would make your proposal a complete answer. Thanks again for all your help.


I used the following generator to produce 40 randomized events:

 

| makeresults count=40
| streamstats count
| eval _time = now() - count * 65, uuid = 1000 + random() %256, type = if(random() % 3 == 0, "Apple", "Not Apple"), os = if(type == "Apple", "iOS", "Android"), version = if(os == "Apple", 11 + random() % 4 . "." . random() % 5, 3 + random() % 10 . "." . random() % 2)
| eval  _raw="{\"timeStamp\": \"" . strftime(_time, "%Y:%m:%d %H:%M:%S") . "\",\"status\": \"deactive\",\"deviceId\": \"uuid" . uuid . "\",\"details\": {\"Device:Information\": {\"Type\": \"" . type . "\",\"Content\": {\"uuid" . uuid . "\": {\"Name\": \"" . os . "\",\"Version\": \"". version . "\"}}}}}"
| fields _raw

 

Sample events look like

_raw    _time
{"timeStamp": "2021:05:28 05:19:56","status": "deactive","deviceId": "uuid1093","details": {"Device:Information": {"Type": "Not Apple","Content": {"uuid1093": {"Name": "Android","Version": "9.0"}}}}}    2021-05-28 05:19:56
{"timeStamp": "2021:05:28 05:18:51","status": "deactive","deviceId": "uuid1148","details": {"Device:Information": {"Type": "Not Apple","Content": {"uuid1148": {"Name": "Android","Version": "7.0"}}}}}    2021-05-28 05:18:51
{"timeStamp": "2021:05:28 05:17:46","status": "deactive","deviceId": "uuid1234","details": {"Device:Information": {"Type": "Apple","Content": {"uuid1234": {"Name": "iOS","Version": "4.1"}}}}}    2021-05-28 05:17:46
{"timeStamp": "2021:05:28 05:16:41","status": "deactive","deviceId": "uuid1136","details": {"Device:Information": {"Type": "Not Apple","Content": {"uuid1136": {"Name": "Android","Version": "8.0"}}}}}    2021-05-28 05:16:41
{"timeStamp": "2021:05:28 05:15:36","status": "deactive","deviceId": "uuid1085","details": {"Device:Information": {"Type": "Not Apple","Content": {"uuid1085": {"Name": "Android","Version": "3.0"}}}}}    2021-05-28 05:15:36
{"timeStamp": "2021:05:28 05:14:31","status": "deactive","deviceId": "uuid1151","details": {"Device:Information": {"Type": "Not Apple","Content": {"uuid1151": {"Name": "Android","Version": "3.0"}}}}}    2021-05-28 05:14:31
{"timeStamp": "2021:05:28 05:13:26","status": "deactive","deviceId": "uuid1098","details": {"Device:Information": {"Type": "Not Apple","Content": {"uuid1098": {"Name": "Android","Version": "9.0"}}}}}    2021-05-28 05:13:26
{"timeStamp": "2021:05:28 05:12:21","status": "deactive","deviceId": "uuid1029","details": {"Device:Information": {"Type": "Not Apple","Content": {"uuid1029": {"Name": "Android","Version": "5.1"}}}}}    2021-05-28 05:12:21
{"timeStamp": "2021:05:28 05:11:16","status": "deactive","deviceId": "uuid1181","details": {"Device:Information": {"Type": "Apple","Content": {"uuid1181": {"Name": "iOS","Version": "5.1"}}}}}    2021-05-28 05:11:16
{"timeStamp": "2021:05:28 05:10:11","status": "deactive","deviceId": "uuid1186","details": {"Device:Information": {"Type": "Not Apple","Content": {"uuid1186": {"Name": "Android","Version": "4.1"}}}}}    2021-05-28 05:10:11
...

 

I then tested both searches from @kamlesh_vaghela, and indeed they both work as expected. So we can at least exclude the possibility that multiple events cause those searches to fail.


kamlesh_vaghela
SplunkTrust

@rkothari 

Can you please try this?

YOUR_SEARCH
| rex field=_raw "\"Name\":\s\"(?<Name>\w+)\",\"Version\":\s\"(?<Version>.+)\""
| table deviceId Name Version

 

My Sample Search:

| makeresults count=2000 | eval a=1 | accum a
| eval _raw="{\"timeStamp\": \"2021-03-11T07:45:49.780000+00:00\",\"status\": \"deactive\",\"deviceId\": \"uuid12345".a."\",\"details\": {\"Device:Information\": {\"Type\": \"Apple\",\"Content\": {\"uuid12345".a."\": {\"Name\": \"IOS\",\"Version\": \"14.4\"}}}}}" | kv
| rex field=_raw "\"Name\":\s\"(?<Name>\w+)\",\"Version\":\s\"(?<Version>.+)\""
| table deviceId Name Version
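Outside Splunk, the same pattern can be sanity-checked against a compact single-line event. A Python sketch mirroring the rex above (the raw string is the sample event from the question, collapsed to one line):

```python
import re

# The rex pattern from the search above, expressed as a Python regex.
pattern = r'"Name":\s"(?P<Name>\w+)","Version":\s"(?P<Version>.+)"'

raw = ('{"timeStamp": "2021-03-11T07:45:49.780000+00:00",'
       '"status": "deactive","deviceId": "uuid12345",'
       '"details": {"Device:Information": {"Type": "Apple",'
       '"Content": {"uuid12345": {"Name": "IOS","Version": "14.4"}}}}}')

# Matches when the event is a compact one-liner with exactly one space
# after each colon.
m = re.search(pattern, raw)
```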


 Thanks
KV
▄︻̷̿┻̿═━一

If any of my replies helps you solve the problem or gain knowledge, an upvote would be appreciated.


rkothari
Loves-to-Learn Everything

Hello @kamlesh_vaghela 

Your proposal using rex doesn't work with my data. It works correctly with the way you load your sample data, but when I apply the rex and table commands to my data as you shared, the Name and Version columns are empty for all rows.

I think your previous solution using foreach got us very close, except for that odd behavior where some rows randomly show empty Name and Version columns. Do you have any idea how that part of your first proposal can be resolved? The behavior with foreach sounds very similar to another post. Is there a bug in foreach that would cause such behavior?
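One possible cause for the empty rex results (an assumption, since the actual raw events aren't shown in the thread): the pattern requires `"Version"` to follow the comma with no intervening whitespace, so it fails on pretty-printed JSON like the sample in the original question. A quick Python check of both forms:

```python
import re

# The rex pattern from the proposed solution, as a Python regex.
pattern = r'"Name":\s"(?P<Name>\w+)","Version":\s"(?P<Version>.+)"'

compact = '"Content": {"uuid12345": {"Name": "IOS","Version": "14.4"}}'
pretty = '"Content": {"uuid12345": {"Name": "IOS",\n    "Version": "14.4"}}'

print(re.search(pattern, compact) is not None)  # True
print(re.search(pattern, pretty) is not None)   # False: newline after the comma

# A whitespace-tolerant variant matches both forms:
tolerant = r'"Name":\s*"(?P<Name>\w+)",\s*"Version":\s*"(?P<Version>[^"]+)"'
```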

Thanks very much again!


kamlesh_vaghela
SplunkTrust

@rkothari 

Actually, both solutions should work, and the second is much faster than the first. Can you please share your exact sample events, i.e., the _raw text with no JSON formatting?

 

Thanks
KV
▄︻̷̿┻̿═━一

If any of my replies helps you solve the problem or gain knowledge, an upvote would be appreciated.
