Splunk Search

Json Parsing..

onthakur
Engager

Json :-

| makeresults | eval _raw="{
\"a.com\": [
{ \"yahoo.com\":\"10ms\",\"trans-id\": \"x1\"},
{ \"google.com\":\"20ms\",\"trans-id\": \"x2\"}
],
\"trans-id\":\"m1\",
\"duration\":\"33ms\"
}"

 

need output in below format:-

_timeTrans_idurlDurationsub_durationsub_urlsub_trans_id
 m1a.com33ms10msyahoo.comx1
 m1a.com33ms20msgoogle.comx2
Labels (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

As @PickleRick said, this is not about parsing but about presentation, and that spath command that we usually use is not handling JSON keys containing dot (.) correctly because in SPL, as well as in many other languages that flatten structured data use dot to represent hierarchy.

But keys containing dot are not the only problem that makes @dtburrows3's solution so wonky.  The bigger problem is the data design.  It uses implied semantics about what represents a URL.  Implied semantics in structured data is generally unacceptable. (At higher level, this is abusing key names to represent data.)  If you have any influence with your developers, beg them to change data structure so key and data are completely separate.

Thanks to @dtburrows3, I learned that fromjson (introduced in 9.0) is more robust than spath (from 7.0 or earlier), and learned the trick to leverage the evil dot in key name in order to single out actual data in abused structure, namely foreach *.*.  It is less robust but works for the limited dataset.  So, I offer a more semantic, hopefully less wonky solution.

 

| makeresults | eval _raw="{
\"a.com\": [
{ \"yahoo.com\":\"10ms\",\"trans-id\": \"x1\"},
{ \"google.com\":\"20ms\",\"trans-id\": \"x2\"}
],
\"trans-id\":\"m1\",
\"duration\":\"33ms\"
}"
``` data emulation above ```
| table _*
| fromjson _raw
| rename duration as Duration, trans-id as Trans_id
| foreach *
    [eval url = mvappend(url, if("<<FIELD>>" IN ("Duration", "Trans_id"), null, "<<FIELD>>"))]
| mvexpand url ``` nothing in this data structure prevents multiple URLs ```
| foreach *.*
    [mvexpand <<FIELD>>
    | eval subkey = json_array_to_mv(json_keys('<<FIELD>>'))
    | eval sub_trans_id = json_extract('<<FIELD>>', "trans-id")
    | eval subdata = json_object()
    | eval subdata = mvmap(subkey, if(subkey == "trans-id", null(), json_set(subdata,
      "sub_url", subkey, "sub_duration", json_extract_exact('<<FIELD>>', subkey))))]
| fromjson subdata
| table _time Trans_id url Duration sub_duration sub_url sub_trans_id

 

The output is

_timeTrans_idurlDurationsub_durationsub_urlsub_trans_id
2024-01-19 13:01:54m1a.com33ms10msyahoo.comx1
2024-01-19 13:01:54m1a.com33ms20msgoogle.comx2
Tags (1)

dtburrows3
Builder

SPL is a bit wonky but got results in the final format you were looking for, I'm curious how this SPL will perform against your live data.

 

 

 

| makeresults | eval _raw="{
\"a.com\": [
{ \"yahoo.com\":\"10ms\",\"trans-id\": \"x1\"},
{ \"google.com\":\"20ms\",\"trans-id\": \"x2\"}
],
\"b.com\": [
{ \"aspera.com\":\"30ms\",\"trans-id\": \"x3\"},
{ \"arista.com\":\"40ms\",\"trans-id\": \"x4\"}
],
\"trans-id\":\"m1\",
\"duration\":\"33ms\"
}"
    
    ``` start parsing json object ```
    | fromjson _raw
    | foreach *.*
        [
            | eval
                url_json=mvappend(
                    url_json,
                        case(
                            mvcount('<<FIELD>>')==1, if(isnotnull(json('<<FIELD>>')), json_set('<<FIELD>>', "url", "<<FIELD>>"), null()),
                            mvcount('<<FIELD>>')>1, mvmap('<<FIELD>>', if(isnotnull(json('<<FIELD>>')), json_set('<<FIELD>>', "url", "<<FIELD>>"), null()))
                            )
                    )
            ]
    | fields + _time, url_json, "trans-id", duration
    | rename
        "trans-id" as "top_trans-id"
    | fields - _raw
    | mvexpand url_json
    | fromjson url_json
    | fields - url_json
    | foreach *.*
        [
            | eval
                sub_url=if(
                    isnotnull('<<FIELD>>') AND isnull(sub_url),
                        "<<FIELD>>",
                        'sub_url'
                    ),
                sub_duration=if(
                    isnotnull('<<FIELD>>') AND isnull(sub_duration),
                        '<<FIELD>>',
                        'sub_duration'
                    )
            ]
    | rename
        "trans-id" as "sub_trans-id"
    | fields + _time, "top_trans-id", url, duration, sub_duration, sub_url, sub_trans-id
    | rename
        "top_trans-id" as "trans-id"

 

 

 Final output:

dtburrows3_0-1705678536919.png

 

There are some pretty big assumptions here, biggest being that the keys of the _raw json will have fields with the "*.*" format or a dot in the fieldname (domain names)

PickleRick
SplunkTrust
SplunkTrust

1. This has nothing to do (or at least not much) with parsing, more about summarizing data from your event.

2. Splunk seems to have problems with using spath when the names contain dots, so extracting the "a.com" part and splitting it might not be that easy.

0 Karma
Get Updates on the Splunk Community!

Detecting Remote Code Executions With the Splunk Threat Research Team

WATCH NOWRemote code execution (RCE) vulnerabilities pose a significant risk to organizations. If exploited, ...

Enter the Splunk Community Dashboard Challenge for Your Chance to Win!

The Splunk Community Dashboard Challenge is underway! This is your chance to showcase your skills in creating ...

.conf24 | Session Scheduler is Live!!

.conf24 is happening June 11 - 14 in Las Vegas, and we are thrilled to announce that the conference catalog ...