Splunk Search

How to extract fields from a complex JSON file?

SplunkDash
Builder

Hello,

I have complex JSON events ingested as *.log files. I'm having trouble extracting fields from these files/events. Any help on how to extract key-value pairs from these events would be highly appreciated. A sample event is given below. Thank you so much.

 

2022-07-15 12:44:03 - {
    "type" : "TEST",
    "r/o" : false,
    "booting" : false,
    "version" : "6.2.7.TS",
    "user" : "DS",
    "domainUUID" : null,
    "access" : "NATIVE",
    "remote-address" : "localhost",
    "success" : true,
    "ops" : [{
        "address" : [
            {
                "subsystem" : "datasources"
            },
            {
                "data-source" : "mode_tp"
            }
        ],
        "address" : [
            {
                "cservice" : "management"
            },
            {
                "access" : "identity"
            }
        ],
        "DSdomain" : "TESTDomain"
    },
    {
        "address" : [
            {
                "cservice" : "management"
            },
        {
            "operation" : "add",
            "address" : [
                {
                    "subsystem" : "finit"
                },
                {
                    "bucket" : "TEST"
                },
                {
                    "clocal" : "passivation"
                },
                {
                    "store" : "file"
                }
            ],
            "passivation" : true,
            "purge" : false
        },
        {
            "operation" : "add",
            "address" : [
                {
                    "subsystem" : "finit"
                },
                {
                    "bucket" : "TEST"
                }
            ],
            "module" : "dshibernate"
        },
        {
            "operation" : "add",
            "address" : [
                {
                    "subsystem" : "finit"
                },
                {
                    "bucket" : "hibernate"
                },
                {
                    "clocal" : "entity"
                }
            ]
        },
        {
            "operation" : "add",
            "address" : [
                {
                    "subsystem" : "finit"
                },
                {
                    "bucket" : "hibernate"
                },
                {
                    "clocal" : "entity"
                },
                {
                    "component" : "transaction"
                }
            ],
            "model" : "DSTEST"
        },
        {
            "operation" : "add",
            "address" : [
                {
                    "subsystem" : "infit"
                },
                {
                    "bucket" : "hibernate"
                },
                {
                    "clocal" : "entity"
                },
                {
                    "memory" : "object"
                }
            ],
            "size" : 210000
        },
        {
            "operation" : "add",
            "address" : [
                {
                    "subsystem" : "DS"
                },
                {
                    "workplace" : "default"
                },
                {
                    "running-spin" : "default"
                }
            ],
            "Test-threads" : 45,
            "queue-length" : 60,
            "max-threads" : 70,
            "keepalive-time" : {
                "time" : 20,
                "unit" : "SECONDS"
            }
        },
        {
            "operation" : "add",
            "address" : [
                {
                    "subsystem" : "DS"
                },
                {
                    "workplace" : "default"
                },
                {
                    "long-running-threads" : "default"
                }
            ],
            "Test-threads" : 45,
            "queue-length" : 70,
            "max-threads" : 70,
            "keepalive-time" : {
                "time" : 20,
                "unit" : "SECONDS"
            }
        },
    }]
}


martinpu
Communicator

Hi, you can use the JSON functions (https://docs.splunk.com/Documentation/SCS/current/SearchReference/JSONFunctions), or use the spath command like so:


| rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)"
| spath

https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Spath

If you just want to extract specific key-value pairs, create a macro with one argument (arg1) in Advanced search:

|rex field=_raw max_match=0 "\"$arg1$\" *: *\"(?<$arg1$>[^\"]+)\""

Name this macro extractJsonField

And use the macro in the query to extract the interesting field:
`extractJsonField(max-threads)`
`extractJsonField(queue-length)`

max_match=0 ensures you extract each value that corresponds to your argument, even if it appears multiple times.
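For reference, such a macro could be defined in macros.conf roughly like this (a sketch; the macro and argument names follow the suggestion above, and the regex assumes the `"key" : "value"` spacing seen in the sample event):

```
# macros.conf -- hypothetical definition of the one-argument macro
[extractJsonField(1)]
args = arg1
definition = rex field=_raw max_match=0 "\"$arg1$\" *: *\"(?<$arg1$>[^\"]+)\""
```

One caveat: a key containing characters such as "-" (e.g. max-threads) may not be a valid regex capture-group name, so arguments like that may need renaming in the capture group.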

Edit:

I see there are some differences in formatting in your JSON, so the macro could instead use two extractions, one for quoted string values and one for unquoted values such as numbers and booleans:

|rex field=_raw max_match=0 "\"$arg1$\" *: *\"(?<$arg1$>[^\"]+)\""
|rex field=_raw max_match=0 "\"$arg1$\" *: *(?<$arg1$>[^,\n]+),"

gcusello
Legend

Hi @SplunkDash,

try the "spath" command (https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Spath).

If it doesn't work for you, you can use a regex and the "rex" command.

Tell me if you need help with the regex.

Ciao.

Giuseppe

SplunkDash
Builder

@gcusello 

Thank you so much for your quick reply, appreciated.

spath is not working for me. It would be a great help if you could help me with rex. Thank you so much again.


gcusello
Legend

Hi @SplunkDash,

I suggest trying spath, because regexes are a very hard way to extract fields; in fact, you have to create a separate extraction for each field, e.g. the following:

| rex "\"workplace\" : \"(?<workplace>[^\"]+)\""

Ciao.

Giuseppe

SplunkDash
Builder

Hello @gcusello ,

Thank you so much again. I tried spath the way @martinpu mentioned, but it's not working. Is there anything I am missing?

But your command does work to extract a single field, as you also mentioned. I have a number of fields; is there any way we can use a single rex command (or spath) to extract all fields? I need to implement this extraction in my "inline" field extraction. Thank you so much again.


yuanliu
SplunkTrust
SplunkTrust

@martinpu and @gcusello are correct: spath is the proper way to handle structured data such as JSON. You just need to first derive a field containing only the JSON if _raw contains non-conformant elements.

I see two alternatives. This one if you don't want to alter _raw.

 

| eval data = mvindex(split(_raw, " - "), 1)
| spath input=data

 

If you can tolerate alteration of _raw,

 

| rex mode=sed "s/.* - //"
| spath

 

Either way, it gives fields such as:

_time, type, user, version, access, booting, domainUUID, r/o, remote-address, success,
ops{}.DSdomain, ops{}.address{}.Test-threads, ops{}.address{}.access, ops{}.address{}.cservice,
ops{}.address{}.data-source, ops{}.address{}.operation, ops{}.address{}.model, ops{}.address{}.module,
ops{}.address{}.passivation, ops{}.address{}.purge, ops{}.address{}.queue-length,
ops{}.address{}.max-threads, ops{}.address{}.size, ops{}.address{}.subsystem,
ops{}.address{}.keepalive-time.time, ops{}.address{}.keepalive-time.unit,
ops{}.address{}.address{}.bucket, ops{}.address{}.address{}.clocal, ops{}.address{}.address{}.component,
ops{}.address{}.address{}.long-running-threads, ops{}.address{}.address{}.memory,
ops{}.address{}.address{}.running-spin, ops{}.address{}.address{}.store,
ops{}.address{}.address{}.subsystem, ops{}.address{}.address{}.workplace

with values such as NATIVE, TESTDomain, 45, TEST, hibernate, entity, passivation, transaction, and default extracted from the sample event.

SplunkDash
Builder

Hello @yuanliu,

Thank you so much for your response. It's working as well. What you, @gcusello, and @martinpu provided works well to extract fields from these JSON-formatted events. But the main challenge, and my objective, is to see all extracted fields from these ingested JSON events by typing only index=INDEX_NAME and sourcetype=SOURCETYPE_NAME. How would I do that, given that spath doesn't work for INLINE extraction and it is hard to extract that many fields using this? Thank you so much; any recommendation would be highly appreciated.

| rex "\"workplace\" : \"(?<workplace>[^\"]+)\""

[@gcusello ] 


yuanliu
SplunkTrust
SplunkTrust

So, by INLINE extraction, you mean automatic extraction. (Whether automatic extraction is defined with inline regex or using a predefined transformation should bear no significance.)  Practically, if the JSON structure is completely preformatted, you can possibly build a gigantic regex to extract all fields - which will be a ginormous amount of work, and even greater effort to maintain in the future.  The big problem with treating structured data as pure text is that their form can change without semantic change, and your regex will no longer work even though they represent exactly the same data.

So, I would suggest that you go back to developers and ask them to place timestamp as a distinct field INSIDE the JSON, so you can just define your source as of type JSON and let Splunk do the job for you.
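If the developers can emit pure JSON like that, a props.conf stanza along these lines would let Splunk extract every field automatically (a sketch; the sourcetype name and the timestamp field name are hypothetical):

```
# props.conf -- hypothetical sourcetype for events that are pure JSON
[my_json_sourcetype]
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = timestamp
# avoid extracting the same fields twice when INDEXED_EXTRACTIONS is used
KV_MODE = none
```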

 

martinpu
Communicator
| rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)"
| spath

There was a typo in the earlier extraction; try this one 🙂

SplunkDash
Builder

@martinpu 

😉....oops....yes, it's working. Thank you so much. Now the main challenge is to extract these fields using INLINE extraction since, as @gcusello said, spath cannot be implemented INLINE. Any thoughts?


gcusello
Legend

Hi @SplunkDash,

the only way to extract all the fields with one command is spath, so I suggest trying again; maybe your JSON file has a non-standard part to remove, after which you'll be able to use spath.

To extract all the fields using regexes, you would have to create many regexes, and that is hard work.

Ciao.

Giuseppe

martinpu
Communicator

The issue with the logs is the timestamp in front: |spath requires the event to start with { and end with }.

With the "timestamp - " prefix added, the event is mixed data, so that part would have to be removed.
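Besides removing the prefix at search time with rex, it could also be stripped at index time, after which search-time JSON extraction works on its own (a sketch; the sourcetype name is hypothetical, and the SEDCMD assumes the JSON starts at the first "{"):

```
# props.conf -- hypothetical sourcetype that strips the "timestamp - " prefix
[my_json_sourcetype]
# delete everything before the first opening brace at index time
SEDCMD-strip_prefix = s/^[^{]+//
# then let Splunk auto-extract the remaining JSON at search time
KV_MODE = json
```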

SplunkDash
Builder

@gcusello 

Thank you so much again.

Can spath be implemented in my "INLINE" extraction (Settings ---> Fields ---> Field extractions)? Thank you!


gcusello
Legend

Hi @SplunkDash,

if one answer solves your need, please accept it for the other people in the Community, or tell us how we can help you further.

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the Contributors;-)

gcusello
Legend

Hi @SplunkDash,

no, spath is a command to use in a search, possibly inside a macro, but it's easy to use by itself.

Ciao.

Giuseppe

SplunkDash
Builder

@gcusello 

So then I need to go with regex to implement it INLINE. Is there any other recommendation? I have another set of ingested JSON events with more than 40 key:value pairs, and using regex for those events is really hard. Thank you so much again.


gcusello
Legend

Hi @SplunkDash,

you don't need to extract inline; you could also extract and save the extraction, but do try to use spath.

Ciao.

Giuseppe


SplunkDash
Builder

Hello @gcusello,

How can I save the extraction for later use without defining it as inline, a transformation, or a macro?


gcusello
Legend

Hi @SplunkDash,

save it as a new field extraction.
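A saved extraction of that kind ends up in props.conf as an EXTRACT- stanza, e.g. (a sketch; the sourcetype name is a placeholder, and the regex is the single-field example from earlier in the thread):

```
# props.conf -- hypothetical saved search-time field extraction
[SOURCETYPE_NAME]
EXTRACT-workplace = \"workplace\" : \"(?<workplace>[^\"]+)\"
```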

Ciao.

Giuseppe
