Splunk Search

How to extract fields from a complex JSON file?

SplunkDash
Motivator

Hello,

I have complex JSON events ingested as *.log files. I am having trouble extracting fields from these files/events. Any help on how to extract key-value pairs from these events would be highly appreciated. One sample event is given below. Thank you so much.

 

2022-07-15 12:44:03 - {
    "type" : "TEST",
    "r/o" : false,
    "booting" : false,
    "version" : "6.2.7.TS",
    "user" : "DS",
    "domainUUID" : null,
    "access" : "NATIVE",
    "remote-address" : "localhost",
    "success" : true,
    "ops" : [{
        "address" : [
            { "subsystem" : "datasources" },
            { "data-source" : "mode_tp" }
        ],
        "address" : [
            { "cservice" : "management" },
            { "access" : "identity" }
        ],
        "DSdomain" : "TESTDomain"
    },
    {
        "address" : [
            { "cservice" : "management" },
    {
        "operation" : "add",
        "address" : [
            { "subsystem" : "finit" },
            { "bucket" : "TEST" },
            { "clocal" : "passivation" },
            { "store" : "file" }
        ],
        "passivation" : true,
        "purge" : false
    },
    {
        "operation" : "add",
        "address" : [
            { "subsystem" : "finit" },
            { "bucket" : "TEST" }
        ],
        "module" : "dshibernate"
    },
    {
        "operation" : "add",
        "address" : [
            { "subsystem" : "finit" },
            { "bucket" : "hibernate" },
            { "clocal" : "entity" }
        ]
    },
    {
        "operation" : "add",
        "address" : [
            { "subsystem" : "finit" },
            { "bucket" : "hibernate" },
            { "clocal" : "entity" },
            { "component" : "transaction" }
        ],
        "model" : "DSTEST"
    },
    {
        "operation" : "add",
        "address" : [
            { "subsystem" : "infit" },
            { "bucket" : "hibernate" },
            { "clocal" : "entity" },
            { "memory" : "object" }
        ],
        "size" : 210000
    },
    {
        "operation" : "add",
        "address" : [
            { "subsystem" : "DS" },
            { "workplace" : "default" },
            { "running-spin" : "default" }
        ],
        "Test-threads" : 45,
        "queue-length" : 60,
        "max-threads" : 70,
        "keepalive-time" : {
            "time" : 20,
            "unit" : "SECONDS"
        }
    },
    {
        "operation" : "add",
        "address" : [
            { "subsystem" : "DS" },
            { "workplace" : "default" },
            { "long-running-threads" : "default" }
        ],
        "Test-threads" : 45,
        "queue-length" : 70,
        "max-threads" : 70,
        "keepalive-time" : {
            "time" : 20,
            "unit" : "SECONDS"
        }
    },
    }]
}


martinpu
Communicator

Hi, you can use the JSON functions:
https://docs.splunk.com/Documentation/SCS/current/SearchReference/JSONFunctions

Or use the spath command like so:


| rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)"
| spath

https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Spath

If you just want to extract specific key-value pairs, create a macro with one argument (arg1) under Settings > Advanced search:

| rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""

Name this macro extractJsonField(1), and use it in a query to extract the interesting fields:

`extractJsonField(max-threads)`
`extractJsonField(queue-length)`

max_match=0 ensures you extract every value that corresponds to your argument, even if it appears multiple times.
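For reference, a minimal sketch of how that macro might look in macros.conf; the stanza layout is standard, but the \s*:\s* spacing in the regex is an assumption based on the sample event's "key" : "value" style. Note also that regex named capture groups cannot contain hyphens, so arguments like max-threads may need an underscore-safe group name plus a rename:

# macros.conf (sketch only; adjust the regex to your event format)
[extractJsonField(1)]
args = arg1
definition = | rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""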

Edit:

I see there are some differences in formatting in your JSON, so the macro could instead combine two extractions (one for quoted string values, one for bare values):

| rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""
| rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*(?<$arg1$>[^\n]+),"

gcusello
SplunkTrust

Hi @SplunkDash,

try "spath" command (https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Spath).

if it doesn't work for you, you can use a regex and the "rex" command.

tell me if you need help with the regex

Ciao.

Giuseppe

SplunkDash
Motivator

Hello @gcusello,

Hope all is well. I have a quick question: is there any way to find out which app a specific index name was used in? I am asking because we have a number of apps, and I forgot which app I used for the index wincbs. Thank you so much in advance for your support in these efforts.

 

 


gcusello
SplunkTrust

Hi @SplunkDash,

As I said before, please do not add a new question to another thread, especially one that is already closed, because it is very difficult to get an answer there.

To find the app where an index is defined, if you have a standalone installation, you can look in the [Settings > Indexes] section.

If you have a distributed architecture, indexes and apps aren't directly related.

Failing that, you could check which sourcetypes you have in this index and then look in the [Settings > Sourcetypes] section for the related app.
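A search-based sketch that may also help, assuming you have permission to run the rest command; the /services/data/indexes REST endpoint reports the app context each index is defined in via the eai:acl.app field:

| rest /services/data/indexes splunk_server=local
| search title=wincbs
| table title eai:acl.app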

Ciao.

Giuseppe

SplunkDash
Motivator

@gcusello 

I posted a new question.

 How do I Compare field values between Lookup Table... - Splunk Community

Your recommendation would be highly appreciated if you get a chance. Thank you so much.


SplunkDash
Motivator

@gcusello 

Thank you so much for your quick reply, appreciated.

spath is not working for me. It would be very helpful if you could help me with the rex approach. Thank you so much again.


gcusello
SplunkTrust

Hi @SplunkDash,

I suggest trying spath, because regexes are a very hard way to extract fields; in fact, you have to create a separate extraction for each field, e.g. the following:

| rex "\"workplace\" : \"(?<workplace>[^\"]+)\""

Ciao.

Giuseppe

SplunkDash
Motivator

Hello @gcusello ,

Thank you so much again. I tried spath the way @martinpu mentioned, but it's not working. Is there anything I am missing?

Your command does work to extract a single field, as you mentioned. But I have a number of fields; is there any way to use a single rex command (or spath) to extract all of them? I need to implement this extraction in my "inline" field extraction. Thank you so much again.


yuanliu
SplunkTrust

@martinpu and @gcusello are correct: spath is the proper way to handle structured data such as JSON. You just need to first make a field containing only the JSON, since _raw contains non-conformant elements (the leading timestamp).

I see two alternatives. Use this one if you don't want to alter _raw:

 

| eval data = mvindex(split(_raw, " - "), 1)
| spath input=data

 

If you can tolerate alteration of _raw,

 

| rex mode=sed "s/.* - //"
| spath

 

Either way, it gives the full set of extracted fields, for example:

_time, type, user, version, access, booting, domainUUID, r/o, remote-address, success,
ops{}.DSdomain,
ops{}.address{}.Test-threads, ops{}.address{}.access, ops{}.address{}.cservice,
ops{}.address{}.data-source, ops{}.address{}.keepalive-time.time,
ops{}.address{}.keepalive-time.unit, ops{}.address{}.max-threads, ops{}.address{}.model,
ops{}.address{}.module, ops{}.address{}.operation, ops{}.address{}.passivation,
ops{}.address{}.purge, ops{}.address{}.queue-length, ops{}.address{}.size,
ops{}.address{}.subsystem,
ops{}.address{}.address{}.bucket, ops{}.address{}.address{}.clocal,
ops{}.address{}.address{}.component, ops{}.address{}.address{}.long-running-threads,
ops{}.address{}.address{}.memory, ops{}.address{}.address{}.running-spin,
ops{}.address{}.address{}.store, ops{}.address{}.address{}.subsystem,
ops{}.address{}.address{}.workplace

with values such as Test-threads = 45, bucket = TEST and hibernate, clocal = passivation and entity, and component = transaction.

SplunkDash
Motivator

Hello @yuanliu,

Thank you so much for your response. It's working as well. What you, @gcusello, and @martinpu provided all works to extract fields from these JSON-formatted events. But the main challenge, and my objective, is to see all extracted fields from these ingested JSON events by typing only index=INDEX_NAME and sourcetype=SOURCETYPE_NAME. How would I do that, given that spath doesn't work for INLINE extraction and it is hard to extract that many fields one by one like this? Thank you so much; any recommendation would be highly appreciated.

| rex "\"workplace\" : \"(?<workplace>[^\"]+)\""

[@gcusello ] 


yuanliu
SplunkTrust

So, by INLINE extraction you mean automatic extraction. (Whether automatic extraction is defined with an inline regex or with a predefined transformation makes no practical difference.) If the JSON structure is completely preformatted, you could conceivably build one gigantic regex to extract all fields, but that would be a ginormous amount of work, and an even greater effort to maintain in the future. The big problem with treating structured data as pure text is that its form can change without any semantic change, and your regex will then stop working even though the events represent exactly the same data.

So I would suggest that you go back to the developers and ask them to place the timestamp as a distinct field INSIDE the JSON, so you can simply define your sourcetype as JSON and let Splunk do the job for you.
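A minimal sketch of what that could look like in props.conf, assuming the developers make each event pure JSON with an embedded timestamp field; the stanza name my_json_sourcetype and the field name timestamp are assumptions for illustration:

# props.conf (sketch only)
[my_json_sourcetype]
# automatic search-time extraction of all JSON fields
KV_MODE = json
# locate the embedded timestamp (assumed field name and format)
TIME_PREFIX = "timestamp"\s*:\s*"
TIME_FORMAT = %Y-%m-%d %H:%M:%S

With this in place, searching index=INDEX_NAME sourcetype=my_json_sourcetype shows all the JSON fields without any spath or rex in the search.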

 

martinpu
Communicator
| rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)"
| spath

There was a missing : in my earlier extraction; try this one 🙂

SplunkDash
Motivator

@martinpu 

😉 ...oops... yes, it's working now. Thank you so much. The main challenge remains extracting these fields with an INLINE extraction; as @gcusello said, spath cannot be implemented INLINE. Any thoughts?



gcusello
SplunkTrust

Hi @SplunkDash,

The only way to extract all fields with one command is spath, so I suggest trying again: maybe your JSON file has a non-standard part to remove, after which you'll be able to use spath.

To extract all the fields using regexes, you would have to create many regexes, and that is hard work.

Ciao.

Giuseppe


martinpu
Communicator

The issue with the logs is the timestamp in front, as | spath requires { at the start and } at the end of the event.

With the "timestamp - " prefix added, it is mixed data, so that part has to be removed before spath can parse it.

SplunkDash
Motivator

@gcusello 

Thank you so much again.

Can spath be implemented in my "INLINE" extraction (Settings > Fields > Field extractions)? Thank you!


gcusello
SplunkTrust

Hi @SplunkDash,

If one answer solves your need, please accept it for the other people of the Community, or tell us how else we can help you.

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the Contributors;-)

gcusello
SplunkTrust

Hi @SplunkDash,

No, spath is a command to use in a search, possibly inside a macro, but it's easy to use by itself.

Ciao.

Giuseppe
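For what it's worth, a hedged sketch of such a macro in macros.conf, wrapping the timestamp strip and spath together; the macro name prepare_json and the sed expression are assumptions for illustration, not something given in this thread:

# macros.conf (sketch only)
[prepare_json]
definition = | rex mode=sed "s/^[^{]* - //" | spath

You would then search with:

index=INDEX_NAME sourcetype=SOURCETYPE_NAME `prepare_json`

This still has to be typed in the search (it is not an automatic INLINE extraction), but it reduces the whole extraction to a single macro call.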

SplunkDash
Motivator

@gcusello 

So then I need to go with regex to implement it as an INLINE extraction. Is there any other recommendation? I have other JSON-ingested events with more than 40 key:value pairs, and using regex for those is really hard. Thank you so much again.
