Hello,
I have complex JSON events ingested as *.log files, and I'm having trouble extracting fields from these events. Any help on how to extract key-value pairs from them would be highly appreciated. One sample event is given below. Thank you so much.
2022-07-15 12:44:03 - {
  "type" : "TEST",
  "r/o" : false,
  "booting" : false,
  "version" : "6.2.7.TS",
  "user" : "DS",
  "domainUUID" : null,
  "access" : "NATIVE",
  "remote-address" : "localhost",
  "success" : true,
  "ops" : [{
    "address" : [
      {
        "subsystem" : "datasources"
      },
      {
        "data-source" : "mode_tp"
      }
    ],
    "address" : [
      {
        "cservice" : "management"
      },
      {
        "access" : "identity"
      }
    ],
    "DSdomain" : "TESTDomain"
  },
  {
    "address" : [
      {
        "cservice" : "management"
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "finit"
          },
          {
            "bucket" : "TEST"
          },
          {
            "clocal" : "passivation"
          },
          {
            "store" : "file"
          }
        ],
        "passivation" : true,
        "purge" : false
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "finit"
          },
          {
            "bucket" : "TEST"
          }
        ],
        "module" : "dshibernate"
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "finit"
          },
          {
            "bucket" : "hibernate"
          },
          {
            "clocal" : "entity"
          }
        ]
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "finit"
          },
          {
            "bucket" : "hibernate"
          },
          {
            "clocal" : "entity"
          },
          {
            "component" : "transaction"
          }
        ],
        "model" : "DSTEST"
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "infit"
          },
          {
            "bucket" : "hibernate"
          },
          {
            "clocal" : "entity"
          },
          {
            "memory" : "object"
          }
        ],
        "size" : 210000
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "DS"
          },
          {
            "workplace" : "default"
          },
          {
            "running-spin" : "default"
          }
        ],
        "Test-threads" : 45,
        "queue-length" : 60,
        "max-threads" : 70,
        "keepalive-time" : {
          "time" : 20,
          "unit" : "SECONDS"
        }
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "DS"
          },
          {
            "workplace" : "default"
          },
          {
            "long-running-threads" : "default"
          }
        ],
        "Test-threads" : 45,
        "queue-length" : 70,
        "max-threads" : 70,
        "keepalive-time" : {
          "time" : 20,
          "unit" : "SECONDS"
        }
      },
  }]
}
Hi @SplunkDash,
I suggest trying spath, because regexes are a very hard way to extract fields; in fact, you have to create a separate extraction for each field, e.g. the following:
| rex "\"workplace\" : \"(?<workplace>[^\"]+)\""
Ciao.
Giuseppe
Hi, you can use
https://docs.splunk.com/Documentation/SCS/current/SearchReference/JSONFunctions
Or use the spath command like so:
| rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)"
| spath
https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Spath
If you just want to extract specific key-value pairs:
Create a macro with 1 argument (arg1) in advanced search:
|rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""
Name this macro extractJsonField
And use the macro in your query to extract the fields of interest:
`extractJsonField(max-threads)`
`extractJsonField(queue-length)`
max_match=0 ensures you extract each value that corresponds to your argument, even if it appears multiple times.
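If you'd rather define the macro in configuration than through the UI, a minimal macros.conf stanza could look like this (a sketch; the \s* parts assume the "key" : "value" spacing seen in the sample event):

# macros.conf -- hedged sketch of the one-argument macro
[extractJsonField(1)]
args = arg1
definition = rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""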
Edit:
I see there are some differences in formatting in your JSON, so the macro could instead use two extractions, one for quoted string values and one for unquoted numbers and booleans:
|rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""
|rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*(?<$arg1$>[^,\n]+)"
Hi @SplunkDash,
try "spath" command (https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Spath).
if it doesn't work for you, you can use a regex and the "rex" command.
tell me if you need help with the regex
Ciao.
Giuseppe
Hello @gcusello,
Hope all is well. I have a quick question: is there any way to find out which app a specific index was used in? I am asking because we have a number of apps, and I forgot which app I used for the index wincbs. Thank you so much in advance for your support in these efforts.
Hi @SplunkDash,
as I said, please don't add a new question to another one, especially if it is already closed, because it is very difficult to get an answer.
To find the app where an index is created, if you have a stand-alone installation, you can look in the [Settings > Indexes] section.
If you have a distributed architecture, indexes and apps aren't directly related.
To find this, you could see which sourcetypes you have in this index and then check the related app in the [Settings > Sourcetypes] section.
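For example, to list the sourcetypes present in that index, something like this should work (a sketch using the index name from your question):

| metadata type=sourcetypes index=wincbs
| table sourcetype firstTime lastTime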
Ciao.
Giuseppe
I posted a new question.
How do I Compare field values between Lookup Table... - Splunk Community
Your recommendation would be highly appreciated if you get a chance. Thank you so much.
Thank you so much for your quick reply, appreciated.
Spath is not working for me. It would be a great help if you could assist me with rex. Thank you so much again.
Hello @gcusello ,
Thank you so much again. I tried spath the way @martinpu mentioned, but it's not working. Is there anything I am missing?
Your command does work to extract a single field, as you also mentioned. But I have a number of fields; is there any way to use a single rex command (or spath) to extract all of them? I need to implement this extraction in my "inline" field extraction. Thank you so much again.
@martinpu and @gcusello are correct: spath is the proper way to handle structured data such as JSON. You just need to first make a field containing only the JSON if _raw contains non-conformant elements.
I see two alternatives. Use this one if you don't want to alter _raw.
| eval data = mvindex(split(_raw, " - "), 1)
| spath input=data
If you can tolerate alteration of _raw,
| rex mode=sed "s/.* - //"
| spath
Either way, it extracts the fields below (shown as field = value; _raw keeps the full original event):

_time = 2022-07-15 12:44:03
access = NATIVE
booting = false
domainUUID = null
ops{}.DSdomain = TESTDomain
ops{}.address{}.Test-threads = 45 45
ops{}.address{}.access = identity
ops{}.address{}.address{}.bucket = TEST TEST hibernate hibernate hibernate
ops{}.address{}.address{}.clocal = passivation entity entity entity
ops{}.address{}.address{}.component = transaction
ops{}.address{}.address{}.long-running-threads = default

along with the other extracted fields: ops{}.address{}.address{}.memory, ops{}.address{}.address{}.running-spin, ops{}.address{}.address{}.store, ops{}.address{}.address{}.subsystem, ops{}.address{}.address{}.workplace, ops{}.address{}.cservice, ops{}.address{}.data-source, ops{}.address{}.keepalive-time.time, ops{}.address{}.keepalive-time.unit, ops{}.address{}.max-threads, ops{}.address{}.model, ops{}.address{}.module, ops{}.address{}.operation, ops{}.address{}.passivation, ops{}.address{}.purge, ops{}.address{}.queue-length, ops{}.address{}.size, ops{}.address{}.subsystem, r/o, remote-address, success, type, user, and version.
Hello @yuanliu,
Thank you so much for your response. It's working as well. What you, @gcusello, and @martinpu have provided works well for extracting fields from these JSON-formatted events. But the main challenge, and my objective, is to see all extracted fields from these ingested JSON events by typing only index=INDEX_NAME and sourcetype=SOURCETYPE_NAME. How would I do that? spath doesn't work for INLINE extraction, and it is hard to extract that many fields one by one like this (any recommendation would be highly appreciated, thank you so much):
| rex "\"workplace\" : \"(?<workplace>[^\"]+)\""
@gcusello
So, by INLINE extraction, you mean automatic extraction. (Whether automatic extraction is defined with an inline regex or a predefined transformation should bear no significance.) Practically, if the JSON structure is completely preformatted, you could possibly build a gigantic regex to extract all fields, but that would be an enormous amount of work, and an even greater effort to maintain in the future. The big problem with treating structured data as pure text is that its form can change without semantic change, and your regex will no longer work even though the events represent exactly the same data.
So, I would suggest that you go back to the developers and ask them to place the timestamp as a distinct field INSIDE the JSON, so you can just define your sourcetype as JSON and let Splunk do the job for you.
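If the developers can do that, a props.conf stanza along these lines should be all that's needed (a sketch; the sourcetype name placeholder and the "timestamp" key are hypothetical):

# props.conf -- hedged sketch, assuming the event becomes pure JSON
# with a "timestamp" key inside it (hypothetical field name)
[SOURCETYPE_NAME]
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT = %Y-%m-%d %H:%M:%S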
There was a missing : in the extraction, try this one 🙂
| rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)"
| spath
https://community.splunk.com/t5/Getting-Data-In/Extract-JSON-data-within-the-logs-JSON-mixed-with-un...
You can probably do it with transforms, or manually with rex.
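For the transforms route, a generic key-value extraction might look like this (a sketch: it only catches quoted string values, so numbers and booleans would need a second, similar stanza; the stanza name is hypothetical and SOURCETYPE_NAME stands in for your sourcetype):

# transforms.conf -- hedged sketch: pull out every "key" : "value" pair
[extract_json_kv]
REGEX = "([\w-]+)"\s*:\s*"([^"]+)"
FORMAT = $1::$2
MV_ADD = true

# props.conf -- attach the transform as a search-time extraction
[SOURCETYPE_NAME]
REPORT-json_kv = extract_json_kv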
Hi @SplunkDash,
the only way to extract all fields with one command is spath, so I suggest trying again; maybe your JSON file has a non-standard part to remove, after which you'll be able to use spath.
To extract all the fields using regexes, you have to create many regexes, and that is hard work.
Ciao.
Giuseppe
The issue with the logs is the timestamp in front, as
| spath requires:
{ at the start
} at the end
With the "timestamp - " prefix added, it is mixed data, so that part would have to be removed.
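If you want that removal to happen automatically rather than in every search, one option is to strip the prefix at index time and let automatic JSON extraction take over (a sketch; SEDCMD rewrites _raw while events are being indexed, and SOURCETYPE_NAME stands in for your sourcetype):

# props.conf -- hedged sketch: strip "YYYY-MM-DD HH:MM:SS - " at index
# time, then auto-extract the remaining pure JSON at search time
[SOURCETYPE_NAME]
SEDCMD-strip_timestamp = s/^[0-9-]+ [0-9:]+ - //
KV_MODE = json

Note this only affects newly indexed events; already-indexed ones would still need the rex workaround.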
Thank you so much again.
Can spath be implemented in my "INLINE" extraction ([Settings > Fields > Field extractions])? Thank you!
Hi @SplunkDash,
if one answer solves your need, please accept it, for the benefit of the other people in the Community, or tell us how we can help you further.
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the Contributors;-)
Hi @SplunkDash,
no, spath is a command to use in a search, possibly within a macro, but it's easy to use by itself.
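For instance, a macro along these lines (a sketch that reuses the prefix-stripping rex from earlier in this thread) would get you close to your index=INDEX_NAME sourcetype=SOURCETYPE_NAME goal:

# macros.conf -- hedged sketch
[json_extract]
definition = rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)" | spath

and then, in the search bar:

index=INDEX_NAME sourcetype=SOURCETYPE_NAME `json_extract`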
Ciao.
Giuseppe
So, then, do I need to go with regex to implement it through INLINE? Is there any other recommendation? I have other JSON-ingested events with more than 40 key-value pairs, and using regex for those events is really hard. Thank you so much again.