Hello,
I have complex JSON events ingested as *.log files, and I'm having trouble extracting fields from these events. Any help on how to extract key-value pairs from them would be highly appreciated. One sample event is given below. Thank you so much.
2022-07-15 12:44:03 - {
  "type" : "TEST",
  "r/o" : false,
  "booting" : false,
  "version" : "6.2.7.TS",
  "user" : "DS",
  "domainUUID" : null,
  "access" : "NATIVE",
  "remote-address" : "localhost",
  "success" : true,
  "ops" : [{
    "address" : [
      {
        "subsystem" : "datasources"
      },
      {
        "data-source" : "mode_tp"
      }
    ],
    "address" : [
      {
        "cservice" : "management"
      },
      {
        "access" : "identity"
      }
    ],
    "DSdomain" : "TESTDomain"
  },
  {
    "address" : [
      {
        "cservice" : "management"
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "finit"
          },
          {
            "bucket" : "TEST"
          },
          {
            "clocal" : "passivation"
          },
          {
            "store" : "file"
          }
        ],
        "passivation" : true,
        "purge" : false
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "finit"
          },
          {
            "bucket" : "TEST"
          }
        ],
        "module" : "dshibernate"
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "finit"
          },
          {
            "bucket" : "hibernate"
          },
          {
            "clocal" : "entity"
          }
        ]
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "finit"
          },
          {
            "bucket" : "hibernate"
          },
          {
            "clocal" : "entity"
          },
          {
            "component" : "transaction"
          }
        ],
        "model" : "DSTEST"
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "infit"
          },
          {
            "bucket" : "hibernate"
          },
          {
            "clocal" : "entity"
          },
          {
            "memory" : "object"
          }
        ],
        "size" : 210000
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "DS"
          },
          {
            "workplace" : "default"
          },
          {
            "running-spin" : "default"
          }
        ],
        "Test-threads" : 45,
        "queue-length" : 60,
        "max-threads" : 70,
        "keepalive-time" : {
          "time" : 20,
          "unit" : "SECONDS"
        }
      },
      {
        "operation" : "add",
        "address" : [
          {
            "subsystem" : "DS"
          },
          {
            "workplace" : "default"
          },
          {
            "long-running-threads" : "default"
          }
        ],
        "Test-threads" : 45,
        "queue-length" : 70,
        "max-threads" : 70,
        "keepalive-time" : {
          "time" : 20,
          "unit" : "SECONDS"
        }
      },
  }]
}
Hi @SplunkDash,
I suggest trying spath, because regexes are a very hard way to extract fields; in fact, you have to create a separate extraction for each field, e.g. the following:
| rex "\"workplace\" : \"(?<workplace>[^\"]+)\""
Ciao.
Giuseppe
Hi, you can use
https://docs.splunk.com/Documentation/SCS/current/SearchReference/JSONFunctions
Or use the spath command like so:
| rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)"
| spath
https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Spath
If you just want to extract specific key-value pairs:
Create a macro with 1 argument (arg1) in advanced search:
|rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""
Name this macro extractJsonField
And use the macro in your query to extract the fields of interest:
`extractJsonField(max-threads)`
`extractJsonField(queue-length)`
max_match=0 ensures you extract each value that corresponds to your argument, even if it appears multiple times.
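If you'd rather define the macro in configuration than through the UI, a minimal macros.conf stanza could look like this (a sketch; the \s* parts assume the "key" : "value" spacing seen in the sample event):

# macros.conf -- hedged sketch of the one-argument macro
[extractJsonField(1)]
args = arg1
definition = rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""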
Edit:
I see there are some differences in formatting in your JSON, so the macro could instead use two extractions, one for quoted string values and one for unquoted numbers and booleans:
|rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*\"(?<$arg1$>[^\"]+)\""
|rex field=_raw max_match=0 "\"$arg1$\"\s*:\s*(?<$arg1$>[^,\n]+)"
Hi @SplunkDash,
try "spath" command (https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Spath).
if it doesn't work for you, you can use a regex and the "rex" command.
tell me if you need help with the regex
Ciao.
Giuseppe
Hello @gcusello,
Hope all is well. I have a quick question: is there any way to find out which app a specific index was used in? I am asking because we have a number of apps, and I forgot which app I used for the index wincbs. Thank you so much in advance for your support in these efforts.
Hi @SplunkDash,
as I said, please don't add a new question to another one, especially if it is already closed, because it is very difficult to get an answer.
To find the app where an index is created, if you have a stand-alone installation, you can look in the [Settings > Indexes] section.
If you have a distributed architecture, indexes and apps aren't directly related.
To find this, you could see which sourcetypes you have in this index and then check the related app in the [Settings > Sourcetypes] section.
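For example, to list the sourcetypes present in that index, something like this should work (a sketch using the index name from your question):

| metadata type=sourcetypes index=wincbs
| table sourcetype firstTime lastTime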
Ciao.
Giuseppe
I posted a new question.
How do I Compare field values between Lookup Table... - Splunk Community
Your recommendation would be highly appreciated if you get a chance. Thank you so much.
Thank you so much for your quick reply, appreciated.
Spath is not working for me. It would be a great help if you could assist me with rex. Thank you so much again.
Hello @gcusello ,
Thank you so much again. I tried spath the way @martinpu mentioned, but it's not working. Is there anything I am missing?
Your command does work to extract a single field, as you also mentioned. But I have a number of fields; is there any way to use a single rex command (or spath) to extract all of them? I need to implement this extraction in my "inline" field extraction. Thank you so much again.
@martinpu and @gcusello are correct: spath is the proper way to handle structured data such as JSON. You just need to first make a field containing only the JSON if _raw contains non-conformant elements.
I see two alternatives. Use this one if you don't want to alter _raw.
| eval data = mvindex(split(_raw, " - "), 1)
| spath input=data
If you can tolerate alteration of _raw,
| rex mode=sed "s/.* - //"
| spath
Either way, it extracts the fields below (shown as field = value; _raw keeps the full original event):

_time = 2022-07-15 12:44:03
access = NATIVE
booting = false
domainUUID = null
ops{}.DSdomain = TESTDomain
ops{}.address{}.Test-threads = 45 45
ops{}.address{}.access = identity
ops{}.address{}.address{}.bucket = TEST TEST hibernate hibernate hibernate
ops{}.address{}.address{}.clocal = passivation entity entity entity
ops{}.address{}.address{}.component = transaction
ops{}.address{}.address{}.long-running-threads = default

along with the other extracted fields: ops{}.address{}.address{}.memory, ops{}.address{}.address{}.running-spin, ops{}.address{}.address{}.store, ops{}.address{}.address{}.subsystem, ops{}.address{}.address{}.workplace, ops{}.address{}.cservice, ops{}.address{}.data-source, ops{}.address{}.keepalive-time.time, ops{}.address{}.keepalive-time.unit, ops{}.address{}.max-threads, ops{}.address{}.model, ops{}.address{}.module, ops{}.address{}.operation, ops{}.address{}.passivation, ops{}.address{}.purge, ops{}.address{}.queue-length, ops{}.address{}.size, ops{}.address{}.subsystem, r/o, remote-address, success, type, user, and version.
Hello @yuanliu,
Thank you so much for your response. It's working as well. What you, @gcusello, and @martinpu have provided works well for extracting fields from these JSON-formatted events. But the main challenge, and my objective, is to see all extracted fields from these ingested JSON events by typing only index=INDEX_NAME and sourcetype=SOURCETYPE_NAME. How would I do that? spath doesn't work for INLINE extraction, and it is hard to extract that many fields one by one like this (any recommendation would be highly appreciated, thank you so much):
| rex "\"workplace\" : \"(?<workplace>[^\"]+)\""
@gcusello
So, by INLINE extraction, you mean automatic extraction. (Whether automatic extraction is defined with an inline regex or a predefined transformation should bear no significance.) Practically, if the JSON structure is completely preformatted, you could possibly build a gigantic regex to extract all fields, but that would be an enormous amount of work, and an even greater effort to maintain in the future. The big problem with treating structured data as pure text is that its form can change without semantic change, and your regex will no longer work even though the events represent exactly the same data.
So, I would suggest that you go back to the developers and ask them to place the timestamp as a distinct field INSIDE the JSON, so you can just define your sourcetype as JSON and let Splunk do the job for you.
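If the developers can do that, a props.conf stanza along these lines should be all that's needed (a sketch; the sourcetype name placeholder and the "timestamp" key are hypothetical):

# props.conf -- hedged sketch, assuming the event becomes pure JSON
# with a "timestamp" key inside it (hypothetical field name)
[SOURCETYPE_NAME]
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT = %Y-%m-%d %H:%M:%S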
There was a missing : in the extraction, try this one 🙂
| rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)"
| spath
https://community.splunk.com/t5/Getting-Data-In/Extract-JSON-data-within-the-logs-JSON-mixed-with-un...
You can probably do it with transforms, or manually with rex.
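For the transforms route, a generic key-value extraction might look like this (a sketch: it only catches quoted string values, so numbers and booleans would need a second, similar stanza; the stanza name is hypothetical and SOURCETYPE_NAME stands in for your sourcetype):

# transforms.conf -- hedged sketch: pull out every "key" : "value" pair
[extract_json_kv]
REGEX = "([\w-]+)"\s*:\s*"([^"]+)"
FORMAT = $1::$2
MV_ADD = true

# props.conf -- attach the transform as a search-time extraction
[SOURCETYPE_NAME]
REPORT-json_kv = extract_json_kv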
Hi @SplunkDash,
the only way to extract all fields with one command is spath, so I suggest trying again; maybe your JSON file has a non-standard part to remove, after which you'll be able to use spath.
To extract all the fields using regexes, you have to create many regexes, and that is hard work.
Ciao.
Giuseppe
The issue with the logs is the timestamp in front, as
| spath requires:
{ at the start
} at the end
With the "timestamp - " prefix added, it is mixed data, so that part would have to be removed.
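If you want that removal to happen automatically rather than in every search, one option is to strip the prefix at index time and let automatic JSON extraction take over (a sketch; SEDCMD rewrites _raw while events are being indexed, and SOURCETYPE_NAME stands in for your sourcetype):

# props.conf -- hedged sketch: strip "YYYY-MM-DD HH:MM:SS - " at index
# time, then auto-extract the remaining pure JSON at search time
[SOURCETYPE_NAME]
SEDCMD-strip_timestamp = s/^[0-9-]+ [0-9:]+ - //
KV_MODE = json

Note this only affects newly indexed events; already-indexed ones would still need the rex workaround.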
Thank you so much again.
Can spath be implemented in my "INLINE" extraction ([Settings > Fields > Field extractions])? Thank you!
Hi @SplunkDash,
if one answer solves your need, please accept it, for the benefit of the other people in the Community, or tell us how we can help you further.
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the Contributors;-)
Hi @SplunkDash,
no, spath is a command to use in a search, possibly within a macro, but it's easy to use by itself.
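For instance, a macro along these lines (a sketch that reuses the prefix-stripping rex from earlier in this thread) would get you close to your index=INDEX_NAME sourcetype=SOURCETYPE_NAME goal:

# macros.conf -- hedged sketch
[json_extract]
definition = rex field=_raw "\d+-\d+-\d+ \d+:\d+:\d+ - (?<_raw>[\S\s]+)" | spath

and then, in the search bar:

index=INDEX_NAME sourcetype=SOURCETYPE_NAME `json_extract`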
Ciao.
Giuseppe
So, then, do I need to go with regex to implement it through INLINE? Is there any other recommendation? I have other JSON-ingested events with more than 40 key-value pairs, and using regex for those events is really hard. Thank you so much again.