How to do search time extractions of nested json objects with props/transforms

We have users migrating apps (that were using Universal Forwarders) to docker containers. The Splunk logging driver for docker embeds the logged json items inside a 'line' object as per the sanitized example below; these fields are not nested in 'line' when using a UF. There are a number of reports/dashboards/alerts built that won't work with the new logging solution because they're not expecting to have to reference a field with 'line.' - for example, line.port instead of just "port". The desired goal is to extract the json fields out of 'line' and place them back in _raw so the reports/dashboards will work with either implementation.

Example (simplified) event:

{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger","level":"INFO","env":"dev","port":"8000","process_id":51,"thread_id":140005384098624,"hostname":"964619888c0d"},"source":"stdout","tag":"some.instance.tag"}

I'm trying to build a props/transforms solution that extracts the json out of 'line' and places those fields back at the '_raw' event level. Here's what I have so far:

local.meta
[]
export = system

props.conf
[docker_line_extract]
REPORT-line = extract_line_object, extract_line_objects

transforms.conf
[extract_line_object]
REGEX = \{\"line\":\{(?<lineobj>.*)\},

[extract_line_objects]
REGEX = \"(?<_KEY_1>[^="\\]+)\":\s?\"?(?<_VAL_1>[^="\\]*)
FORMAT = $1::$2
SOURCE_KEY = field:lineobj
DEST_KEY = _raw
REPEAT_MATCH = true

The above succeeds in extracting the JSON field/value pairs out of 'line': the 'lineobj' field appears in the fields list in Splunk Web, and clicking it reveals the expected content: "_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger","level":"INFO","env":"dev","port":"8000","process_id":51,"thread_id":140005384098624,"hostname":"964619888c0d"

So that part is working. But I can't seem to get the json field/values extracted out of 'lineobj' and placed in the _raw event as desired - tried a lot of variations, no luck. Does anyone have some insights / solution? Thank you.

Re: How to do search time extractions of nested json objects with props/transforms

Correction to transforms.conf - should have used the code block the first time - apologies:

[extract_line_object]
REGEX = \{\"line\":\{(?<lineobj>.*)\},

[extract_line_objects]
REGEX = \"(?<_KEY_1>[^="\\]+)\":\s?\"?(?<_VAL_1>[^="\\]*)
FORMAT = $1::$2
SOURCE_KEY = field:lineobj
DEST_KEY = _raw
REPEAT_MATCH = true
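
To see what the two regexes above would capture, they can be tried outside Splunk. The following is only a rough Python emulation, not a reproduction of Splunk's REPORT processing: Python's `re` needs `(?P<name>...)` where the conf file uses `(?<name>...)`, `finditer` stands in for REPEAT_MATCH, and the helper name `two_stage_extract` is made up for illustration. The sample event is the one from this thread.

```python
import re

# Sample event from this thread: the Docker logging driver nests the app's JSON under "line".
SAMPLE_EVENT = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)

# Stage 1: same pattern as [extract_line_object], in Python's named-group syntax.
LINE_OBJECT = re.compile(r'\{"line":\{(?P<lineobj>.*)\},')

# Stage 2: same pattern as [extract_line_objects]; each match yields one key/value pair.
PAIR = re.compile(r'"(?P<_KEY_1>[^="\\]+)":\s?"?(?P<_VAL_1>[^="\\]*)')


def two_stage_extract(raw: str) -> dict:
    """Hypothetical helper: emulate the two transforms with plain regex matching."""
    stage1 = LINE_OBJECT.search(raw)
    if stage1 is None:
        return {}
    lineobj = stage1.group("lineobj")
    # finditer keeps matching along the string, standing in for REPEAT_MATCH = true.
    return {m.group("_KEY_1"): m.group("_VAL_1") for m in PAIR.finditer(lineobj)}


fields = two_stage_extract(SAMPLE_EVENT)
for key, value in fields.items():
    print(f"{key} = {value!r}")
```

In this emulation the unquoted numeric values keep their trailing comma (`process_id` comes out as `'51,'`), because the value character class `[^="\\]` does not exclude `,`.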

Re: How to do search time extractions of nested json objects with props/transforms

Figured it out - I was trying too hard. These entries in props.conf, applied to events with sourcetype 'docker_line_extract', solved the problem. The 'line' object still appears in the search results, but the extracted fields appear in the fields list, and you can reference them and build filters with them as desired:

[docker_line_extract]
EXTRACT-line = (\{\"line\":\{)?\"(?<_KEY_1>[^=",]+)\":\s?\"?(?<_VAL_1>[^=",]*)

Evidently, using the _KEY_1 and _VAL_1 capture-group naming convention extracts all the field/value pairs nested in the 'line' object.
The optional (\{\"line\":\{)? group at the start of the regex prevents an empty "line" field from being extracted as well.
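
As a quick sanity check, the single regex can be run against the sample event outside Splunk. This is a minimal Python sketch under stated assumptions: the group syntax is rewritten to `(?P<name>...)`, `finditer` is assumed to approximate how Splunk repeats the match for the `_KEY_1`/`_VAL_1` pair, and `extract_pairs` is a hypothetical helper name.

```python
import re

# Sample event from this thread.
SAMPLE_EVENT = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)

# Same pattern as the EXTRACT-line setting, rewritten with Python named groups.
EXTRACT_LINE = re.compile(
    r'(\{"line":\{)?"(?P<_KEY_1>[^=",]+)":\s?"?(?P<_VAL_1>[^=",]*)'
)


def extract_pairs(raw: str) -> dict:
    """Hypothetical helper: collect every key/value pair the pattern matches."""
    return {m.group("_KEY_1"): m.group("_VAL_1") for m in EXTRACT_LINE.finditer(raw)}


fields = extract_pairs(SAMPLE_EVENT)
for key, value in fields.items():
    print(f"{key} = {value!r}")
```

Here the optional prefix swallows `{"line":{` as part of the first match, so no `line` key is produced, and the top-level `source` and `tag` pairs are picked up alongside the nested ones. Values that themselves contain a comma would be cut short by the `[^=",]*` class, so this sketch assumes simple values like those in the sample.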

Re: How to do search time extractions of nested json objects with props/transforms

And, of course, you can add this under a [default] stanza in props.conf if you need it applied to every sourcetype:

[default]
EXTRACT-line = (\{\"line\":\{)?\"(?<_KEY_1>[^=",]+)\":\s?\"?(?<_VAL_1>[^=",]*)

[other_sourcetypes]
...

Re: How to do search time extractions of nested json objects with props/transforms

| makeresults
| eval _raw="{\"line\":{\"_t\":\"2020-03-27T03:17:25.491296Z\",\"logger\":\"some.logger\",\"level\":\"INFO\",\"env\":\"dev\",\"port\":\"8000\",\"process_id\":51,\"thread_id\":140005384098624,\"hostname\":\"964619888c0d\"},\"source\":\"stdout\",\"tag\":\"some.instance.tag\"}"
| rex mode=sed "s/{.*({.*}).*}/\1/"
| spath

spath works.
props.conf

[docker_line_extract]
SEDCMD-trim_line = s/{.*({.*}).*}/\1/
KV_MODE = json

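With KV_MODE = json, no other field extraction is needed. Note that SEDCMD is applied at index time, so it only changes _raw for events indexed after the setting is in place.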
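
The effect of the sed expression can be checked outside Splunk. The sketch below is an approximation in Python, not Splunk's own sed or spath implementation: `re.sub` stands in for `s/{.*({.*}).*}/\1/`, `json.loads` stands in for JSON field extraction, and `unwrap_line` is a hypothetical helper name.

```python
import json
import re

# Sample event from this thread.
SAMPLE_EVENT = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)

# Python equivalent of s/{.*({.*}).*}/\1/ with the literal braces escaped.
INNER_OBJECT = re.compile(r'\{.*(\{.*\}).*\}')


def unwrap_line(raw: str) -> str:
    """Hypothetical helper: replace the whole event with its inner {...} object."""
    return INNER_OBJECT.sub(r'\1', raw, count=1)


trimmed = unwrap_line(SAMPLE_EVENT)
fields = json.loads(trimmed)  # the trimmed text is valid JSON on its own
print(trimmed)
print(fields["port"], fields["process_id"])
```

Because the leading `.*` is greedy, the capture starts at the last `{` in the event, so this sketch assumes the 'line' object contains no nested objects. The outer `source` and `tag` fields are discarded by the substitution.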