How to do search time extractions of nested json objects with props/transforms

jimbaxtermdi
Explorer

We have users migrating apps that used Universal Forwarders (UFs) to Docker containers. The Splunk logging driver for Docker embeds the logged JSON fields inside a 'line' object, as in the sanitized example below; with a UF these fields are not nested under 'line'. A number of existing reports, dashboards, and alerts won't work with the new logging solution because they don't expect the 'line.' prefix: they reference "port", for example, not line.port. The goal is to extract the JSON fields out of 'line' and place them back in _raw so the reports and dashboards work with either implementation.

Example (simplified) event:

{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger","level":"INFO","env":"dev","port":"8000","process_id":51,"thread_id":140005384098624,"hostname":"964619888c0d"},"source":"stdout","tag":"some.instance.tag"}
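
For reference, the shape the reports expect can be sketched outside Splunk. This is a minimal Python illustration, not Splunk configuration; flatten_line is a hypothetical helper name, and letting the 'line' fields win on a name clash is an assumption:

```python
import json

# The sanitized example event from the post, as one JSON string.
RAW = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)


def flatten_line(event: dict) -> dict:
    """Return a copy of the event with the fields of 'line' promoted to the top level."""
    flat = {key: value for key, value in event.items() if key != "line"}
    line = event.get("line")
    if isinstance(line, dict):
        # Assumption: on a name clash, the nested 'line' field overrides the outer one.
        flat.update(line)
    return flat


flat = flatten_line(json.loads(RAW))
print(sorted(flat))
```

After flattening, a report can refer to "port" directly, which is how the fields look when the same app logs through a UF.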

I'm trying to build a props/transforms solution that extracts the json out of 'line' and places those fields back at the '_raw' event level. Here's what I have so far:

local.meta
[]
export = system

props.conf
[docker_line_extract]
REPORT-line = extract_line_object, extract_line_objects

transforms.conf
[extract_line_object]
REGEX = \{\"line\":\{(?<lineobj>.*)\},

[extract_line_objects]
REGEX = \"(?<_KEY_1>[^="\\]+)\":\s?\"?(?<_VAL_1>[^="\\]*)
FORMAT = $1::$2
SOURCE_KEY = field:lineobj
DEST_KEY = _raw
REPEAT_MATCH = true

The above succeeds in extracting the json field/values out of 'line' - the 'lineobj' field appears in the fields list in Splunk Web; clicking one reveals the expected content: "_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger","level":"INFO","env":"dev","port":"8000","process_id":51,"thread_id":140005384098624,"hostname":"964619888c0d"

So that part is working. But I can't seem to get the json field/values extracted out of 'lineobj' and placed in the _raw event as desired - tried a lot of variations, no luck. Does anyone have some insights / solution? Thank you.

1 Solution

jimbaxtermdi
Explorer

Figured it out - I was trying too hard. This props.conf entry, applied to events with sourcetype 'docker_line_extract', solved the problem. The 'line' object still appears in the search results, but the extracted fields also show up in the fields list, and you can reference them and build filters with them as desired:

[docker_line_extract]
EXTRACT-line = (\{\"line\":\{)?\"(?<_KEY_1>[^=",]+)\":\s?\"?(?<_VAL_1>[^=",]*)

Evidently leveraging the _KEY_1 and _VAL_1 convention will extract all the field/value pairs nested in the 'line' object.
The (\{\"line\":\{)? part at the start of the regex eliminates trying to extract and include an empty "line" field as well.
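
A rough way to see what this regex captures is to replay it outside Splunk. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine (an assumption; the engines can differ in edge cases), spells the named groups as (?P<name>...) because Python requires that form, and emulates the _KEY_1/_VAL_1 convention by pairing the two groups by hand:

```python
import re

# The sanitized example event from the question.
RAW = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)

# Same pattern as the EXTRACT above, with Python-style named groups.
WITH_PREFIX = re.compile(
    r'(\{"line":\{)?"(?P<_KEY_1>[^=",]+)":\s?"?(?P<_VAL_1>[^=",]*)'
)
# Variant without the optional prefix, to show what the prefix changes.
WITHOUT_PREFIX = re.compile(
    r'"(?P<_KEY_1>[^=",]+)":\s?"?(?P<_VAL_1>[^=",]*)'
)


def extract(pattern: re.Pattern, text: str) -> dict:
    """Pair each _KEY_1 capture with its _VAL_1 capture, one pair per match."""
    return {m.group("_KEY_1"): m.group("_VAL_1") for m in pattern.finditer(text)}


fields = extract(WITH_PREFIX, RAW)
fields_no_prefix = extract(WITHOUT_PREFIX, RAW)

for key, value in fields.items():
    print(f"{key} = {value}")
```

In this emulation the wrapper's own "source" and "tag" keys are matched too, since the pattern scans the whole event, and any value containing a comma or '=' would be cut short at that character. Without the optional prefix, the pattern also yields a stray 'line' field holding only the opening brace.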

to4kawa
Ultra Champion
| makeresults
| eval _raw="{\"line\":{\"_t\":\"2020-03-27T03:17:25.491296Z\",\"logger\":\"some.logger\",\"level\":\"INFO\",\"env\":\"dev\",\"port\":\"8000\",\"process_id\":51,\"thread_id\":140005384098624,\"hostname\":\"964619888c0d\"},\"source\":\"stdout\",\"tag\":\"some.instance.tag\"}"
| rex mode=sed "s/{.*({.*}).*}/\1/"
| spath

spath works.
props.conf

[docker_line_extract]
SEDCMD-trim_line = s/{.*({.*}).*}/\1/
KV_MODE = json

You can use KV_MODE = json; no other field extraction is needed.
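
The sed expression can be checked outside Splunk in the same way. A minimal sketch, assuming Python's re.sub behaves like Splunk's sed mode for this pattern and json.loads stands in for spath / KV_MODE = json:

```python
import json
import re

# The sanitized example event from the question.
RAW = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)

# Equivalent of s/{.*({.*}).*}/\1/ : keep only the inner {...} object.
# count=1 mirrors sed without the g flag (replace the first match only).
trimmed = re.sub(r'\{.*(\{.*\}).*\}', r'\1', RAW, count=1)

# The trimmed text is plain JSON, so a JSON parser can pull out every field.
fields = json.loads(trimmed)
print(sorted(fields))
```

Note that the trim keeps only the inner object, so the wrapper's "source" and "tag" keys are gone from the result. Since SEDCMD is applied when the data is indexed, that change would be permanent in the stored _raw, unlike a search-time EXTRACT. The greedy pattern also assumes 'line' contains no nested objects of its own.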

jimbaxtermdi
Explorer

And of course, you can put the same EXTRACT under a [default] stanza in props.conf if you need it applied to every sourcetype:

[default]
EXTRACT-line = (\{\"line\":\{)?\"(?<_KEY_1>[^=",]+)\":\s?\"?(?<_VAL_1>[^=",]*)

[other_sourcetypes]
...

jimbaxtermdi
Explorer

Correction to transforms.conf - should have used the code block the first time - apologies:

[extract_line_object]
REGEX = \{\"line\":\{(?<lineobj>.*)\},

[extract_line_objects]
REGEX = \"(?<_KEY_1>[^="\\]+)\":\s?\"?(?<_VAL_1>[^="\\]*)
FORMAT = $1::$2
SOURCE_KEY = field:lineobj
DEST_KEY = _raw
REPEAT_MATCH = true
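
To see what the two corrected transforms match, they can be replayed outside Splunk. A minimal sketch, assuming Python's re module approximates Splunk's PCRE here (named groups respelled as (?P<name>...)); it only emulates the regex matching, not how Splunk handles SOURCE_KEY, DEST_KEY, or FORMAT:

```python
import re

# The sanitized example event from the question.
RAW = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)

# Stage one: [extract_line_object] captures the body of the 'line' object.
STAGE_ONE = re.compile(r'\{"line":\{(?P<lineobj>.*)\},')
# Stage two: [extract_line_objects] is applied repeatedly to that capture.
STAGE_TWO = re.compile(r'"(?P<_KEY_1>[^="\\]+)":\s?"?(?P<_VAL_1>[^="\\]*)')

lineobj = STAGE_ONE.search(RAW).group("lineobj")

# finditer stands in for REPEAT_MATCH = true: one key/value pair per match.
pairs = [(m.group("_KEY_1"), m.group("_VAL_1")) for m in STAGE_TWO.finditer(lineobj)]

for key, value in pairs:
    print(f"{key} = {value}")
```

In this emulation the unquoted numbers keep a trailing comma (process_id comes out as "51,"), because the value class excludes only '=', '"', and '\'; the accepted answer's class also excludes ','.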