How to do search time extractions of nested json objects with props/transforms

jimbaxtermdi
Explorer

We have users migrating apps that were using Universal Forwarders to Docker containers. The Splunk logging driver for Docker embeds the logged JSON items inside a 'line' object, as in the sanitized example below; these fields are not nested in 'line' when using a UF. A number of existing reports, dashboards, and alerts won't work with the new logging solution because they don't expect to reference fields with a 'line.' prefix, for example line.port instead of just port. The goal is to extract the JSON fields out of 'line' and place them back in _raw so the reports and dashboards work with either implementation.

Example (simplified) event:

{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger","level":"INFO","env":"dev","port":"8000","process_id":51,"thread_id":140005384098624,"hostname":"964619888c0d"},"source":"stdout","tag":"some.instance.tag"}

I'm trying to build a props/transforms solution that extracts the json out of 'line' and places those fields back at the '_raw' event level. Here's what I have so far:

local.meta
[]
export = system

props.conf
[docker_line_extract]
REPORT-line = extract_line_object, extract_line_objects

transforms.conf
[extract_line_object]
REGEX = {\"line\":{(?.*)},

[extract_line_objects]
REGEX = \"(?<_KEY_1>[^="\]+)\":\s?\"?(?<_VAL_1>[^="\]*)
FORMAT = $1::$2
SOURCE_KEY = field:lineobj
DEST_KEY = _raw
REPEAT_MATCH = true

The above succeeds in extracting the json field/values out of 'line' - the 'lineobj' field appears in the fields list in Splunk Web; clicking one reveals the expected content: "_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger","level":"INFO","env":"dev","port":"8000","process_id":51,"thread_id":140005384098624,"hostname":"964619888c0d"

So that part is working. But I can't seem to get the json field/values extracted out of 'lineobj' and placed in the _raw event as desired - tried a lot of variations, no luck. Does anyone have some insights / solution? Thank you.

jimbaxtermdi
Explorer

Figured it out - I was trying too hard. These entries in props.conf, applied to events with sourcetype 'docker_line_extract', solved the problem. The 'line' object still appears in the search results, but the extracted fields appear in the fields list, and you can reference them or build filters with them as desired:

[docker_line_extract]
EXTRACT-line = (\{\"line\":\{)?\"(?<_KEY_1>[^=",]+)\":\s?\"?(?<_VAL_1>[^=",]*)

Evidently leveraging the _KEY_1 and _VAL_1 convention will extract all the field/value pairs nested in the 'line' object.
The (\{\"line\":\{)? part at the start of the regex eliminates trying to extract and include an empty "line" field as well.
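To see what this regex captures outside Splunk, here is a minimal Python sketch. It is only a stand-in: Python's re module takes the place of Splunk's PCRE engine, the named groups use Python's (?P<name>...) syntax, and applying the pattern repeatedly with finditer is an assumption about how the _KEY_1/_VAL_1 convention behaves at search time. The helper name extract_pairs is hypothetical.

```python
import re

# Sample event from the question.
RAW = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)

# Same pattern as the EXTRACT above, rewritten with Python's named-group syntax.
PATTERN = re.compile(
    r'(\{"line":\{)?"(?P<_KEY_1>[^=",]+)":\s?"?(?P<_VAL_1>[^=",]*)'
)


def extract_pairs(raw):
    """Apply the pattern repeatedly and collect every key/value pair (hypothetical helper)."""
    return {m.group("_KEY_1"): m.group("_VAL_1") for m in PATTERN.finditer(raw)}


fields = extract_pairs(RAW)
for key, value in fields.items():
    print(f"{key}={value}")
```

In this sketch the nested keys (_t, logger, level, env, port, process_id, thread_id, hostname) come out next to the top-level source and tag, all as strings, and no 'line' key is produced because the optional prefix group consumes it.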

to4kawa
Ultra Champion
| makeresults
| eval _raw="{\"line\":{\"_t\":\"2020-03-27T03:17:25.491296Z\",\"logger\":\"some.logger\",\"level\":\"INFO\",\"env\":\"dev\",\"port\":\"8000\",\"process_id\":51,\"thread_id\":140005384098624,\"hostname\":\"964619888c0d\"},\"source\":\"stdout\",\"tag\":\"some.instance.tag\"}"
| rex mode=sed "s/{.*({.*}).*}/\1/"
| spath

spath works.
props.conf

[docker_line_extract]
SEDCMD-trim_line = s/{.*({.*}).*}/\1/
KV_MODE = json

You can use KV_MODE = json; no other field extraction is needed.
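The same trim-then-parse idea can be checked outside Splunk with a short Python sketch. It assumes Python's re.sub and json.loads as stand-ins for the sed expression and for JSON key/value extraction; the helper name trim_to_line is hypothetical. Note that the substitution rewrites the event text itself, so the top-level source and tag keys are dropped, and the greedy pattern relies on 'line' containing no nested objects of its own.

```python
import json
import re

# Sample event from the question.
RAW = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)


def trim_to_line(raw):
    """Replace the whole event with its inner {...} object (hypothetical helper)."""
    # Python equivalent of s/{.*({.*}).*}/\1/ with the braces escaped.
    return re.sub(r'\{.*(\{.*\}).*\}', r'\1', raw)


trimmed = trim_to_line(RAW)
fields = json.loads(trimmed)  # plain JSON parsing, standing in for spath / KV_MODE = json
for key, value in fields.items():
    print(f"{key}={value}")
```

Unlike the regex-based extraction, JSON parsing keeps the value types, so process_id and thread_id come back as numbers rather than strings.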

jimbaxtermdi
Explorer

And, of course, you can add the same EXTRACT under a [default] stanza in props.conf if you need it applied to every sourcetype:

[default]
EXTRACT-line = (\{\"line\":\{)?\"(?<_KEY_1>[^=",]+)\":\s?\"?(?<_VAL_1>[^=",]*)

[other_sourcetypes]
...

jimbaxtermdi
Explorer

Correction to transforms.conf - should have used the code block the first time - apologies:

[extract_line_object]
REGEX = \{\"line\":\{(?<lineobj>.*)\},

[extract_line_objects]
REGEX = \"(?<_KEY_1>[^="\\]+)\":\s?\"?(?<_VAL_1>[^="\\]*)
FORMAT = $1::$2
SOURCE_KEY = field:lineobj
DEST_KEY = _raw
REPEAT_MATCH = true
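
For comparison, the two-stage idea in these transforms (first capture 'lineobj', then pull key/value pairs out of it) can be sketched in Python. This is only an approximation under stated assumptions: Python's re module stands in for Splunk's regex engine, finditer stands in for REPEAT_MATCH, and the helper name two_stage is hypothetical. It does not model FORMAT or DEST_KEY.

```python
import re

# Sample event from the question.
RAW = (
    '{"line":{"_t":"2020-03-27T03:17:25.491296Z","logger":"some.logger",'
    '"level":"INFO","env":"dev","port":"8000","process_id":51,'
    '"thread_id":140005384098624,"hostname":"964619888c0d"},'
    '"source":"stdout","tag":"some.instance.tag"}'
)

# Stage 1: capture everything inside the 'line' object.
STAGE1 = re.compile(r'\{"line":\{(?P<lineobj>.*)\},')
# Stage 2: key/value pairs, with the same character classes as the transform.
STAGE2 = re.compile(r'"(?P<_KEY_1>[^="\\]+)":\s?"?(?P<_VAL_1>[^="\\]*)')


def two_stage(raw):
    """Run stage 1, then apply stage 2 repeatedly to the captured text (hypothetical helper)."""
    first = STAGE1.search(raw)
    if first is None:
        return {}
    lineobj = first.group("lineobj")
    return {m.group("_KEY_1"): m.group("_VAL_1") for m in STAGE2.finditer(lineobj)}


for key, value in two_stage(RAW).items():
    print(f"{key}={value}")
```

In this stand-in the quoted values come out clean, but the unquoted numbers keep a trailing comma (process_id=51, and thread_id=140005384098624,) because the value class does not exclude commas. That may be one reason the accepted regex adds a comma to both character classes.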