Extraction issue with dynamic field names

D2SI
Communicator

Hello there,

I am stuck with a dynamic field name extraction.

The data is partly JSON and sometimes contains nested JSON in the JSON part:

log-group=abc [2019-05-12 12:23:16,074] - INFO - {"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", "method": "GET", "ip_src": "1.2.3.4", "url": "https://api/abc", "url_params": {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}

I am trying to extract each element of the nested 'url_params'.

To achieve this, I extract url_params as a JSON event and then extract each of its field/value pairs using dynamic field naming.

1st step - extracting url_params:

url_params_extract = {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}

2nd step - extracting each element:

name = aaa
reliability = 90
equipment_name = bbb
element_name = ccc

The configuration files look like this:

transforms.conf

[url_params]
FORMAT = url_params_extract::$1
REGEX = url_params\"\:\s(\{.*?\})\,

[url_params_extract]
FORMAT = $1::$2
REGEX = \"(.+?)\"\:\s\"(.+?)\"
SOURCE_KEY = url_params_extract

props.conf

[test]
REPORT-url_params = url_params
REPORT-url_params_extract = url_params_extract

EVAL-url_params = null
EVAL-url_params_extract = nullif(url_params_extract, "{}")

The problem is that the last element always comes out with a trailing quote and closing curly bracket.

For instance

url_params_extract = {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}

Result:

element_name = ccc"} 

Instead of desired:

element_name = ccc

This happens even though the regex tests OK on regex101.
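As a cross-check outside Splunk, here is a minimal Python sketch that chains the two patterns from transforms.conf over the sample event. It assumes Python's `re` module treats these particular patterns the way Splunk's regex engine does, and it only imitates the SOURCE_KEY hand-off by feeding the first capture into the second pattern. It is not a model of how Splunk orders its search-time extractions.

```python
import re

# Sample event from the post.
event = (
    'log-group=abc [2019-05-12 12:23:16,074] - INFO - '
    '{"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", '
    '"method": "GET", "ip_src": "1.2.3.4", "url": "https://api/abc", '
    '"url_params": {"name": "aaa", "reliability": "90", '
    '"equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}'
)

# Step 1: same pattern as the [url_params] stanza.
step1 = re.search(r'url_params\"\:\s(\{.*?\})\,', event)
url_params_extract = step1.group(1)

# Step 2: same pattern as the [url_params_extract] stanza, applied to the
# step 1 result (imitating what SOURCE_KEY does in Splunk).
pairs = re.findall(r'\"(.+?)\"\:\s\"(.+?)\"', url_params_extract)
fields = dict(pairs)

print(url_params_extract)
print(fields["element_name"])
```

On this sample the last value comes out clean, which is consistent with the regex101 result and suggests the stray characters come from somewhere other than these two patterns.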

Even weirder, if I extract the nested JSON without the curly braces, the issue remains:

url_params_extract = "name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"

I would still have:

element_name = ccc"}

Unfortunately, I am not able to reproduce the issue with this sample event; I am still trying to figure out why.

But I am starting to think that I am missing something about how the '$1::$2' format is used.

Any hints?

somesoni2
Revered Legend

Give this a try (only this transforms.conf entry changes; everything else stays the same):

[url_params_extract]
FORMAT = $1::$2
REGEX = \"(.+?)\"\:\s\"([^\"\}]+)\"
SOURCE_KEY = url_params_extract
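For comparison, a small Python sketch that runs the original and the suggested value patterns over the url_params_extract value from the question. The variable names are illustrative, and Python's `re` is assumed to behave like Splunk's regex engine for these patterns.

```python
import re

# Value of url_params_extract as shown in the question.
value = '{"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}'

# Original value group: lazy "any character".
original = re.compile(r'\"(.+?)\"\:\s\"(.+?)\"')
# Suggested value group: one or more characters that are neither a quote nor }.
suggested = re.compile(r'\"(.+?)\"\:\s\"([^\"\}]+)\"')

original_fields = dict(original.findall(value))
suggested_fields = dict(suggested.findall(value))

print(original_fields == suggested_fields)
print(suggested_fields["element_name"])
```

On this clean input both patterns capture the same pairs. The character class differs only in that it can never let a `"` or `}` into a captured value, whatever the input looks like.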

D2SI
Communicator

Thanks a lot!

I had already tried something like that, so it did not resolve the issue directly, but it helped me put my finger on what was wrong!

The culprit was the extraction already in place for the whole JSON part:

{"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", "method": "GET", "ip_src": "1.2.3.4", "url": "https://api/abc", "url_params": {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}

It extracted the element in question (element_name = ccc"}), and that value was not overridden by the extractions executed afterwards, as I had believed it would be.

It even turned out that fixing that extraction along the lines of your suggestion removed the need to extract 'url_params' separately!
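The thread never shows the whole-JSON extraction itself, so the Python sketch below is only an illustration of how such a value could arise. The pattern named `hypothetical` is an assumption, not the poster's configuration: it takes a value to be a quoted string that runs up to the next comma or the end of the event.

```python
import re

# JSON part of the sample event from the post.
json_part = (
    '{"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", '
    '"method": "GET", "ip_src": "1.2.3.4", "url": "https://api/abc", '
    '"url_params": {"name": "aaa", "reliability": "90", '
    '"equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}'
)

# HYPOTHETICAL whole-JSON pattern (not taken from the thread): the value is
# matched lazily up to the next comma or the end of the string, with an
# optional closing quote just before that delimiter.
hypothetical = re.compile(r'"(\w+)":\s"([^,]+?)"?(?:,|$)')

fields = dict(hypothetical.findall(json_part))

print(fields["name"])
print(fields["element_name"])
```

With a comma-delimited pattern of this kind, values in the middle of the nested object come out clean, while the last one keeps the `"}` that sits between it and the next comma.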

Thanks again,
