Extraction issue with dynamic field names


Hello there,

I am stuck with a dynamic field name extraction.

The data is partly JSON and sometimes contains nested JSON in the JSON part:

log-group=abc [2019-05-12 12:23:16,074] - INFO - {"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", "method": "GET", "ip_src": "", "url": "https://api/abc", "url_params": {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}

I am trying to extract each element of the nested 'url_params'.

To achieve this, I first extract url_params as a JSON string, then extract each of its field/value pairs using dynamic field naming.

1st step - extracting url_params:

url_params_extract = {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}

2nd step - extracting each element:

name = aaa
reliability = 90
equipment_name = bbb
element_name = ccc

The configuration files look like this:


FORMAT = url_params_extract::$1
REGEX = url_params\"\:\s(\{.*?\})\,

FORMAT = $1::$2
REGEX = \"(.+?)\"\:\s\"(.+?)\"
SOURCE_KEY = url_params_extract


REPORT-url_params = url_params
REPORT-url_params_extract = url_params_extract

EVAL-url_params = null
EVAL-url_params_extract = nullif(url_params_extract, "{}")
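For anyone who wants to check the two regexes outside Splunk, here is a minimal Python sketch of the two steps described above. It only approximates what the transforms do (Python's `re` is not Splunk's PCRE engine, and the function name `extract_url_params` is made up for illustration), but on the sample event it returns the clean values, which is consistent with the regexes themselves not being the culprit.

```python
import re

# Regexes copied verbatim from the two transforms.conf entries above.
URL_PARAMS_RE = re.compile(r'url_params\"\:\s(\{.*?\})\,')
PAIR_RE = re.compile(r'\"(.+?)\"\:\s\"(.+?)\"')

# Sample event from the post.
SAMPLE_EVENT = (
    'log-group=abc [2019-05-12 12:23:16,074] - INFO - '
    '{"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", '
    '"method": "GET", "ip_src": "", "url": "https://api/abc", '
    '"url_params": {"name": "aaa", "reliability": "90", '
    '"equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}'
)


def extract_url_params(event):
    """Hypothetical helper: mimic the two-step extraction on one event."""
    # 1st step: capture the nested JSON object (FORMAT = url_params_extract::$1).
    match = URL_PARAMS_RE.search(event)
    if match is None:
        return None, {}
    url_params_extract = match.group(1)
    # 2nd step: key/value pairs taken from the captured text only,
    # which is what SOURCE_KEY = url_params_extract asks for (FORMAT = $1::$2).
    fields = dict(PAIR_RE.findall(url_params_extract))
    return url_params_extract, fields


url_params_extract, fields = extract_url_params(SAMPLE_EVENT)
print(url_params_extract)
print(fields)
```

If this gives clean values but Splunk still shows a trailing `"}`, it may be worth checking whether some other extraction on the same sourcetype is producing the field.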

The problem is that the last element always comes out with a trailing double quote and closing curly bracket.

For instance

url_params_extract = {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}


element_name = ccc"} 

Instead of desired:

element_name = ccc

This happens even though the regex tests OK on regex101.

Even stranger, if I extract the nested JSON without the curly braces, the issue remains:

url_params_extract = "name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"

I would still have:

element_name = ccc"}

Unfortunately, I am not able to reproduce the issue with this sample event, and I am still trying to figure out why.

But I am starting to think that I am missing something about how the '$1::$2' format works.

Any hints?

Re: Extraction issue with dynamic field names


Give this a try (it updates only this transforms.conf entry; everything else stays the same):

 FORMAT = $1::$2
 REGEX = \"(.+?)\"\:\s\"([^\"\}]+)\"
 SOURCE_KEY = url_params_extract
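To see what the change does, here is a small Python sketch comparing the original value pattern `(.+?)` with the character class `([^\"\}]+)`. Python's `re` only approximates Splunk's regex engine, and the `"flag"` input at the end is a made-up string, not data from the post.

```python
import re

# Original entry: lazy "anything" for the value.
ORIGINAL_RE = re.compile(r'\"(.+?)\"\:\s\"(.+?)\"')
# Suggested entry: the value may not contain a double quote or a closing brace.
SUGGESTED_RE = re.compile(r'\"(.+?)\"\:\s\"([^\"\}]+)\"')

url_params_extract = (
    '{"name": "aaa", "reliability": "90", '
    '"equipment_name": "bbb", "element_name": "ccc"}'
)

original = dict(ORIGINAL_RE.findall(url_params_extract))
suggested = dict(SUGGESTED_RE.findall(url_params_extract))

# On the sample from the post both patterns give the same four pairs.
print(original == suggested)
print(suggested)

# Hypothetical input (not from the post): a value that is itself a brace.
# The lazy pattern accepts it; the character class rejects the pair entirely.
hypothetical = '"flag": "}"'
print(ORIGINAL_RE.findall(hypothetical))
print(SUGGESTED_RE.findall(hypothetical))
```

The character class makes it impossible for a value to pick up a `"` or `}` by construction, at the cost of skipping any value that legitimately contains one of those characters.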


Re: Extraction issue with dynamic field names


Thanks a lot!

I had already tried something similar, so it did not resolve the issue directly, but it helped me put my finger on what was wrong!

The culprit was the extraction already in place for the whole JSON part:

{"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", "method": "GET", "ip_src": "", "url": "https://api/abc", "url_params": {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}

That extraction produced the value in question - element_name = ccc"} - and it was not overridden by the extractions that ran afterwards, as I had assumed it would be.

It even turned out that fixing that extraction along the lines of your suggestion removed the need to extract 'url_params' separately!

Thanks again,
