Hello there,
I am stuck with a dynamic field name extraction.
The data is partly JSON and sometimes contains nested JSON in the JSON part:
log-group=abc [2019-05-12 12:23:16,074] - INFO - {"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", "method": "GET", "ip_src": "1.2.3.4", "url": "https://api/abc", "url_params": {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}
I am trying to extract each element of the nested 'url_params'.
To achieve this, I extract url_params as a JSON event and then extract each of its field/value pairs using dynamic field naming.
1st step - extracting url_params:
url_params_extract = {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}
2nd step - extracting each element:
name = aaa
reliability = 90
equipment_name = bbb
element_name = ccc
The configuration files look like this:
transforms.conf
[url_params]
FORMAT = url_params_extract::$1
REGEX = url_params\"\:\s(\{.*?\})\,
[url_params_extract]
FORMAT = $1::$2
REGEX = \"(.+?)\"\:\s\"(.+?)\"
SOURCE_KEY = url_params_extract
props.conf
[test]
REPORT-url_params = url_params
REPORT-url_params_extract = url_params_extract
EVAL-url_params = null
EVAL-url_params_extract = nullif(url_params_extract, "{}")
The problem is that the last element always comes out with a trailing quote and closing curly bracket.
For instance:
url_params_extract = {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}
Result:
element_name = ccc"}
Instead of desired:
element_name = ccc
This happens even though the regex tests OK on regex101.
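For reference, the same check can be reproduced outside Splunk. This is only a minimal sketch in Python: it assumes Python's `re` module behaves like Splunk's PCRE engine for these two patterns, and the variable names are illustrative. It chains the two regexes from transforms.conf on the sample event and confirms that, taken in isolation, they produce clean values.

```python
import re

# Sample event from the question.
event = (
    'log-group=abc [2019-05-12 12:23:16,074] - INFO - '
    '{"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", '
    '"method": "GET", "ip_src": "1.2.3.4", "url": "https://api/abc", '
    '"url_params": {"name": "aaa", "reliability": "90", '
    '"equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}'
)

# 1st step: same pattern as the [url_params] stanza.
step1 = re.search(r'url_params":\s(\{.*?\}),', event)
url_params_extract = step1.group(1)

# 2nd step: same pattern as the [url_params_extract] stanza,
# applied to the output of the first step (mimicking SOURCE_KEY).
pairs = dict(re.findall(r'"(.+?)":\s"(.+?)"', url_params_extract))

print(url_params_extract)
print(pairs)
```

Since both steps behave as intended here, the stray `"}` is unlikely to come from these two patterns alone.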
Even weirder, if I extract the nested JSON without the curly braces, the issue remains:
url_params_extract = "name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"
I would still have:
element_name = ccc"}
Unfortunately, I am not able to reproduce the issue with this sample event; I am still trying to figure out why.
But I am starting to think that I am missing something on '$1::$2' format usage.
Any hints?
Give this a try (it only updates this transforms.conf entry; everything else stays the same):
[url_params_extract]
FORMAT = $1::$2
REGEX = \"(.+?)\"\:\s\"([^\"\}]+)\"
SOURCE_KEY = url_params_extract
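To illustrate what the changed value group does, here is a small Python sketch. It assumes Python's `re` matches this pattern the same way Splunk does, and the variable names are illustrative. The class `[^\"\}]+` can never consume a double quote or a closing brace, so a captured value cannot end in `"}` whatever surrounds it.

```python
import re

# Intermediate field produced by the [url_params] stanza.
url_params_extract = (
    '{"name": "aaa", "reliability": "90", '
    '"equipment_name": "bbb", "element_name": "ccc"}'
)

# Suggested pattern: the value group excludes '"' and '}'.
suggested = re.compile(r'"(.+?)":\s"([^"}]+)"')

pairs = dict(suggested.findall(url_params_extract))
print(pairs)

# No captured value can contain a quote or a closing brace.
clean = all('"' not in v and '}' not in v for v in pairs.values())
print(clean)
```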
Thanks a lot!
I had already tried something similar, so it did not resolve the issue directly, but it helped me put my finger on what was wrong!
The culprit was the extraction already in place for the whole JSON part:
{"time": "2019-05-12T12:23:16Z", "step": "PRE_REQUEST", "uuid": "abcxyz", "method": "GET", "ip_src": "1.2.3.4", "url": "https://api/abc", "url_params": {"name": "aaa", "reliability": "90", "equipment_name": "bbb", "element_name": "ccc"}, "user": "john"}
It extracted the element in question - element_name = ccc"} - and that value was not overridden by the extractions executed afterwards, as I had assumed it would be.
It even turned out that fixing that extraction following your suggestion removed the need to extract 'url_params' separately!
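The behaviour I ran into can be pictured with a toy model in Python. This is not Splunk code: `apply_extraction` is a hypothetical helper, and the model simply assumes that when a field already has a value, a later extraction of the same field is discarded rather than replacing it, which matches what I observed.

```python
def apply_extraction(event_fields, extracted):
    """Hypothetical model: keep the first value seen for each field."""
    for name, value in extracted.items():
        # setdefault only stores the value if the field is not set yet.
        event_fields.setdefault(name, value)
    return event_fields


fields = {}

# The earlier, faulty extraction over the whole JSON part runs first.
apply_extraction(fields, {"element_name": 'ccc"}'})

# The correct url_params_extract transform runs afterwards,
# but its value is dropped because the field already exists.
apply_extraction(fields, {"element_name": "ccc"})

print(fields["element_name"])
```

Under that assumption, the only real fix is to correct the earlier extraction, since nothing that runs later can clean up its value.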
Thanks again,