All Apps and Add-ons

Help parsing JSON embedded in fields from DB Connect input

I'm using DB connect 3.2 on Splunk 8.0.1. I have an input that brings in data from a mysql database. It looks like thins:

2020-03-17 10:29:16.000, f1="10", f2="2020-03-17 10:29:16", f3="something", f4="{"really": ["long"], "and": {"deep": "json"}}", f5="casdca", f6="more"

Field f4 contains about 13k of JSON. When viewing the event right now, f1, f2, f3 come out find. Splunk can't see f4 as containing JSON so it isn't parsed. f5 and f6, which you'd think are parsed right are not. They appear as being part of the value of the JSON f4 field.

What I would like is a way to have f5 and f6 properly parsed as fields and f4 to be expanded fully as their own JSON fields so that I can pull out fields as part of the search. Being a huge bit of JSON search time extraction is preferred.

I've tried setting KV_MODE to JSON and I've tried playing with transforms logic:
[forseti_violations_all_fields]
REGEX = , ([a-zA-Z0-9-_]+)="(.*)"(, |$)
FORMAT = $1::$2
But none of it has worked at all.

Any ideas? Thanks!

0 Karma

I'm not suggesting this is an answer. However, as a workaround I developed a script that will pull from the mysql database, take the already json fields and convert them to python objects, then upload the event via Splunk HEC. The script itself is about 110 lines including comments and is reusable.

0 Karma

Esteemed Legend

Perfect! Click Accept on your answer.

0 Karma

I'm wondering if it is easier to tell DB Connect to just write the event as a JSON blob to HEC instead of using field=value format. Is that possible?

0 Karma

Esteemed Legend

Your RegEx is wrong; try this:

REGEX = ,\s+([a-zA-Z0-9-_]+)="(.*?)"(?=, |$)

See here:
https://regex101.com/r/bV62Ui/1

0 Karma

Thanks for the regex, but it doesn't work in a real example. Looks like the JSON sample above isn't truely indicative of the data. The JSON is much longer and has a lot of quote comma space patterns in it:
2020-03-17 10:29:16.000, f1="10", f2="2020-03-17 10:29:16", f3="something", f4="{"really": ["long"], "and": {"deep": "json", "test": "foo"}}", f5="casdca", f6="more"
Even that is wrong for some reason. In the real data f4 extracts to just '{' and I don't see f5 and f6 getting extracted at all.

I'm wondering if there is a way to tell DB Connect to submit the entire event as JSON instead of field=value.

Thanks, any more ideas?

0 Karma

Ultra Champion
0 Karma