All Apps and Add-ons

Help parsing JSON embedded in fields from DB Connect input

scottsavareseat
Path Finder

I'm using DB connect 3.2 on Splunk 8.0.1. I have an input that brings in data from a mysql database. It looks like thins:

2020-03-17 10:29:16.000, f1="10", f2="2020-03-17 10:29:16", f3="something", f4="{"really": ["long"], "and": {"deep": "json"}}", f5="casdca", f6="more"

Field f4 contains about 13k of JSON. When viewing the event right now, f1, f2, f3 come out find. Splunk can't see f4 as containing JSON so it isn't parsed. f5 and f6, which you'd think are parsed right are not. They appear as being part of the value of the JSON f4 field.

What I would like is a way to have f5 and f6 properly parsed as fields and f4 to be expanded fully as their own JSON fields so that I can pull out fields as part of the search. Being a huge bit of JSON search time extraction is preferred.

I've tried setting KV_MODE to JSON and I've tried playing with transforms logic:
[forseti_violations_all_fields]
REGEX = , ([a-zA-Z0-9-_]+)="(.*)"(, |$)
FORMAT = $1::$2
But none of it has worked at all.

Any ideas? Thanks!

0 Karma
1 Solution

scottsavareseat
Path Finder

I'm not suggesting this is an answer. However, as a workaround I developed a script that will pull from the mysql database, take the already json fields and convert them to python objects, then upload the event via Splunk HEC. The script itself is about 110 lines including comments and is reusable.

View solution in original post

0 Karma

scottsavareseat
Path Finder

I'm not suggesting this is an answer. However, as a workaround I developed a script that will pull from the mysql database, take the already json fields and convert them to python objects, then upload the event via Splunk HEC. The script itself is about 110 lines including comments and is reusable.

0 Karma

woodcock
Esteemed Legend

Perfect! Click Accept on your answer.

0 Karma

scottsavareseat
Path Finder

I'm wondering if it is easier to tell DB Connect to just write the event as a JSON blob to HEC instead of using field=value format. Is that possible?

0 Karma

woodcock
Esteemed Legend

Your RegEx is wrong; try this:

REGEX = ,\s+([a-zA-Z0-9-_]+)="(.*?)"(?=, |$)

See here:
https://regex101.com/r/bV62Ui/1

0 Karma

scottsavareseat
Path Finder

Thanks for the regex, but it doesn't work in a real example. Looks like the JSON sample above isn't truely indicative of the data. The JSON is much longer and has a lot of quote comma space patterns in it:
2020-03-17 10:29:16.000, f1="10", f2="2020-03-17 10:29:16", f3="something", f4="{"really": ["long"], "and": {"deep": "json", "test": "foo"}}", f5="casdca", f6="more"
Even that is wrong for some reason. In the real data f4 extracts to just '{' and I don't see f5 and f6 getting extracted at all.

I'm wondering if there is a way to tell DB Connect to submit the entire event as JSON instead of field=value.

Thanks, any more ideas?

0 Karma
Get Updates on the Splunk Community!

Monitoring Postgres with OpenTelemetry

Behind every business-critical application, you’ll find databases. These behind-the-scenes stores power ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...