I'm having some trouble parsing data prepended to json logs. I can do it via search, but I'd like to do it upon logging within Splunk so I can search the parsed data. Can you point me in the right direction, and let me know whether I can do this via the UI or need to go into props.conf manually?
This is working via search
sourcetype="Untangle"| rex "(?<json>\{.+)" | spath input=json
What I've tried in props.conf
[untangle]
EXTRACT-untangle=(?<json>\{.+)
Example Log:
Mar 29 01:45:04 _gateway Mar 28 20:45:04 INFO uvm[0]: {"timeStamp":"2022-03-28 20:45:04.762","s2pBytes":160,"p2sBytes":65,"sessionId":107845676257000,"endTime":0,"class":"class com.untangle.uvm.app.SessionStatsEvent","c2pBytes":65,"p2cBytes":160}
I tried the configuration below and it worked for me.
EXTRACT-json = (?P<json>\{.+)
If you only need the json to be indexed, which makes life easier during searching, you can also drop the extra text.
Try the configuration below.
[My_Sourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
SEDCMD-a=s/^[^{]*\{/{/
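For the sample event above, the SEDCMD replaces everything up to the first `{` at index time, so only the json body is written to the index; pairing it with search-time auto-extraction then gives you fields without spath. A minimal sketch (the sourcetype name is just a placeholder):

```
[My_Sourcetype]
SEDCMD-a = s/^[^{]*\{/{/
KV_MODE = json
```

After the trim, the indexed event starts at `{"timeStamp":...` and Splunk's json auto-extraction can pick up fields such as s2pBytes directly.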
I hope this will help you.
Thanks
KV
I tried
EXTRACT-json = (?P<json>\{.+)
This does appear to remove the beginning portion of the log as I'd like, but it does not parse the json. I was hoping that setting the extraction to json would parse the remaining json log into fields for me, as the search with spath does.
Well, with Splunk you have three different ways to handle json:
1. Indexed-extractions
2. Automatic search-time extractions
3. spath
The first two rely on the event in its entirety being a well-formed json. So they won't work if the event contains additional "headers" or "footers".
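The difference between the first two can be sketched in props.conf: `INDEXED_EXTRACTIONS` applies where the data is first parsed (which is why it also works on a Universal Forwarder), while `KV_MODE` is purely a search-time setting. The stanza names here are placeholders:

```
# Option 1: index-time json extraction (event must be pure json)
[json_indexed]
INDEXED_EXTRACTIONS = json

# Option 2: automatic search-time extraction (event must be pure json)
[json_searchtime]
KV_MODE = json
```

Option 3, spath, needs no props.conf entry at all; it is invoked per search, e.g. `... | spath input=json`.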
If I remember correctly, the indexed extractions are done well before the SEDCMDs in the parsing queue (which makes sense, since you can do indexed extractions on Universal Forwarders but can't do SEDCMDs on them). So you can't trim your events with SEDCMDs (or any other transforms) to leave just the json part for indexed extractions. But you can trim your original event and have Splunk extract json fields at search time.
Mind you that the extractions done with each method produce slightly different results in terms of field naming.
So if you have a well-formed json as an input event you can use any of those three options. If you have some extra data in your event you're left with two options:
1. Transform your event prior to indexing so only the well-formed json data is left (effectively losing some part of your original raw data) and use search-time json KV extraction or
2. Leave the event as is and use spath in search time to parse selected part of the event.
Both approaches have their pros and cons so it's up to you.
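For option 2 (leaving the event untouched), the rex + spath combination from the original question is the pattern; a sketch against the sample event, with field names taken from the example log:

```
sourcetype="Untangle"
| rex "(?<json>\{.+)"
| spath input=json
| table timeStamp s2pBytes p2sBytes sessionId
```

The raw event is left as-is; the json fields exist only in the search results.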
This will give you the required value in the json field. Now just add `| spath input=json` to get the values from the json.
KV
I was able to take some of the input you provided and get it into transforms, so now it is all parsing and searchable. Thank you for the help.
@splunk:~$ cat /opt/splunk/etc/system/local/props.conf
[Untangle]
KV_MODE = json
TRANSFORMS-untangle = Untangle_transform
@splunk:~$ cat /opt/splunk/etc/system/local/transforms.conf
[Untangle_transform]
SOURCE_KEY = _raw
DEST_KEY = _raw
REGEX = ({.+})
FORMAT = $1
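With that transform in place the indexed event is just the json body, so `KV_MODE = json` should extract the fields automatically. A quick sanity check (field names taken from the sample event above):

```
sourcetype="Untangle" | head 5 | table timeStamp class s2pBytes p2sBytes
```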
But that's different. You're cutting a part of the event off during ingest; that's modifying the original raw event.
I also looked some time ago for a way to do auto-KV on part of a message (when you have some "header" followed by a json or xml structure) and didn't find one.