Hello,
I've got a question on getting Splunk to extract key value pairs from syslog json events.
The events look like this:
<14>Mon Aug 12 12:29:29 UTC 2019Info: { //json part}\x00
At first I tried the standard _json sourcetype. This didn't work, so I tried to make a custom sourcetype that would remove the part before and after the JSON.
I've tried to add
SEDCMD-end=s/\x00//g
SEDCMD-start=s/^[^{]+//g
KV_mode=json
When I test the sourcetype using the Add Data wizard in Splunk Web, I see the part before and after the JSON disappear. After I changed the sourcetype of the data source to my custom sourcetype, this doesn't work and I still get events with the part before and after the JSON.
The full sourcetype conf:
ADD_EXTRA_TIME_FIELDS=True
ANNOTATE_PUNCT=true
AUTO_KV_JSON=true
BREAK_ONLY_BEFORE_DATE=true
CHARSET=UTF-8
DEPTH_LIMIT=1000
KV_mode=json
LEARN_MODEL=true
LEARN_SOURCETYPE=true
LINE_BREAKER=([\r\n]+)
LINE_BREAKER_LOOKBEHIND=100
MATCH_LIMIT=100000
MAX_DAYS_AGO=2000
MAX_DAYS_HENCE=2
MAX_DIFF_SECS_AGO=3600
MAX_DIFF_SECS_HENCE=604800
MAX_EVENTS=256
MAX_TIMESTAMP_LOOKAHEAD=128
NO_BINARY_CHECK=true
SEDCMD-end=s/\x00//g
SEDCMD-start=s/^[^{]+//g
SEGMENTATION=indexing
SEGMENTATION-all=full
SEGMENTATION-inner=inner
SEGMENTATION-outer=outer
SEGMENTATION-raw=none
SEGMENTATION-standard=standard
SHOULD_LINEMERGE=true
TRUNCATE=10000
category=Custom
description=Sourcetype for SAM, this removes the extra syslog information and shows only the JSON
detect_trailing_nulls=false
disabled=false
maxDist=100
pulldown_type=true
Extra information:
This gets sent to Splunk Cloud from a forwarder that receives these events over a TCP port. On the forwarder the port is mapped to the right index and sourcetype.
Can anyone advise me on how to get the key value pairs from these syslog/json events?
Thank you in advance, kind regards,
Willem
Here is the basic approach.
Figure out how to modify your events so that they are VALID JSON. Use this tool to check: https://jsonlint.com/
Once you know how to adjust them, fix them on the way in using SEDCMD- or other transforms.
DO NOT USE THE _json SOURCETYPE! Create your own sourcetype and use KV_MODE = json in props.conf.
That's it.
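Before committing the SEDCMD rules, the two substitutions from the question can be checked offline. The following is a minimal Python sketch, not Splunk configuration. The JSON body of the sample event is invented (the thread elides it), the helper names are hypothetical, and it assumes the trailer is a real NUL byte, as the s/\x00//g pattern implies. Python's re module is not the regex engine Splunk uses, but for patterns this simple the behaviour should match.

```python
import json
import re

# Hypothetical sample event: the prefix comes from the question,
# the JSON body is invented for illustration.
SAMPLE_EVENT = (
    '<14>Mon Aug 12 12:29:29 UTC 2019Info: '
    '{"user": "alice", "action": "login"}\x00'
)


def clean_event(raw):
    """Apply the two substitutions from the question to one event."""
    # Mirrors SEDCMD-start=s/^[^{]+//g : drop everything before the first '{'.
    without_prefix = re.sub(r'^[^{]+', '', raw, count=1)
    # Mirrors SEDCMD-end=s/\x00//g : drop NUL bytes (assumes a real NUL byte).
    return re.sub(r'\x00', '', without_prefix)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == '__main__':
    cleaned = clean_event(SAMPLE_EVENT)
    print(repr(cleaned))
    print('raw valid:', is_valid_json(SAMPLE_EVENT))
    print('cleaned valid:', is_valid_json(cleaned))
```

If the cleaned event still fails to parse, the remaining characters show what the substitutions missed.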
Hello,
What do you mean by fixing them on the way in? Is it possible to do this using the sourcetype wizard in Splunk Web? Or do I really need to access props.conf directly? Or is it necessary to have an HF in between to do this?
Event format:
Mon Aug 12 12:29:29 UTC 2019Info: { //json part}\x00
Also, with SEDCMD I can remove the first part with "s/<.{1,40}Info:\s//g".
For the last part I tried "s/\x00//g", but this somehow doesn't work. Do you have any idea why this is not working?
Kind regards,
Willem
Try adding additional \\ characters one by one until it works.
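One possible reason the extra backslashes matter, offered as a guess rather than something confirmed in this thread: in a regex, \x00 matches a single NUL byte, whereas \\x00 matches the four literal characters backslash, x, 0, 0. If the events end in the literal text rather than a real NUL byte, only the second pattern can match. A small Python sketch of the difference follows. The sample strings are invented, and Python's re module stands in for Splunk's regex engine.

```python
import re

# Hypothetical trailers: one real NUL byte vs. the four literal characters \x00.
REAL_NUL = '{"a": 1}' + '\x00'
LITERAL_TEXT = '{"a": 1}' + r'\x00'

NUL_PATTERN = re.compile(r'\x00')       # matches one NUL byte
LITERAL_PATTERN = re.compile(r'\\x00')  # matches backslash, 'x', '0', '0'


def strip_with(pattern, text):
    """Remove every match of pattern from text."""
    return pattern.sub('', text)


if __name__ == '__main__':
    for name, text in (('real NUL', REAL_NUL), ('literal text', LITERAL_TEXT)):
        print(name, '| NUL pattern:', repr(strip_with(NUL_PATTERN, text)))
        print(name, '| literal pattern:', repr(strip_with(LITERAL_PATTERN, text)))
```

Each pattern only removes the trailer it was written for, so checking the raw bytes of one event tells you which one is needed.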
Hello,
Took a while, but this worked for me.
Thank you for your help!
Kind regards,
Willem Jongeneel
Hi @willemjongeneel,
Have you tried using the spath command?
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath
You won't need any sed to apply it.
Hello David,
I don't fully understand how to use the spath command. Should I extract the JSON and use that as the input field? Is this only possible at search time?
Can you maybe explain a little more on how to approach this?
Thanks, kind regards,
Willem
Hi @willemjongeneel,
Yes, you can use this command in the search interface. It will allow you to troubleshoot why KV_MODE = json isn't giving you any results, and you'll know exactly what you need to keep from your raw data to get the extraction working.
Once you identify that, you can apply the right sed to reshape your data. You can also use INDEXED_EXTRACTIONS = JSON instead of KV_MODE = json for better performance.
Hello,
Thank you.
I got this working using a substring and spath. The full search is:
index= | eval _raw=substr(_raw, 39, (len(_raw)-42)) | spath input=_raw
This cuts off the part before and after the JSON. Is there a way to get this substring working from props.conf by using Splunk Web? I cannot change it any other way, because I'm using Splunk Cloud.
Kind regards,
Willem
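To see what the offsets in that search do, here is a small Python emulation of substr, which is 1-based in SPL. The JSON body is invented. The prefix is the one from the original question, which is 38 characters long, so the JSON starts at position 39. Subtracting 42 from the length then leaves 4 characters to drop at the end, which would fit a trailer made of the four literal characters \x00. That reading is an inference from the numbers, not something stated in the thread.

```python
def spl_substr(text, start, length):
    """Emulate SPL substr(text, start, length) for a positive, 1-based start."""
    return text[start - 1:start - 1 + length]


# Prefix taken from the question; JSON body and trailer are assumptions.
PREFIX = '<14>Mon Aug 12 12:29:29 UTC 2019Info: '
JSON_BODY = '{"user": "alice", "action": "login"}'
TRAILER = r'\x00'  # four literal characters, not a NUL byte
RAW = PREFIX + JSON_BODY + TRAILER

# Same arithmetic as: eval _raw=substr(_raw, 39, (len(_raw)-42))
extracted = spl_substr(RAW, 39, len(RAW) - 42)

if __name__ == '__main__':
    print(len(PREFIX), len(TRAILER))
    print(extracted)
```

Fixed offsets only hold while the prefix keeps the same width. If the priority value or the timestamp ever changes length, the cut shifts, which is why a pattern-based removal is usually the more robust choice.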
Well, you could use the SEDCMD you already created to remove the unwanted substring on the HF before sending data to Splunk Cloud. Include INDEXED_EXTRACTIONS = JSON as well, to replace spath.
Hello David,
We are using a universal forwarder, not a heavy forwarder. Would this be possible using a universal forwarder?
Kind regards,
Willem
No, just on an HF. Otherwise you'll have to put the config on the indexers, but for that you need access to the props.conf file, so maybe get support to do that for you?