Hi,
I'm getting JSON parsing errors in the universal forwarder.
I'm generating JSON output files - a new file is created every time I run a routine. The output looks like this:
[
{
"datetime":"2017-10-25 14:33:16+01:00",
"user":"",
"category":"ST",
"type":"ABC",
"frontend":"3.0",
"backend":"",
"r_version":"",
"b_version":"",
"status":"R",
"next_planned_r_version":"",
"next_planned_b_version":"",
"comment":""
}
]
Splunk forwarder gives me the following log entries in splunkd.log:
10-25-2017 14:33:16.273 +0100 ERROR JsonLineBreaker - JSON StreamId:16742053991537090041 had parsing error:Unexpected character: ':' - data_source="/root/status-update/environment_health_status_50.json", data_host="hostxyz", data_sourcetype="_json"
The line above repeats roughly once for every line containing ":" in the output file, followed by these lines:
10-25-2017 14:33:16.273 +0100 ERROR JsonLineBreaker - JSON StreamId:16742053991537090041 had parsing error:Unexpected character: '}' - data_source="/root/status-update/environment_health_status_50.json", data_host="hostxyz", data_sourcetype="_json"
10-25-2017 14:33:16.273 +0100 ERROR JsonLineBreaker - JSON StreamId:16742053991537090041 had parsing error:Unexpected character: ']' - data_source="/root/status-update/environment_health_status_50.json", data_host="hostxyz", data_sourcetype="_json"
I've tried universal forwarder versions 7.0 and 6.5.3.
I've been trying to isolate the root cause but have had no luck - even without changing anything, it sometimes works but mostly doesn't. If I stop Splunk, erase the fishbucket and start it again, it ingests all existing files just fine. However, when I run my test afterwards, which creates new files, those fail (or sometimes don't, as explained above).
The monitor stanza in inputs.conf:
[monitor:///root/status-update/environment_health_status_*.json]
index=dev_test
sourcetype=_json
The _json stanza on the forwarder, dumped with btool (PS: I haven't configured anything in props.conf, only inputs.conf):
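For reference, a command along these lines dumps the effective stanza on the forwarder (exact invocation may differ):
$SPLUNK_HOME/bin/splunk btool props list _json --debug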
[_json]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
HEADER_MODE =
INDEXED_EXTRACTIONS = json
KV_MODE = none
LEARN_MODEL = true
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MATCH_LIMIT = 100000
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = True
TRANSFORMS =
TRUNCATE = 10000
category = Structured
description = JavaScript Object Notation format. For more information, visit http://json.org/
detect_trailing_nulls = false
maxDist = 100
priority =
pulldown_type = true
sourcetype =
I finally found what was wrong. The output was being generated like this:
echo '[' > $OUTPUT_FILENAME
echo ' ' >> $OUTPUT_FILENAME
echo ' "datetime":"'$(date --rfc-3339=seconds)'",' >> $OUTPUT_FILENAME
echo ' "user": "'$username'",' >> $OUTPUT_FILENAME
echo ' "environment_category": "'$environment_category'",' >> $OUTPUT_FILENAME
echo ' "release_type": "'$release_type'",' >> $OUTPUT_FILENAME
echo ' "environment_frontend": "'$environment_frontend'",' >> $OUTPUT_FILENAME
echo ' "environment_backend": "'$environment_backend'",' >> $OUTPUT_FILENAME
echo ' "release_version": "'$release_version'",' >> $OUTPUT_FILENAME
echo ' "branch_version": "'$branch_version'",' >> $OUTPUT_FILENAME
echo ' "status": "'$status'",' >> $OUTPUT_FILENAME
echo ' "next_planned_release_version": "'$next_planned_release_version'",' >> $OUTPUT_FILENAME
echo ' "next_planned_branch_version": "'$next_planned_branch_version'",' >> $OUTPUT_FILENAME
echo ' "comment": "'$comment'"' >> $OUTPUT_FILENAME
echo ' ' >> $OUTPUT_FILENAME
echo ']' >> $OUTPUT_FILENAME
I replaced it with:
echo ' "datetime":"'$(date --rfc-3339=seconds)'", "user":"'$username'", "environment_category":"'$environment_category'", "release_type":"'$release_type'", "environment_frontend": "'$environment_frontend'", "environment_backend": "'$environment_backend'", "release_version": "'$release_version'", "branch_version": "'$branch_version'", "status": "'$status'", "next_planned_release_version": "'$next_planned_release_version'", "next_planned_branch_version": "'$next_planned_branch_version'", "comment": "'$comment'"' >> $OUTPUT_FILENAME
It doesn't look as good for humans now, but apparently Splunk didn't like the line breaking (and possibly didn't care about the square brackets).
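As a side note, a minimal sketch of another way to write that single line - assuming jq is available on the host (it is not used in the original script), and shortened here to a few of the fields:
# jq builds the object from the shell variables, escapes any quotes inside
# their values, and -c prints it as one compact line
jq -cn \
  --arg datetime "$(date --rfc-3339=seconds)" \
  --arg user "$username" \
  --arg status "$status" \
  --arg comment "$comment" \
  '{datetime: $datetime, user: $user, status: $status, comment: $comment}' >> "$OUTPUT_FILENAME"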
Now, the question remains why the JSON files were indexed fine after restarting Splunk, but the files created afterwards at runtime were not.
@vegerlandecs Hi, I have a use case that is the opposite of yours.
I am using the Splunk universal forwarder to forward logs, and the logs are reaching Splunk. I would like to parse the logs by breaking them into multiple lines, as described below.
Currently my log appears as:
{ [-]
log: {some information of application here {msg"a":"1","b":"2","c":"3","d":"4"
}
I want to extract the field so that my log appears as below in the Splunk UI:
{ [-]
log: {some information of application here {msg-"a":"1","b":"2","c":"3","d":"4"}
}
msg-{
a:1
b:2
c:3
d:4
}
I am adding the lines below in props.conf:
[Sourcetype]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
SEDCMD-1_unjsonify = s/{"log":"(?:\u[0-9]+)?(.?)\n","stream./\1/g
SEDCMD-2_unescapequotes = s/\"/"/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC
Can this be done on the forwarder side?
Any help is appreciated.
Thanks.
@vj5 SEDCMD is one of the options that is not processed by universal forwarders. Ref: https://wiki.splunk.com/Community:HowIndexingWorks - the very last image shows that it is part of the typing processor, which only full Splunk Enterprise installations (heavy forwarders and indexers) have.
Also, Splunk uses PCRE notation, so \u is not supported.
From the snippets it isn't entirely clear to me what you are trying to SED, but consider this replacement regex as a starting point:
SEDCMD-1_unjsonify = s/log:\s+?{.*?{(.*?)}/\1/g
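If you keep the SEDCMD approach, it would need to live in props.conf on the indexer or heavy forwarder rather than on the UF - a minimal sketch, assuming the stanza name from your snippet:
# props.conf on the indexer or heavy forwarder (not the universal forwarder)
[Sourcetype]
SEDCMD-1_unjsonify = s/log:\s+?{.*?{(.*?)}/\1/g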