Getting Data In

JSON parsing error in the universal forwarder

vegerlandecs
Explorer

Hi,

I'm getting errors when the universal forwarder parses JSON files.
I'm generating JSON outputs - a new file is generated every time I run a routine. The output looks like this:

[
    {
    "datetime":"2017-10-25 14:33:16+01:00",
    "user":"",
    "category":"ST",
    "type":"ABC",
    "frontend":"3.0",
    "backend":"",
    "r_version":"",
    "b_version":"",
    "status":"R",
    "next_planned_r_version":"",
    "next_planned_b_version":"",
    "comment":""
  }
]

The Splunk forwarder writes the following entries to splunkd.log:

10-25-2017 14:33:16.273 +0100 ERROR JsonLineBreaker - JSON StreamId:16742053991537090041 had parsing error:Unexpected character: ':' - data_source="/root/status-update/environment_health_status_50.json", data_host="hostxyz", data_sourcetype="_json"

The line above repeats roughly once for every line containing ":" in the output. Then these lines follow:

10-25-2017 14:33:16.273 +0100 ERROR JsonLineBreaker - JSON StreamId:16742053991537090041 had parsing error:Unexpected character: '}' - data_source="/root/status-update/environment_health_status_50.json", data_host="hostxyz", data_sourcetype="_json"
10-25-2017 14:33:16.273 +0100 ERROR JsonLineBreaker - JSON StreamId:16742053991537090041 had parsing error:Unexpected character: ']' - data_source="/root/status-update/environment_health_status_50.json", data_host="hostxyz", data_sourcetype="_json"
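No single line of the pretty-printed file is valid JSON on its own, which matches the one-error-per-line pattern above. As a sketch (not Splunk's actual parser, and assuming python3 is available on the host), the per-line failures can be reproduced outside Splunk like this:

```shell
# check_lines FILE - print every line of FILE that does not parse as a
# standalone JSON document (sketch; assumes python3 is on the PATH).
check_lines() {
  while IFS= read -r line; do
    printf '%s' "$line" \
      | python3 -c 'import json, sys; json.load(sys.stdin)' 2>/dev/null \
      || printf 'not standalone JSON: %s\n' "$line"
  done < "$1"
}

# e.g.: check_lines /root/status-update/environment_health_status_50.json
```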

I've tried universal forwarders versions 7.0 and 6.5.3.

I've been trying to isolate the root cause, but with no luck - the behaviour varies even without changing anything. Sometimes it goes fine, but mostly it doesn't. If I stop Splunk, erase the fishbucket and start it again, it ingests all existing files just fine. However, when I then run my test that creates new files, it fails again (or not, as I explained).

The monitor stanza in inputs.conf:

[monitor:///root/status-update/environment_health_status_*.json]
index=dev_test
sourcetype=_json
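As a sanity check, each generated file can be validated as a whole before the forwarder picks it up; a minimal sketch, assuming python3 is available on the host:

```shell
# is_valid_json FILE - succeed only if FILE parses as one well-formed JSON
# document (sketch; assumes python3 is on the PATH).
is_valid_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# e.g. against the files the monitor stanza watches:
# for f in /root/status-update/environment_health_status_*.json; do
#   is_valid_json "$f" || echo "invalid JSON: $f"
# done
```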

The _json stanza on the forwarder, as shown by btool (PS: I haven't made any changes in props.conf, only in inputs.conf):

   [_json]
    ANNOTATE_PUNCT = True
    AUTO_KV_JSON = true
    BREAK_ONLY_BEFORE =
    BREAK_ONLY_BEFORE_DATE = True
    CHARSET = UTF-8
    DATETIME_CONFIG = /etc/datetime.xml
    HEADER_MODE =
    INDEXED_EXTRACTIONS = json
    KV_MODE = none
    LEARN_MODEL = true
    LEARN_SOURCETYPE = true
    LINE_BREAKER_LOOKBEHIND = 100
    MATCH_LIMIT = 100000
    MAX_DAYS_AGO = 2000
    MAX_DAYS_HENCE = 2
    MAX_DIFF_SECS_AGO = 3600
    MAX_DIFF_SECS_HENCE = 604800
    MAX_EVENTS = 256
    MAX_TIMESTAMP_LOOKAHEAD = 128
    MUST_BREAK_AFTER =
    MUST_NOT_BREAK_AFTER =
    MUST_NOT_BREAK_BEFORE =
    SEGMENTATION = indexing
    SEGMENTATION-all = full
    SEGMENTATION-inner = inner
    SEGMENTATION-outer = outer
    SEGMENTATION-raw = none
    SEGMENTATION-standard = standard
    SHOULD_LINEMERGE = True
    TRANSFORMS =
    TRUNCATE = 10000
    category = Structured
    description = JavaScript Object Notation format. For more information, visit http://json.org/
    detect_trailing_nulls = false
    maxDist = 100
    priority =
    pulldown_type = true
    sourcetype =
1 Solution

vegerlandecs
Explorer

I finally found what was wrong. The output was being generated like this:

echo '[' > $OUTPUT_FILENAME
echo '  {' >> $OUTPUT_FILENAME
echo '    "datetime":"'$(date --rfc-3339=seconds)'",' >> $OUTPUT_FILENAME
echo '    "user": "'$username'",' >> $OUTPUT_FILENAME
echo '    "environment_category": "'$environment_category'",' >> $OUTPUT_FILENAME
echo '    "release_type": "'$release_type'",' >> $OUTPUT_FILENAME
echo '    "environment_frontend": "'$environment_frontend'",' >> $OUTPUT_FILENAME
echo '    "environment_backend": "'$environment_backend'",' >> $OUTPUT_FILENAME
echo '    "release_version": "'$release_version'",' >> $OUTPUT_FILENAME
echo '    "branch_version": "'$branch_version'",' >> $OUTPUT_FILENAME
echo '    "status": "'$status'",' >> $OUTPUT_FILENAME
echo '    "next_planned_release_version": "'$next_planned_release_version'",' >> $OUTPUT_FILENAME
echo '    "next_planned_branch_version": "'$next_planned_branch_version'",' >> $OUTPUT_FILENAME
echo '    "comment": "'$comment'"' >> $OUTPUT_FILENAME
echo '  }' >> $OUTPUT_FILENAME
echo ']' >> $OUTPUT_FILENAME

I replaced it with:

echo '{ "datetime":"'$(date --rfc-3339=seconds)'", "user":"'$username'", "environment_category":"'$environment_category'", "release_type":"'$release_type'", "environment_frontend": "'$environment_frontend'", "environment_backend": "'$environment_backend'", "release_version": "'$release_version'", "branch_version": "'$branch_version'", "status": "'$status'", "next_planned_release_version": "'$next_planned_release_version'", "next_planned_branch_version": "'$next_planned_branch_version'", "comment": "'$comment'" }' >> $OUTPUT_FILENAME

It's not pretty for humans now, but apparently Splunk didn't like the line breaks (and possibly ignored the square brackets).
Why the JSON files were indexed fine after restarting Splunk, but not the files created afterwards at runtime, remains an open question.
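As a side note, string-concatenating JSON in shell stays fragile: a value containing a double quote or backslash would break the record again. A sketch of a more robust variant that delegates all escaping to a JSON serializer (assumes python3 is available; only a few fields are shown, and OUTPUT_FILENAME and the sample values are placeholders here):

```shell
# Sketch: emit the record as one line of valid JSON, letting python3's json
# module handle all quoting/escaping. Variable names follow the script above;
# OUTPUT_FILENAME and the sample values are placeholders.
OUTPUT_FILENAME=/tmp/environment_health_status_demo.json
username='alice'
status='R'
comment='free text with "quotes" and a backslash \'
# date --rfc-3339 is GNU date, as in the original script
DT="$(date --rfc-3339=seconds)" U="$username" S="$status" C="$comment" \
python3 -c 'import json, os
print(json.dumps({
    "datetime": os.environ["DT"],
    "user":     os.environ["U"],
    "status":   os.environ["S"],
    "comment":  os.environ["C"],
}))' >> "$OUTPUT_FILENAME"
```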



vj5
New Member

@vegerlandecs Hi, I have a use case that is the opposite of yours.
My use case is:

I am using the Splunk universal forwarder to forward logs, and the logs reach Splunk fine. I would like to parse them by breaking the event into multiple lines, as below.

Currently my log appears as:
{ [-]
log: {someinformation of appication here {msg"a":"1","b":"2","c":"3","d":"4"
}

I want to extract the fields so that the event appears as below in the Splunk UI:

{ [-]
log: {someinformation of appication here {msg-"a":"1","b":"2","c":"3","d":"4"}
}
msg-{
a:1
b:2
c:3
d:4
}

I am adding the lines below to props.conf:

[Sourcetype]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true

These commands remove the Docker JSON wrapper, then remove the escapes from the quotes in the log message:

SEDCMD-1_unjsonify = s/{"log":"(?:\u[0-9]+)?(.?)\n","stream./\1/g
SEDCMD-2_unescapequotes = s/\"/"/g

Another experimental version of the SEDCMD:

SEDCMD-1_unjsonify = s/{"log":"(?:\u[0-9]+)?(.)\n","stream.?([\n\r])/\1\2/g

category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC

Can this be done on the forwarder side?
Any help is appreciated.

Thanks.


vegerlandecs
Explorer

@vj5 SEDCMD is one of the settings that universal forwarders do not process. Ref: https://wiki.splunk.com/Community:HowIndexingWorks - the very last image shows it as part of the typing processor, which only full Splunk Enterprise instances (heavy forwarders and indexers) have.

Also, Splunk uses PCRE notation, so \u is not supported.

 

From the snippets it isn't entirely clear to me what you are trying to SED, but consider this replacement regex as a starting point:

SEDCMD-1_unjsonify = s/log:\s+?{.*?{(.*?)}/\1/g