json gets truncated

gkapitany
Explorer

Valid JSON gets truncated for some reason. Below is my props.conf stanza:

TRUNCATE = 0
KV_MODE = json
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^\x7B
LINE_BREAKER = ([\r\n]+)(\x7B)
SHOULD_LINEMERGE = false
DATETIME_CONFIG = CURRENT

Any suggestions?

woodcock
Esteemed Legend

Everything gets truncated eventually, unless you use the (somewhat dangerous) TRUNCATE = 0 setting. Up your value for TRUNCATE:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf

#******************************************************************************
# Line breaking
#******************************************************************************

# Use the following attributes to define the length of a line.

TRUNCATE = <non-negative integer>
 * Change the default maximum line length (in bytes).
 * Although this is in bytes, line length is rounded down when this would
  otherwise land mid-character for multi-byte characters.
 * Set to 0 if you never want truncation (very long lines are, however, often a sign of
  garbage data).
 * Defaults to 10000 bytes.

You should be seeing warnings like this in splunkd.log:

01-01-2020 18:40:37.625 +0000 WARN LineBreakingProcessor - Truncating line because limit of 10000 has been exceeded

gkapitany
Explorer

Hi,

No, there isn't any log record about truncation due to length. I set TRUNCATE = 0 to rule out any length-related issue. I intend to set it to 30000 once I figure out why the events get truncated.

All error messages are like the one below but with different values:
ERROR JsonLineBreaker - JSON StreamId:18294845293918380307 had parsing error:Unexpected character while looking for value: 'r' - data_source="/opt/splunk/vne2splunk/log.json", data_host="splmx1.sample.com", data_sourcetype="_json"
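
One way to check whether the source file itself contains the offending character is to parse it outside Splunk. The following is a minimal sketch, assuming the file is a sequence of JSON objects separated only by whitespace. `scan_json_stream` is a hypothetical helper name, and Python's parser may not agree with Splunk's JSON parser in every edge case.

```python
import json

def scan_json_stream(text):
    """Decode concatenated JSON values from text.

    Returns (count, error): count is the number of values decoded before
    stopping; error is None, or (offset, message) for the first failure.
    """
    decoder = json.JSONDecoder()
    pos, count, n = 0, 0, len(text)
    while pos < n:
        # Skip whitespace between consecutive JSON values.
        while pos < n and text[pos] in " \t\r\n":
            pos += 1
        if pos >= n:
            break
        try:
            _, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError as exc:
            return count, (exc.pos, exc.msg)
        count += 1
    return count, None

if __name__ == "__main__":
    # Inline sample (an assumption for illustration, not the real log.json).
    sample = '{"a": 1}\n{"b": r}'
    count, error = scan_json_stream(sample)
    print("objects decoded:", count)
    if error is not None:
        offset, message = error
        print("first error at offset", offset, repr(sample[offset]), message)
```

To run it against the real file, read the file into a string with `open(path, encoding="utf-8").read()` and pass that to `scan_json_stream`. If the whole file decodes cleanly, the problem is more likely in how the events are broken than in the data itself.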

Some logs are parsed correctly like the one below:
{
"audit": "16489",
"hostScore": "0",
"name": "to8pt.sample.com",
"macAddress": "",
"os": "OS Undetermined",
"vulnerabilities": "1",
"netbiosName": "",
"application": {
"": "port - 5040",
"id: 6119 Application: DCE/MS RPC Endpoint Mapper Interface (TCP) description: DCE/MS RPC Endpoint Mapper Interface. parent: 165": "port - 135",
"id: 165 Service: DCE/MS RPC over TCP description: Microsoft RPC (Remote Procedure Call) over TCP is used by many services, including: DHCP Manager, DNS Administration, WINS Manager, Exchange Client/Server, Exchange Administrator and RPC. Third party applications, such as Symantec/Veritas BackupExec, may also make use of it. protocol: tcp transport: n/a parentid: n/a": "port - 135",
"id: 8037 Service: IPv4 Layer 4 description: Generic Layer 3 / Layer 4 RAW socket access. protocol: ip transport: n/a parentid: n/a": "port - 0"
},
"timeStamp": "2020-01-02 00:03:56",
"ipAddress": "172.16.25.32",
"id": "4128157",
"network": "INT - Transports"
}

The only difference is that the "application" object varies in length. In one example, the event in Splunk is truncated at 14,532 characters, while the original JSON has 15,071 characters.

This leads me to believe that the issue is related to some character sequence, but I am not sure which one.
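
To look for such a sequence, it can help to print the characters on both sides of the cut. This is a minimal sketch, assuming you have the original JSON object and the truncated `_raw` value (for example, copied from a search result) as two strings. `cut_context` is a hypothetical helper.

```python
def cut_context(original, indexed, width=20):
    """Return (offset, before, after) for the point where indexed stops matching original.

    offset is the length of the common prefix; before and after are up to
    `width` characters of the original on each side of that offset.
    """
    offset = 0
    limit = min(len(original), len(indexed))
    while offset < limit and original[offset] == indexed[offset]:
        offset += 1
    before = original[max(0, offset - width):offset]
    after = original[offset:offset + width]
    return offset, before, after

if __name__ == "__main__":
    # Made-up strings for illustration only.
    original = '{"application": {"id: 165": "port - 135"}}'
    indexed = '{"application": {"id: 165": "port'
    offset, before, after = cut_context(original, indexed, width=10)
    # repr() makes control characters such as \r or \n visible.
    print(offset, repr(before), repr(after))
```

Running this over several truncated events and comparing the `after` values should show whether the cut always precedes the same characters.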

rvaglid
Observer

Could you run | eval eventlength = len(_raw) to see whether Splunk truncates at the same position every time?

gkapitany
Explorer

Hi rvaglid,

I ran the suggested eval on a few entries and the truncation position is not consistent:
11170
12231
13721
11331

As I mentioned above, the truncation does not appear to be caused by length but rather by a character sequence.

to4kawa
SplunkTrust

How about this thread?

json-file-getting-truncated

gkapitany
Explorer

I've also tried a few alternative line breakers, with no success:

LINE_BREAKER = ([\r\n]+)(\x7B(\x22))audit
LINE_BREAKER = ([\r\n]+)(\x7B)audit
LINE_BREAKER = ([\r\n]*)(?={)
no line breaker and INDEXED_EXTRACTIONS = json

Below is the beginning of the JSON (truncated here to keep the post clean):
{"audit":"16463","hostScore":"0","name":"to8pt.sample.com","macAddress":"","os":"OS Undetermined",

to4kawa
SplunkTrust

LINE_BREAKER = ([\r\n]*)(?=\{)
How about this? You should escape the "{" character.
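
To see where a candidate LINE_BREAKER would split a stream, the rule from the props.conf spec can be emulated outside Splunk: the start of the first capturing group ends the previous event, the end of that group starts the next event, and the group's contents are discarded. The sketch below is only an approximation. It uses Python's `re` module rather than Splunk's regex engine, `split_events` is a hypothetical helper, and the sample data is made up.

```python
import re

def split_events(stream, line_breaker):
    """Split stream at each match of line_breaker, dropping capture group 1."""
    events = []
    start = 0
    for match in re.finditer(line_breaker, stream):
        end_prev, start_next = match.span(1)
        chunk = stream[start:end_prev]
        if chunk:  # ignore empty events produced by zero-width matches
            events.append(chunk)
        start = start_next
    tail = stream[start:]
    if tail:
        events.append(tail)
    return events

if __name__ == "__main__":
    # Two compact JSON objects, each with a nested object, one per line.
    sample = '{"a": 1, "app": {"x": 2}}\n{"a": 3, "app": {"y": 4}}'
    for pattern in (r'([\r\n]+)(?=\{)', r'([\r\n]*)(?=\{)'):
        print(pattern)
        for event in split_events(sample, pattern):
            print("   ", repr(event))
```

In this emulation the `+` variant splits only at a newline followed by "{", while the `*` variant can match an empty string and therefore also splits before each nested "{". Whether Splunk behaves identically is an assumption worth verifying on a test index.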

gkapitany
Explorer

It didn't work either.

to4kawa
SplunkTrust

TRUNCATE = 0
KV_MODE = json
NO_BINARY_CHECK = true
LINE_BREAKER = ([\r\n]+)(?=\{)
SHOULD_LINEMERGE = false
DATETIME_CONFIG = CURRENT

How many logs are there in total, and how many of them are truncated?
Also, is it really LINE_BREAKER that isn't working?
I think that is a different problem from the one in your question.

JSON props.conf
