I recently came across an issue where the following warning was spamming my forwarder's splunkd.log for a sourcetype configured with structured data header extraction for the W3C extended log format.
10-25-2018 09:47:40.933 +1300 WARN CsvLineBreaker - Parser warning: Encountered unescaped quotation mark in field while parsing. This may cause inaccurate field extractions or corrupt/merged events. - data_source="...", data_host="...", data_sourcetype="..."
Dummy W3C format field example:
"2018-10-25" "00:00:00" "[{\"rule_id\":\"37572\",\"type\":\"AD_FORWARD_TO_DC\",\"forward_to_dc_id\":\"447780\"}]"
Forwarder's props.conf:
[mysourcetype]
INDEXED_EXTRACTIONS = W3C
FIELD_DELIMITER = whitespace
FIELD_QUOTE = "
FIELD_NAMES = date, time, quoted_field
I raised this with Splunk support as a possible bug, because the double quotes inside the log's field appeared to me to be backslash-escaped.
Splunk support responded that this is not a bug: quote characters in the W3C extended log file format are escaped with a repeated quote, i.e. "", not \"
https://www.w3.org/TR/WD-logfile.html
...
<string> = '"' <schar>* '"'
<schar> = xchar | '"' '"'
Strings are output in quoted form. If a string contains a quotation character the character is repeated. This format is unambiguous since fields are by definition separated by whitespace.
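To see why my sample line trips the parser, here is a minimal Python sketch of a field splitter that follows the grammar quoted above. It is only an illustration of the rule, not Splunk's CsvLineBreaker; the function name split_w3c_fields and its error messages are my own.

```python
def split_w3c_fields(line: str) -> list[str]:
    """Split a line into whitespace-separated fields.

    Quoted strings follow the W3C draft rule: "" inside a string is a
    literal quote, and a single " closes the string.
    """
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
            continue
        if line[i] != '"':
            # Unquoted field: runs until the next whitespace.
            j = i
            while j < n and not line[j].isspace():
                j += 1
            fields.append(line[i:j])
            i = j
            continue
        i += 1  # skip the opening quote
        chars = []
        while True:
            if i >= n:
                raise ValueError("unterminated quoted field")
            if line[i] == '"':
                if i + 1 < n and line[i + 1] == '"':
                    chars.append('"')  # doubled quote -> literal quote
                    i += 2
                    continue
                i += 1  # single quote -> end of string
                break
            chars.append(line[i])
            i += 1
        # A closed string must be followed by whitespace or end of line.
        if i < n and not line[i].isspace():
            raise ValueError("unescaped quotation mark in field")
        fields.append("".join(chars))
    return fields


backslash_escaped = r'"2018-10-25" "00:00:00" "[{\"rule_id\":\"37572\"}]"'
doubled = r'"2018-10-25" "00:00:00" "[{\""rule_id\"":\""37572\""}]"'

try:
    split_w3c_fields(backslash_escaped)
except ValueError as err:
    print("rejected:", err)

print(split_w3c_fields(doubled))
```

With backslash escaping, the first inner quote closes the string early and the next character is not whitespace, which is the situation the warning describes.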
This is similar to the CSV specification: https://tools.ietf.org/html/rfc4180
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"aaa","b""bb","ccc"
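As a quick check of the RFC 4180 rule, Python's standard csv module doubles embedded quotes by default. This short sketch only illustrates the escaping convention; it says nothing about how Splunk itself parses the data.

```python
import csv
import io

# Write one row with every field quoted; the embedded quote in b"bb is doubled.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(["aaa", 'b"bb', "ccc"])
encoded = buf.getvalue().rstrip("\r\n")

# Reading the line back restores the single embedded quote.
decoded = next(csv.reader(io.StringIO(encoded)))

print(encoded)
print(decoded)
```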
Interestingly, JSON escapes a quote as \", so in my case, to keep the backslash, each \" inside the W3C-formatted field should be written as \"". This tested okay when I changed the field in my test log.
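For anyone who does control the log writer, the quoting rule can be sketched in a couple of lines of Python. The helper names w3c_quote and w3c_unquote are hypothetical, and this is a sketch of the W3C draft's rule rather than anything Splunk ships.

```python
def w3c_quote(value: str) -> str:
    """Wrap a value in double quotes, repeating every embedded quote."""
    return '"' + value.replace('"', '""') + '"'


def w3c_unquote(field: str) -> str:
    """Reverse w3c_quote for a single, well-formed quoted field."""
    if len(field) < 2 or field[0] != '"' or field[-1] != '"':
        raise ValueError("field is not wrapped in double quotes")
    return field[1:-1].replace('""', '"')


# Field content as it should arrive in the index, backslashes included.
raw = r'[{\"rule_id\":\"37572\"}]'
quoted = w3c_quote(raw)
print(quoted)  # each \" in the content becomes \"" in the log line
```

Because only the quote character is doubled, the backslashes pass through untouched and the original content is recovered by w3c_unquote.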
As I cannot change the incoming log file - and I do not want to preprocess the file - Splunk support suggested the following method to suppress the CsvLineBreaker warning.
The log level can be adjusted to suppress the warning message for the category as below:
cd $SPLUNK_HOME/bin
splunk show log-level CsvLineBreaker
splunk set log-level CsvLineBreaker -level CRIT
splunk show log-level CsvLineBreaker
The log level reverts to "WARN" when the Splunk UF is restarted.
To change the log level permanently, do the following:
cd $SPLUNK_HOME/etc
Please add the following line anywhere in log.cfg and restart the UF.
category.CsvLineBreaker=CRIT
Once the UF has been restarted after this change, please confirm that the log level has been adjusted.
splunk show log-level CsvLineBreaker
Please refer to the following URL:
https://www.splunk.com/blog/2008/09/22/enabling-debug-messages.html