Structured data header extraction - WARN CsvLineBreaker - Parser warning: Encountered unescaped quotation mark in field

gcato
Contributor

I recently came across an issue where the following warning message was spamming my forwarder's splunkd.log. The source uses structured data header extraction configured for the W3C extended log file format.

10-25-2018 09:47:40.933 +1300 WARN CsvLineBreaker - Parser warning: Encountered unescaped quotation mark in field while parsing. This may cause inaccurate field extractions or corrupt/merged events. - data_source="...", data_host="...", data_sourcetype="..."

Dummy W3C format field example:
"2018-10-25" "00:00:00" "[{\"rule_id\":\"37572\",\"type\":\"AD_FORWARD_TO_DC\",\"forward_to_dc_id\":\"447780\"}]"

Forwarder's props.conf:
[mysourcetype]
INDEXED_EXTRACTIONS = W3C
FIELD_DELIMITER = whitespace
FIELD_QUOTE = "
FIELD_NAMES = date, time, quoted_field

I raised this with Splunk support as a possible bug, since it appeared to me that the double quotes in the log's field were backslash-escaped.

gcato
Contributor

Splunk support responded to my issue: it is not a bug. Quote characters in the W3C extended log file format are escaped by repeating the quote, i.e. "", not with a backslash (\").

https://www.w3.org/TR/WD-logfile.html
...

<string> = '"' <schar>* '"' 

<schar> = xchar | '"' '"' 
Strings are output in quoted form. If a string contains a quotation character the character is repeated. This format is unambiguous since fields are by definition separated by whitespace. 

This is similar to the CSV specification: https://tools.ietf.org/html/rfc4180

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
 "aaa","b""bb","ccc"

Interestingly, JSON escapes a quote as \". So in my case, to keep the backslash, each escaped quote in the W3C-formatted field should be written as \"". This tested okay when I changed the field in my test log.
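
To make the repeated-quote rule concrete, here is a minimal shell sketch of how a value would be written as a W3C quoted string. The helper name w3c_quote is hypothetical and not part of Splunk or any logging tool; it only illustrates the escaping, assuming the value already carries its JSON backslashes.

```shell
#!/bin/sh
# Hypothetical helper: wrap a value in double quotes and repeat every
# embedded quote, as the W3C extended log file format expects.
w3c_quote() {
    printf '"%s"\n' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# A JSON-escaped value like the one in my log (shortened).
w3c_quote '[{\"rule_id\":\"37572\"}]'
# prints: "[{\""rule_id\"":\""37572\""}]"
```

Each \" in the input becomes \"" in the output, which is the form that tested okay for me.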

As I cannot change the incoming log file - and I do not want to preprocess the file - Splunk support suggested the following method to suppress the CsvLineBreaker warning.

The log level can be adjusted to suppress the warning message for the category as below:

cd $SPLUNK_HOME/bin
splunk show log-level CsvLineBreaker
splunk set log-level CsvLineBreaker -level CRIT 
splunk show log-level CsvLineBreaker

The log level reverts to "WARN" when the Splunk UF is restarted.

In this case, you can change the log level permanently as follows.

cd $SPLUNK_HOME/etc

Please add the following line anywhere in log.cfg and restart the UF.

category.CsvLineBreaker=CRIT

Once the UF has been restarted after this change, please check that the log level has been adjusted.

splunk show log-level CsvLineBreaker
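
As an extra check beyond the log level itself, one could count how many of these warnings are in splunkd.log before and after the change. This is only a sketch: the helper name count_csv_warnings is hypothetical, and the log path assumes the default location under $SPLUNK_HOME (with /opt/splunkforwarder as an assumed fallback).

```shell
#!/bin/sh
# Hypothetical helper: count CsvLineBreaker warnings in a splunkd.log file.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status clean while still printing the count.
count_csv_warnings() {
    grep -Ec 'WARN +CsvLineBreaker' "$1" || true
}

# Assumed default log location on a forwarder; adjust if yours differs.
LOG_FILE="${SPLUNK_HOME:-/opt/splunkforwarder}/var/log/splunk/splunkd.log"
if [ -f "$LOG_FILE" ]; then
    count_csv_warnings "$LOG_FILE"
fi
```

If the count stops growing after the restart, the new level is in effect. Older warnings stay in the file, so compare counts over time rather than expecting zero.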

Please refer to the following URL:

https://www.splunk.com/blog/2008/09/22/enabling-debug-messages.html
