Structured data header extraction - WARN CsvLineBreaker - Parser warning: Encountered unescaped quotation mark in field

gcato
Contributor

I recently came across an issue where the following warning message was spamming my forwarder's splunkd.log. The source uses a structured data header extraction configured for the W3C extended log format.

10-25-2018 09:47:40.933 +1300 WARN CsvLineBreaker - Parser warning: Encountered unescaped quotation mark in field while parsing. This may cause inaccurate field extractions or corrupt/merged events. - data_source="...", data_host="...", data_sourcetype="..."

Dummy example line in W3C format:
"2018-10-25" "00:00:00" "[{\"rule_id\":\"37572\",\"type\":\"AD_FORWARD_TO_DC\",\"forward_to_dc_id\":\"447780\"}]"

Forwarder's props.conf:
[mysourcetype]
INDEXED_EXTRACTIONS = W3C
FIELD_DELIMITER = whitespace
FIELD_QUOTE = "
FIELD_NAMES = date, time, quoted_field
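
The parser's complaint can be approximated outside Splunk. The sketch below uses Python's csv module as a stand-in for Splunk's own parser (an assumption: they are not the same code), set up like the stanza above: space as the delimiter, " as the quote character, and a doubled quote as the only escape. With strict=True, csv raises an error at the first backslash-escaped quote, much like the CsvLineBreaker warning. `parse_w3c_line` is a hypothetical helper name.

```python
import csv

# Dummy W3C line from the question; the embedded quotes are backslash-escaped.
LINE = r'"2018-10-25" "00:00:00" "[{\"rule_id\":\"37572\",\"type\":\"AD_FORWARD_TO_DC\",\"forward_to_dc_id\":\"447780\"}]"'


def parse_w3c_line(line: str) -> list:
    """Split one log line roughly as the props.conf stanza describes it."""
    reader = csv.reader(
        [line],
        delimiter=" ",     # approximates FIELD_DELIMITER = whitespace (single spaces only)
        quotechar='"',     # FIELD_QUOTE = "
        doublequote=True,  # "" inside a quoted field is a literal quote
        escapechar=None,   # a backslash is just an ordinary character
        strict=True,       # raise csv.Error on a stray quote instead of guessing
    )
    return next(reader)


try:
    print(parse_w3c_line(LINE))
except csv.Error as exc:
    # The quote after the first backslash closes the quoted field early,
    # and the character after it ('r') is not a delimiter.
    print("csv.Error:", exc)
```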

I raised this with Splunk support as a possible bug, since it appeared to me that the double quotes inside the log's field were escaped with backslashes.

gcato
Contributor

Splunk support responded that this is not a bug: in the W3C extended log file format, a quote character inside a string is escaped by repeating it, i.e. "", not with a backslash (\").

https://www.w3.org/TR/WD-logfile.html
...

<string> = '"' <schar>* '"' 

<schar> = xchar | '"' '"' 
Strings are output in quoted form. If a string contains a quotation character the character is repeated. This format is unambiguous since fields are by definition separated by whitespace. 

This is similar to the CSV specification: https://tools.ietf.org/html/rfc4180

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
 "aaa","b""bb","ccc"
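
Python's standard csv module follows the same doubled-quote convention by default, so the RFC 4180 example can be checked directly. This only illustrates the convention; it is not the parser Splunk uses.

```python
import csv
import io

# The RFC 4180 example record: the middle field contains an escaped quote ("").
record = '"aaa","b""bb","ccc"\r\n'

row = next(csv.reader(io.StringIO(record)))
print(row)  # ['aaa', 'b"bb', 'ccc']
```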

Interestingly, JSON escapes a quote as \", so in my case, to preserve the backslash, the W3C-formatted field would need to be \"" (the backslash followed by a doubled quote). This tested fine when I changed the field in my test log.
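
Producing that form amounts to doubling every quote in the field and wrapping the result in quotes. The Python sketch below is a hypothetical preprocessing helper (`to_w3c_field` is not part of Splunk), shown only to illustrate the escaping; reading the line back uses Python's csv module as a stand-in for Splunk's parser.

```python
import csv

# The dummy field as the application writes it: JSON-style \" escapes inside.
raw = r'[{\"rule_id\":\"37572\",\"type\":\"AD_FORWARD_TO_DC\",\"forward_to_dc_id\":\"447780\"}]'


def to_w3c_field(value: str) -> str:
    """Quote a value W3C-style: wrap it in quotes and repeat every embedded quote."""
    return '"' + value.replace('"', '""') + '"'


line = " ".join(to_w3c_field(v) for v in ("2018-10-25", "00:00:00", raw))
print(line)  # each \" in the payload becomes \""

# Reading it back with doubled-quote rules returns the payload with backslashes intact.
fields = next(csv.reader([line], delimiter=" ", quotechar='"', strict=True))
print(fields[2] == raw)  # True
```

Because only the quotes are doubled, the backslashes pass through untouched and the decoded field matches what the application originally wrote.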

As I cannot change the incoming log file, and do not want to preprocess it, Splunk support suggested the following method to suppress the CsvLineBreaker warning instead.

The log level for the CsvLineBreaker category can be raised so that the warning is no longer logged:

cd $SPLUNK_HOME/bin
splunk show log-level CsvLineBreaker
splunk set log-level CsvLineBreaker -level CRIT 
splunk show log-level CsvLineBreaker

This change does not persist: the log level goes back to WARN when the Splunk Universal Forwarder (UF) is restarted.

To change the log level permanently:

cd $SPLUNK_HOME/etc

Add the following line anywhere in log.cfg, then restart the UF:

category.CsvLineBreaker=CRIT

Once the UF has restarted, confirm that the log level has been adjusted:

splunk show log-level CsvLineBreaker

For more background, see:

https://www.splunk.com/blog/2008/09/22/enabling-debug-messages.html
