Structured data header extraction - WARN CsvLineBreaker - Parser warning: Encountered unescaped quotation mark in field

gcato
Contributor

I recently came across an issue where the following warning message was spamming my forwarder's splunkd.log. The affected input uses structured data header extraction configured for the W3C extended log file format.

10-25-2018 09:47:40.933 +1300 WARN CsvLineBreaker - Parser warning: Encountered unescaped quotation mark in field while parsing. This may cause inaccurate field extractions or corrupt/merged events. - data_source="...", data_host="...", data_sourcetype="..."

Dummy W3C format field example:
"2018-10-25" "00:00:00" "[{\"rule_id\":\"37572\",\"type\":\"AD_FORWARD_TO_DC\",\"forward_to_dc_id\":\"447780\"}]"

Forwarder's props.conf:
[mysouretype]
INDEXED_EXTRACTIONS = W3C
FIELD_DELIMITER = whitespace
FIELD_QUOTE = "
FIELD_NAMES = date, time, quoted_field

I raised this with Splunk support as a possible bug, as it appeared to me that the double quotes in the log's field were already escaped with backslashes.


gcato
Contributor

Splunk support responded to my issue: it is not a bug. Quote characters in the W3C extended log file format are escaped with a repeated quote, i.e. "", not \".

https://www.w3.org/TR/WD-logfile.html
...

<string> = '"' <schar>* '"' 

<schar> = xchar | '"' '"' 
Strings are output in quoted form. If a string contains a quotation character the character is repeated. This format is unambiguous since fields are by definition separated by whitespace. 

This is similar to the CSV specification: https://tools.ietf.org/html/rfc4180

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
 "aaa","b""bb","ccc"

Interestingly, in JSON format a quote is escaped as \". So in my case, to maintain the JSON backslash, each embedded quote in the W3C formatted field should be written as \"" (a backslash followed by a repeated quote). This tested okay when I changed the field in my test log.
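For anyone who wants to reproduce that test, here is a minimal sketch of the rewrite I mean. It is an illustration only, not something Splunk support supplied: the function name w3c_escape_quotes is made up, and it assumes every \" in the line is a JSON-escaped quote inside a quoted field.

```shell
#!/bin/sh
# Hypothetical helper: rewrite each JSON-style \" as \""
# (backslash kept, quote repeated as the W3C format expects).
# Quotes without a preceding backslash, such as the ones that
# open and close each field, are left untouched.
w3c_escape_quotes() {
  sed 's/\\"/\\""/g'
}

# Example usage (file names are placeholders):
#   w3c_escape_quotes < original.log > test.log
```

Note that this is not idempotent: running it twice on the same file adds another quote each time, so it is only suitable for building a one-off test log.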

As I cannot change the incoming log file - and I do not want to preprocess the file - Splunk support suggested the following method to suppress the CsvLineBreaker warning.

The log level can be adjusted to suppress the warning message for the category as below:

cd $SPLUNK_HOME/bin
splunk show log-level CsvLineBreaker
splunk set log-level CsvLineBreaker -level CRIT 
splunk show log-level CsvLineBreaker

The log level reverts to "WARN" once the Splunk UF is restarted.

To change the log level permanently in this case, do the following.

cd $SPLUNK_HOME/etc

Please add the following line anywhere in log.cfg and restart the UF.

category.CsvLineBreaker=CRIT
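If you need to roll this out to several forwarders, the edit can be scripted. The following is a rough sketch under my own assumptions, not part of the support response: add_log_category is a made-up helper name, and it only appends the line when an identical line is not already present.

```shell
#!/bin/sh
# Hypothetical helper: append a setting to a config file unless the
# exact same line is already there, so repeated runs do not add duplicates.
#   $1 = path to the config file, $2 = the line to add
add_log_category() {
  cfg_file=$1
  setting=$2
  # -x matches whole lines only, -F treats the setting as a fixed string
  if grep -qxF -- "$setting" "$cfg_file" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' "$setting" >> "$cfg_file"
}

# Example usage (then restart the UF):
#   add_log_category "$SPLUNK_HOME/etc/log.cfg" "category.CsvLineBreaker=CRIT"
```

As far as I know, Splunk's documentation also describes a log-local.cfg file in the same directory for local overrides that are kept across upgrades, so it may be worth checking whether that is a better home for the setting in your environment.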

Once the UF has been restarted after this change, please confirm that the log level has been adjusted.

splunk show log-level CsvLineBreaker
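As an extra check, you can also count how many CsvLineBreaker warnings are in splunkd.log before and after the change. This is just a sketch: the function name is my own, and the path in the usage comment assumes the default log location under $SPLUNK_HOME/var/log/splunk.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines in a log file that
# contain a WARN entry from the CsvLineBreaker category.
#   $1 = path to the log file
count_csvlinebreaker_warnings() {
  # grep -c prints 0 but exits non-zero when nothing matches,
  # so "|| true" keeps the function's exit status clean.
  grep -c 'WARN.*CsvLineBreaker' "$1" || true
}

# Example usage (default log location assumed):
#   count_csvlinebreaker_warnings "$SPLUNK_HOME/var/log/splunk/splunkd.log"
```

If the suppression is working, the count should stop growing after the restart, although older warnings remain in the file until it is rotated.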

Please refer to the following URL:

https://www.splunk.com/blog/2008/09/22/enabling-debug-messages.html

