How to properly parse a CSV file with embedded double quotes on the end of a field before the file is indexed?

New Member

The field ends with an escaped (backslash-protected) quote followed immediately by the field's closing quote.

Ex:

"field1", "field2", "field3-sdasds\"textdata blah blah\"", "field4-#$%232"

The embedded quotes are escaped, but when the files are processed, the fields are not split correctly and fields 3 and 4 end up merged into one.

I have experimented with adding a space between the protected quote and field terminating quote and it seems to work.

"field1", "field2", "field3-sdasds\"textdata blah blah\" ", "field4-#$%232"

Is there some way to do this automatically before the files are indexed?
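
One way the manual workaround could be automated is a small pre-processing step that runs before the file lands in the monitored directory. The following is only a sketch in Python under stated assumptions: the function name `pad_escaped_quotes` is hypothetical, fields are separated by commas, and the only pattern that needs fixing is a backslash-escaped quote sitting directly before a field's closing quote. Note that the inserted space becomes part of the field value.

```python
import re

# Matches a backslash-escaped quote (\") immediately followed by the closing
# quote of a field, i.e. the next non-space character is a comma or the line ends.
_ESCAPED_THEN_CLOSING = re.compile(r'\\""(?=\s*(?:,|$))')


def pad_escaped_quotes(line: str) -> str:
    """Insert a space between an escaped quote and the field's closing quote.

    Hypothetical helper for illustration; it mirrors the manual workaround
    and does not attempt to handle every CSV edge case.
    """
    return _ESCAPED_THEN_CLOSING.sub(r'\\" "', line)


sample = r'"field1", "field2", "field3-sdasds\"textdata blah blah\"", "field4-#$%232"'
fixed = pad_escaped_quotes(sample)
print(fixed)
```

In practice this would be applied line by line while copying each file into the location being monitored, so the original files stay untouched.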


Re: How to properly parse a CSV file with embedded double quotes on the end of a field before the file is indexed?

Explorer

You're gonna have to escape the rogue quote.
"field1", "field2", "field3-sdasds\"textdata blah blah\" ", "field4-#$%232"
Any quote that's supposed to be ingested as data rather than a delimiter should be escaped by whatever software is constructing the logs.
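
As an illustration of escaping on the producing side, here is a minimal sketch using Python's standard `csv` module. It assumes the log writer can be configured for backslash escaping; the sample row is made up to match the example in the question. It writes a row whose third field ends with a double quote, then reads it back with the same settings to show that a backslash-aware parser recovers four separate fields. It does not demonstrate how Splunk itself parses the line.

```python
import csv
import io

# Hypothetical row: the third field contains and ends with a double quote.
row = ["field1", "field2", 'field3-sdasds"textdata blah blah"', "field4-#$%232"]

# Options assumed for this sketch: quote every field and escape embedded
# quotes with a backslash instead of doubling them.
opts = dict(quoting=csv.QUOTE_ALL, doublequote=False, escapechar="\\")

# Producer side: write the row with escaping applied.
buf = io.StringIO()
csv.writer(buf, **opts).writerow(row)
line = buf.getvalue()
print(line)

# Consumer side: a parser using the same escaping rules splits it correctly.
parsed = next(csv.reader(io.StringIO(line), **opts))
print(len(parsed), parsed[2])
```

The other common convention, described in RFC 4180, is to double an embedded quote (`""`) rather than backslash-escape it; which one a given consumer expects has to be checked against that consumer.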


Re: How to properly parse a CSV file with embedded double quotes on the end of a field before the file is indexed?

New Member

field3 should look like this "field3-sdasds\"textdata blah blah\"", "field4-#$%232"


Re: How to properly parse a CSV file with embedded double quotes on the end of a field before the file is indexed?

New Member

try this again

field3 should look like this "field3-sdasds\"textdata blah blah\"", "field4-#$%232"


Re: How to properly parse a CSV file with embedded double quotes on the end of a field before the file is indexed?

New Member

OK, how do I enter backslashes here so they don't get absorbed?


Re: How to properly parse a CSV file with embedded double quotes on the end of a field before the file is indexed?

Community Manager

Hi @jhuysing

To get backslashes to render properly, you have to wrap your line of text in backticks, so that lines like \backslash\backslash\ \ \ show up as expected. If you're ever sharing a .conf stanza, it's best to highlight the entire block and click the "Code Sample" button in the text editing tools above the text box, especially when showing anything with regular expressions. For example:

[stanza]
REGEX = *\<&>\*