<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: CSV comma handling with additional commas in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186615#M37375</link>
    <description>&lt;P&gt;Do you have any control over how the CSV file is generated? If so, perhaps you can choose a different delimiter that cannot occur in your data, e.g. a pipe, a semicolon or #.&lt;/P&gt;

&lt;P&gt;Then it would be very simple to extract the fields with a REPORT in props.conf and FIELDS/DELIMS in transforms.conf.&lt;/P&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[your_sourcetype]
REPORT-blah = hash_delim
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[hash_delim]
DELIMS = "#"
# one name per column, in column order; extend the list to cover every column
FIELDS = "field1", "field2", "field3"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;HR /&gt;

&lt;P&gt;Alternatively, if you can't change the format and there is only one known field in which the extra commas can occur, you could set up an EXTRACT in props.conf:&lt;/P&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[your_sourcetype]
# commas could be in field 4
EXTRACT-second_round =  ^(?&amp;lt;a1&amp;gt;[^,]*),(?&amp;lt;a2&amp;gt;[^,]*),(?&amp;lt;a3&amp;gt;[^,]*),(?&amp;lt;a4&amp;gt;.*),(?&amp;lt;a5&amp;gt;[^,]*),(?&amp;lt;a6&amp;gt;[^,]*),(?&amp;lt;a7&amp;gt;[^,]*)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;But the first option is probably better.&lt;/P&gt;

&lt;P&gt;/K&lt;/P&gt;</description>
    <pubDate>Fri, 20 Dec 2013 12:50:32 GMT</pubDate>
    <dc:creator>kristian_kolb</dc:creator>
    <dc:date>2013-12-20T12:50:32Z</dc:date>
    <item>
      <title>CSV comma handling with additional commas</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186614#M37374</link>
      <description>&lt;P&gt;Hello fellow Splunkers!&lt;/P&gt;

&lt;P&gt;I am having some problems with the format of my data and indexing it correctly in Splunk.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.dropbox.com/s/7awd9ozvg4gymq0/splunk-answers-data-snippet.xlsx"&gt;Sample data&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The data is written to CSV files with the default comma delimiter (ignore the .xlsx extension of the file linked above).&lt;/P&gt;

&lt;P&gt;Some of the fields contain free-form paragraph text, often with commas used as punctuation, so I cannot simply tell Splunk to split fields on commas. Here's an example:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;Field1, Field2, Field 3, with Paragraphs, Field 4&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;The Interactive Field Extractor (IFX) has given inconsistent results.&lt;/P&gt;

&lt;P&gt;Without writing a very complicated regex, is there a way for Splunk to pick up only the commas that the CSV writer inserted as delimiters? I'm aware that once the data is read there is no difference between punctuation commas and delimiter commas, but I'm hoping there is some neat trick to solve the problem.&lt;/P&gt;

&lt;P&gt;Any ideas?&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;

&lt;P&gt;Edit: I have no control over the format of the data, so I cannot change the delimiter in the file.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2013 11:40:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186614#M37374</guid>
      <dc:creator>himynamesdave</dc:creator>
      <dc:date>2013-12-20T11:40:37Z</dc:date>
    </item>
    <item>
      <title>Re: CSV comma handling with additional commas</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186615#M37375</link>
      <description>&lt;P&gt;Do you have any control over how the CSV file is generated? If so, perhaps you can choose a different delimiter that cannot occur in your data, e.g. a pipe, a semicolon or #.&lt;/P&gt;

&lt;P&gt;Then it would be very simple to extract the fields with a REPORT in props.conf and FIELDS/DELIMS in transforms.conf.&lt;/P&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[your_sourcetype]
REPORT-blah = hash_delim
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[hash_delim]
DELIMS = "#"
# one name per column, in column order; extend the list to cover every column
FIELDS = "field1", "field2", "field3"
&lt;/CODE&gt;&lt;/PRE&gt;
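
&lt;P&gt;To illustrate what a DELIMS/FIELDS extraction does conceptually, here is a rough Python sketch (not Splunk's actual implementation): each event is split on the single delimiter character and the pieces are paired with the field names in order. The helper name, field names and sample event are hypothetical.&lt;/P&gt;

```python
# Hypothetical illustration of a DELIMS/FIELDS-style extraction:
# split each event on a single-character delimiter and pair the
# pieces with a fixed list of field names, in order.
FIELDS = ["field1", "field2", "field3"]  # hypothetical field names
DELIM = "#"


def extract_delimited(event: str, fields=FIELDS, delim=DELIM) -> dict:
    """Pair delimiter-separated values with field names positionally."""
    values = event.rstrip("\n").split(delim)
    # zip() stops at the shorter sequence, so surplus values are dropped
    # and missing trailing values simply produce no key.
    return dict(zip(fields, values))


if __name__ == "__main__":
    sample = "alpha#Some text, with commas, in it#gamma"
    print(extract_delimited(sample))
```

&lt;P&gt;Because the free-text field no longer shares its delimiter with the punctuation, its embedded commas come through untouched.&lt;/P&gt;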

&lt;HR /&gt;

&lt;P&gt;Alternatively, if you can't change the format and there is only one known field in which the extra commas can occur, you could set up an EXTRACT in props.conf:&lt;/P&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[your_sourcetype]
# commas could be in field 4
EXTRACT-second_round =  ^(?&amp;lt;a1&amp;gt;[^,]*),(?&amp;lt;a2&amp;gt;[^,]*),(?&amp;lt;a3&amp;gt;[^,]*),(?&amp;lt;a4&amp;gt;.*),(?&amp;lt;a5&amp;gt;[^,]*),(?&amp;lt;a6&amp;gt;[^,]*),(?&amp;lt;a7&amp;gt;[^,]*)
&lt;/CODE&gt;&lt;/PRE&gt;
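
&lt;P&gt;A minimal Python sketch of why this pattern works, using numbered groups instead of the named groups above and a made-up seven-field line: the greedy .* for field 4 first consumes the rest of the line, then backtracks until exactly three comma-separated values remain for fields 5 to 7, so every extra comma lands in field 4. The helper name split_seven is hypothetical.&lt;/P&gt;

```python
import re

# Same shape as the EXTRACT regex above, but with numbered groups
# (Python's re module uses a different named-group syntax than PCRE).
# Fields 1-3 and 5-7 are [^,]* and cannot contain commas; field 4 is a
# greedy (.*), so after backtracking exactly three values remain for
# fields 5-7 and every extra comma ends up in field 4.
PATTERN = re.compile(
    r"^([^,]*),([^,]*),([^,]*),(.*),([^,]*),([^,]*),([^,]*)"
)


def split_seven(event: str):
    """Return the seven values as a tuple, or None if there are too few commas."""
    m = PATTERN.match(event)
    return m.groups() if m else None


if __name__ == "__main__":
    # Hypothetical event: field 4 is free text containing two commas.
    line = "id1,2013-12-20,alice,Hello, world, and more,low,open,x"
    print(split_seven(line))
    # ('id1', '2013-12-20', 'alice', 'Hello, world, and more', 'low', 'open', 'x')
```

&lt;P&gt;In the sketch, a line with fewer than six commas does not match at all and returns None.&lt;/P&gt;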

&lt;P&gt;But the first option is probably better.&lt;/P&gt;

&lt;P&gt;/K&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2013 12:50:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186615#M37375</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2013-12-20T12:50:32Z</dc:date>
    </item>
    <item>
      <title>Re: CSV comma handling with additional commas</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186616#M37376</link>
      <description>&lt;P&gt;If you have control over how the CSV files are created, change them to put quotation marks around the fields with embedded commas.  It may be easier to quote all fields.  Once you do that, modify your Splunk transforms to strip the commas during indexing.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2013 12:55:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186616#M37376</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2013-12-20T12:55:58Z</dc:date>
    </item>
    <item>
      <title>Re: CSV comma handling with additional commas</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186617#M37377</link>
      <description>&lt;P&gt;Edit: I have no control over the format of the data, so I cannot change the delimiter in the file.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2013 14:03:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186617#M37377</guid>
      <dc:creator>himynamesdave</dc:creator>
      <dc:date>2013-12-20T14:03:02Z</dc:date>
    </item>
    <item>
      <title>Re: CSV comma handling with additional commas</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186619#M37379</link>
      <description>&lt;P&gt;So, try the second option. You can also do the same extraction at search time with &lt;CODE&gt;rex&lt;/CODE&gt;:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;...| rex "^(?&amp;lt;a1&amp;gt;[^,]*),(?&amp;lt;a2&amp;gt;[^,]*),(?&amp;lt;a3&amp;gt;[^,]*),(?&amp;lt;a4&amp;gt;.*),(?&amp;lt;a5&amp;gt;[^,]*),(?&amp;lt;a6&amp;gt;[^,]*),(?&amp;lt;a7&amp;gt;[^,]*)"&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;/k&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2013 14:19:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186619#M37379</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2013-12-20T14:19:35Z</dc:date>
    </item>
    <item>
      <title>Re: CSV comma handling with additional commas</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186620#M37380</link>
      <description>&lt;P&gt;A regex may be your only option. Try something like this:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;(?&amp;lt;Field1&amp;gt;[^,]*?),\s(?&amp;lt;Field2&amp;gt;[^,]*?),\s(?&amp;lt;Field3&amp;gt;.*),\s(?&amp;lt;Field4&amp;gt;[^,]*)&lt;/CODE&gt;&lt;/P&gt;
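
&lt;P&gt;Here is a quick Python check of that regex against a sample line shaped like the one in the question. It uses numbered groups because Python's re module spells named groups differently, and the helper name split_four is hypothetical.&lt;/P&gt;

```python
import re

# Numbered-group version of the regex above.
# Groups 1 and 2 are lazy [^,]*? and group 4 is [^,]*, so none of them can
# contain a comma; the greedy (.*) for Field3 takes everything in between.
PATTERN = re.compile(r"([^,]*?),\s([^,]*?),\s(.*),\s([^,]*)")


def split_four(event: str):
    """Return (Field1, Field2, Field3, Field4), or None if there is no match."""
    m = PATTERN.search(event)
    return m.groups() if m else None


if __name__ == "__main__":
    print(split_four("Field1, Field2, Field 3, with Paragraphs, Field 4"))
    # ('Field1', 'Field2', 'Field 3, with Paragraphs', 'Field 4')
```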

&lt;P&gt;That allows commas only in Field3. It also assumes each delimiting comma is followed by whitespace, because of the \s after each comma.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2013 14:23:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/CSV-comma-handling-with-additional-commas/m-p/186620#M37380</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2013-12-20T14:23:50Z</dc:date>
    </item>
  </channel>
</rss>

