<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do I edit my props.conf for proper line breaking when indexing a CSV file with a large amount of quotes and newlines? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-do-I-edit-my-props-conf-for-proper-line-breaking-when/m-p/139261#M28646</link>
    <description>&lt;P&gt;I have a CSV file that's giving me a headache while I try to index it.&lt;BR /&gt;
It has 100+ columns, several of which make life difficult by containing large numbers of quotes and newlines.  &lt;/P&gt;

&lt;H2&gt;A sanitised example showing the header line and a problem event:&lt;/H2&gt;

&lt;PRE&gt;&lt;CODE&gt;field1,field2,field3,field4,field5,field6
"55634","Barney","","this field behaves well","","1436504081000"
"","Fred","","Here, have some data

that will make your life very difficult

""should"" you try to parse this puppy","F6E25B","1435307738000"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;HR /&gt;

&lt;P&gt;(The doubled quotes around "should" are intentional; there are sections of the data that look exactly like that.)&lt;/P&gt;

&lt;P&gt;I've tried using the following props, to no avail - Barney does the right thing, but Fred's line breaking goes wrong.  Can someone point out where I'm going wrong?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;BREAK_ONLY_AFTER=\"$                                        
HEADER_FIELD_LINE_NUMBER=1
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
TIMESTAMP_FIELDS=field6
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This file is created from scratch at a regular interval (it's a scheduled database dump), and I want to index the whole file each time.&lt;BR /&gt;
I want to keep the inputs.conf as simple as possible, only defining host, sourcetype and destination index.  A parsing app on the indexer will have the props.conf.&lt;/P&gt;

&lt;P&gt;Thanks in advance for any help.&lt;/P&gt;</description>
    <pubDate>Mon, 27 Jul 2015 22:18:00 GMT</pubDate>
    <dc:creator>wardallen</dc:creator>
    <dc:date>2015-07-27T22:18:00Z</dc:date>
    <item>
      <title>How do I edit my props.conf for proper line breaking when indexing a CSV file with a large amount of quotes and newlines?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-I-edit-my-props-conf-for-proper-line-breaking-when/m-p/139261#M28646</link>
      <description>&lt;P&gt;I have a CSV file that's giving me a headache while I try to index it.&lt;BR /&gt;
It has 100+ columns, several of which make life difficult by containing large numbers of quotes and newlines.  &lt;/P&gt;

&lt;H2&gt;A sanitised example showing the header line and a problem event:&lt;/H2&gt;

&lt;PRE&gt;&lt;CODE&gt;field1,field2,field3,field4,field5,field6
"55634","Barney","","this field behaves well","","1436504081000"
"","Fred","","Here, have some data

that will make your life very difficult

""should"" you try to parse this puppy","F6E25B","1435307738000"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;HR /&gt;

&lt;P&gt;(The doubled quotes around "should" are intentional; there are sections of the data that look exactly like that.)&lt;/P&gt;

&lt;P&gt;I've tried using the following props, to no avail - Barney does the right thing, but Fred's line breaking goes wrong.  Can someone point out where I'm going wrong?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;BREAK_ONLY_AFTER=\"$                                        
HEADER_FIELD_LINE_NUMBER=1
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
TIMESTAMP_FIELDS=field6
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This file is created from scratch at a regular interval (it's a scheduled database dump), and I want to index the whole file each time.&lt;BR /&gt;
I want to keep the inputs.conf as simple as possible, only defining host, sourcetype and destination index.  A parsing app on the indexer will have the props.conf.&lt;/P&gt;

&lt;P&gt;Thanks in advance for any help.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jul 2015 22:18:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-I-edit-my-props-conf-for-proper-line-breaking-when/m-p/139261#M28646</guid>
      <dc:creator>wardallen</dc:creator>
      <dc:date>2015-07-27T22:18:00Z</dc:date>
    </item>
    <item>
      <title>Re: How do I edit my props.conf for proper line breaking when indexing a CSV file with a large amount of quotes and newlines?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-I-edit-my-props-conf-for-proper-line-breaking-when/m-p/139262#M28647</link>
      <description>&lt;P&gt;OK, to me the only thing that seems consistent across your two examples is that every event ends with a time (epoch) value.  What I would do is use &lt;STRONG&gt;MUST_BREAK_AFTER&lt;/STRONG&gt; instead of BREAK_ONLY_AFTER.&lt;/P&gt;
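
&lt;P&gt;As a rough illustration of that idea, here is a minimal Python sketch (not Splunk code) that groups the physical lines of the sample rows into events, closing an event after any line that matches the pattern &lt;CODE&gt;,"\d{13}"&lt;/CODE&gt;. The &lt;CODE&gt;merge_lines&lt;/CODE&gt; helper and the &lt;CODE&gt;SAMPLE&lt;/CODE&gt; constant are hypothetical names made up for this example, and the sketch ignores every other line-merging rule Splunk applies, so treat it only as a way to sanity-check the regex.&lt;/P&gt;

```python
import re

# Data rows copied from the question's sanitised example (header line omitted).
SAMPLE = (
    '"55634","Barney","","this field behaves well","","1436504081000"\n'
    '"","Fred","","Here, have some data\n'
    '\n'
    'that will make your life very difficult\n'
    '\n'
    '""should"" you try to parse this puppy","F6E25B","1435307738000"\n'
)

# The regex under test: a comma followed by a quoted 13-digit epoch value.
BREAK_AFTER = re.compile(r',"\d{13}"')


def merge_lines(text):
    """Group physical lines into events, ending an event after any
    line that matches BREAK_AFTER (hypothetical helper, not Splunk's logic)."""
    events, current = [], []
    for line in text.splitlines():
        current.append(line)
        if BREAK_AFTER.search(line):
            events.append("\n".join(current))
            current = []
    if current:
        # Trailing lines that never matched still form a final event.
        events.append("\n".join(current))
    return events


events = merge_lines(SAMPLE)
print(len(events))  # prints 2
```

&lt;P&gt;On the sample rows this yields two events, with Fred's multi-line row kept in one piece. The pattern only fires when the last column holds 13 digits, so in this sketch a row whose last column is empty would not close its event and would run into the following row.&lt;/P&gt;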

&lt;PRE&gt;&lt;CODE&gt;[yourSourcetype]
MUST_BREAK_AFTER=,"\d{13}"
HEADER_FIELD_LINE_NUMBER=1
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
TIMESTAMP_FIELDS=field6
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 29 Sep 2020 06:49:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-I-edit-my-props-conf-for-proper-line-breaking-when/m-p/139262#M28647</guid>
      <dc:creator>bmacias84</dc:creator>
      <dc:date>2020-09-29T06:49:04Z</dc:date>
    </item>
    <item>
      <title>Re: How do I edit my props.conf for proper line breaking when indexing a CSV file with a large amount of quotes and newlines?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-I-edit-my-props-conf-for-proper-line-breaking-when/m-p/139263#M28648</link>
      <description>&lt;P&gt;Sorry, I think I've given you the wrong idea with my fictional data.  The actual data's last column may or may not have a value in it.  I'll edit my example data when I can.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jul 2015 22:40:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-I-edit-my-props-conf-for-proper-line-breaking-when/m-p/139263#M28648</guid>
      <dc:creator>wardallen</dc:creator>
      <dc:date>2015-07-27T22:40:39Z</dc:date>
    </item>
  </channel>
</rss>

