<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why is some data being indexed as separate entries while monitoring csv log files? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-some-data-being-indexed-as-separate-entries-while/m-p/170489#M34469</link>
    <description>&lt;P&gt;I don't think this is a bug. I would avoid locking the file in general, as it may have unexpected performance impacts.&lt;/P&gt;

&lt;P&gt;What are the settings in the &lt;CODE&gt;inputs.conf&lt;/CODE&gt; stanza that is monitoring the output folder? I would suggest this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://yourdirectorypathhere]
index = theindexname
sourcetype = csv
ignoreOlderThan = 30d
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Or, you might find a pretrained sourcetype that better fits your data here:  &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.2.1/Data/Listofpretrainedsourcetypes"&gt;List of pretrained sourcetypes&lt;/A&gt;&lt;BR /&gt;
If you are not cleaning out the older files (which you should), the &lt;CODE&gt;ignoreOlderThan&lt;/CODE&gt; setting will help Splunk's performance if the directory fills up with older files that have already been indexed and will never be updated.&lt;/P&gt;

&lt;P&gt;If you don't want to use the &lt;CODE&gt;csv&lt;/CODE&gt; sourcetype, you may need to place a &lt;CODE&gt;props.conf&lt;/CODE&gt; file on your indexer that explicitly sets the parsing rules for the sourcetype that you choose.&lt;/P&gt;</description>
    <pubDate>Mon, 05 Jan 2015 00:11:14 GMT</pubDate>
    <dc:creator>lguinn2</dc:creator>
    <dc:date>2015-01-05T00:11:14Z</dc:date>
    <item>
      <title>Why is some data being indexed as separate entries while monitoring csv log files?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-some-data-being-indexed-as-separate-entries-while/m-p/170488#M34468</link>
      <description>&lt;P&gt;I have a console app that reads data from table storage and writes it out to a CSV file. I monitor each of the output folders, and I checked that the data is being uploaded, which it is. However, when I look at all the records in the index, some of the data is disconnected. I know this because the records in the CSV file don't match the records in Splunk.&lt;/P&gt;

&lt;P&gt;An entry contains 18 fields, and some of the entries are being split in the middle of the entry. &lt;BR /&gt;
Example Entry&lt;BR /&gt;
12/15/2015, Name, ID, Number, Guid ....&lt;/P&gt;

&lt;P&gt;Splunk logs it as 2 separate entries:&lt;/P&gt;

&lt;P&gt;12/15/2015,Name,ID,&lt;BR /&gt;
Number,Guid ...&lt;/P&gt;

&lt;P&gt;The console app runs once a day at midnight, and not all entries are malformed, just a few of them. The strange thing is that the data is all still there; it's just that some of it is split up. Does anyone know of a workaround, or is this a bug? &lt;BR /&gt;
I was thinking of locking the file until it's finished writing, but I'm not sure how Splunk would react to file share locking. &lt;/P&gt;</description>
      <pubDate>Mon, 29 Dec 2014 22:20:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-some-data-being-indexed-as-separate-entries-while/m-p/170488#M34468</guid>
      <dc:creator>chatterjb</dc:creator>
      <dc:date>2014-12-29T22:20:14Z</dc:date>
    </item>
    <item>
      <title>Re: Why is some data being indexed as separate entries while monitoring csv log files?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-some-data-being-indexed-as-separate-entries-while/m-p/170489#M34469</link>
      <description>&lt;P&gt;I don't think this is a bug. I would avoid locking the file in general, as it may have unexpected performance impacts.&lt;/P&gt;

&lt;P&gt;What are the settings in the &lt;CODE&gt;inputs.conf&lt;/CODE&gt; stanza that is monitoring the output folder? I would suggest this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://yourdirectorypathhere]
index = theindexname
sourcetype = csv
ignoreOlderThan = 30d
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Or, you might find a pretrained sourcetype that better fits your data here:  &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.2.1/Data/Listofpretrainedsourcetypes"&gt;List of pretrained sourcetypes&lt;/A&gt;&lt;BR /&gt;
If you are not cleaning out the older files (which you should), the &lt;CODE&gt;ignoreOlderThan&lt;/CODE&gt; setting will help Splunk's performance if the directory fills up with older files that have already been indexed and will never be updated.&lt;/P&gt;
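
&lt;P&gt;If the split events come from Splunk reading the file while the console app is still writing it (an assumption; nothing in this thread confirms the cause), one thing to try is extending the stanza above with &lt;CODE&gt;time_before_close&lt;/CODE&gt;, so the monitor waits longer for further writes before it closes the file at end-of-file. This is a minimal sketch, not a confirmed fix; the 30-second value is only an illustrative guess and should be tuned to how long the app pauses between writes:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://yourdirectorypathhere]
index = theindexname
sourcetype = csv
ignoreOlderThan = 30d
# Assumed value: seconds without modification before the monitor closes the file (default is 3)
time_before_close = 30
&lt;/CODE&gt;&lt;/PRE&gt;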

&lt;P&gt;If you don't want to use the &lt;CODE&gt;csv&lt;/CODE&gt; sourcetype, you may need to place a &lt;CODE&gt;props.conf&lt;/CODE&gt; file on your indexer that explicitly sets the parsing rules for the sourcetype that you choose.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jan 2015 00:11:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-some-data-being-indexed-as-separate-entries-while/m-p/170489#M34469</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2015-01-05T00:11:14Z</dc:date>
    </item>
  </channel>
</rss>

