<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why Splunk can't index very large csv files in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Why-Splunk-can-t-index-very-large-csv-files/m-p/341177#M62887</link>
    <description>&lt;P&gt;Hmmm.  Those May and June numbers are bizarrely out of whack with the rest. May got near 100% indexed, and June about 43%.  That's probably NOT a clue, but I'd keep it in mind while looking at everything else.&lt;/P&gt;

&lt;P&gt;I'd repeat both loads (the full file and the May-July subset), putting the results into two different temporary indexes.  If the resulting counts for the full file are not identical to the first results, then I'd look at memory usage and so on.&lt;/P&gt;

&lt;P&gt;Next, I'd &lt;CODE&gt;diff&lt;/CODE&gt; the full results against the partial load results to see which records were dropped.  &lt;/P&gt;

&lt;P&gt;Finally, I might set up two different sourcetypes, and set one to send any records before April 1 to the null queue, and the other to send any after March 31 to the null queue, and see whether they successfully loaded all the appropriate records. &lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;The &lt;CODE&gt;TRUNCATE&lt;/CODE&gt; setting in props.conf applies to each line, not to the file as a whole, so that's not relevant.&lt;/P&gt;

&lt;P&gt;See this answer for notes on the TRUNCATE setting:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://answers.splunk.com/answers/80146/splunk-search-of-indexed-csv-file-does-not-pull-out-all-the-fields.html"&gt;https://answers.splunk.com/answers/80146/splunk-search-of-indexed-csv-file-does-not-pull-out-all-the-fields.html&lt;/A&gt;&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;&lt;CODE&gt;max_mem_usage_mb&lt;/CODE&gt; in limits.conf affects searches, apparently not indexing, so that's probably not it.&lt;/P&gt;</description>
    <pubDate>Wed, 02 Aug 2017 14:29:54 GMT</pubDate>
    <dc:creator>DalJeanis</dc:creator>
    <dc:date>2017-08-02T14:29:54Z</dc:date>
    <item>
      <title>Why Splunk can't index very large csv files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-Splunk-can-t-index-very-large-csv-files/m-p/341176#M62886</link>
      <description>&lt;P&gt;I am using a CSV file to load data into my local Splunk Enterprise instance.&lt;BR /&gt;
The file is very large, around 100 MB.&lt;/P&gt;

&lt;P&gt;The data in my csv file contains the following count of events:&lt;BR /&gt;
January:  36,055&lt;BR /&gt;
February: 37,613&lt;BR /&gt;
March: 41,521&lt;BR /&gt;
April: 33,697&lt;BR /&gt;
May : 39,980&lt;BR /&gt;
June: 36,994&lt;BR /&gt;
July: 31,963&lt;/P&gt;

&lt;P&gt;After loading the data into Splunk, the data in Splunk contains the following count of events:&lt;BR /&gt;
January:  29,416&lt;BR /&gt;
February: 32,042&lt;BR /&gt;
March: 37,516&lt;BR /&gt;
April: 33,458&lt;BR /&gt;
May : 39,975&lt;BR /&gt;
June: 15,935&lt;BR /&gt;
July: 22,766&lt;/P&gt;

&lt;P&gt;Note: My index usage is only 243MB/488.28GB&lt;/P&gt;

&lt;P&gt;I tried cutting my CSV file down to only the May, June, and July data and uploaded it to Splunk.&lt;BR /&gt;
csv count:&lt;BR /&gt;
May : 39,980&lt;BR /&gt;
June: 36,994&lt;BR /&gt;
July: 31,963&lt;/P&gt;

&lt;P&gt;Splunk count:&lt;BR /&gt;
May : 39,980&lt;BR /&gt;
June: 36,994&lt;BR /&gt;
July: 31,963&lt;/P&gt;

&lt;P&gt;So this means I have no problem with the formatting of the timestamp in my csv file.&lt;/P&gt;

&lt;P&gt;Could you help me find the configuration that causes this truncation,&lt;BR /&gt;
or at least help me work out how to investigate it?&lt;BR /&gt;
I would appreciate any response on this.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Aug 2017 13:20:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-Splunk-can-t-index-very-large-csv-files/m-p/341176#M62886</guid>
      <dc:creator>bonnlbbelandres</dc:creator>
      <dc:date>2017-08-02T13:20:52Z</dc:date>
    </item>
    <item>
      <title>Re: Why Splunk can't index very large csv files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-Splunk-can-t-index-very-large-csv-files/m-p/341177#M62887</link>
      <description>&lt;P&gt;Hmmm.  Those May and June numbers are bizarrely out of whack with the rest. May got near 100% indexed, and June about 43%.  That's probably NOT a clue, but I'd keep it in mind while looking at everything else.&lt;/P&gt;
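
&lt;P&gt;For reference, here is a minimal Python sketch of where those percentages come from. The counts are copied from the question; the names &lt;CODE&gt;csv_counts&lt;/CODE&gt;, &lt;CODE&gt;splunk_counts&lt;/CODE&gt; and &lt;CODE&gt;indexed_percent&lt;/CODE&gt; are just illustrative, not anything Splunk provides.&lt;/P&gt;

```python
# Event counts per month as posted in the question.
csv_counts = {
    "January": 36055, "February": 37613, "March": 41521, "April": 33697,
    "May": 39980, "June": 36994, "July": 31963,
}
splunk_counts = {
    "January": 29416, "February": 32042, "March": 37516, "April": 33458,
    "May": 39975, "June": 15935, "July": 22766,
}


def indexed_percent(indexed, expected):
    # Share of the source events that ended up in the index, in percent.
    return 100.0 * indexed / expected


percents = {
    month: indexed_percent(splunk_counts[month], csv_counts[month])
    for month in csv_counts
}

# Total number of events that are in the CSV but not in the index.
total_missing = sum(csv_counts.values()) - sum(splunk_counts.values())

for month, pct in percents.items():
    print(f"{month}: {pct:.2f}% indexed")
print(f"Total missing events: {total_missing}")
```

&lt;P&gt;Looking at the per-month ratios side by side makes it easier to see whether the loss is spread evenly or concentrated in particular months.&lt;/P&gt;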

&lt;P&gt;I'd repeat both loads (the full file and the May-July subset), putting the results into two different temporary indexes.  If the resulting counts for the full file are not identical to the first results, then I'd look at memory usage and so on.&lt;/P&gt;

&lt;P&gt;Next, I'd &lt;CODE&gt;diff&lt;/CODE&gt; the full results against the partial load results to see which records were dropped.  &lt;/P&gt;
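
&lt;P&gt;If a command-line &lt;CODE&gt;diff&lt;/CODE&gt; is awkward, the same comparison can be sketched in Python. This assumes you have already exported the raw events from both temporary indexes as lists of lines; the function name &lt;CODE&gt;dropped_records&lt;/CODE&gt; and the sample rows are hypothetical, made up only to show the idea.&lt;/P&gt;

```python
from collections import Counter


def dropped_records(partial_load, full_load):
    # Multiset difference: every line that appears more often in the
    # partial (complete) load than in the full-file load.
    # Trailing whitespace and line endings are stripped before comparing.
    partial = Counter(line.rstrip() for line in partial_load)
    full = Counter(line.rstrip() for line in full_load)
    missing = partial - full
    return sorted(missing.elements())


# Hypothetical sample rows standing in for exported raw events.
partial_load = [
    "2017-06-01 00:00:01,alpha\n",
    "2017-06-01 00:00:02,beta\n",
    "2017-06-01 00:00:02,beta\n",
]
full_load = [
    "2017-06-01 00:00:02,beta\n",
]

for record in dropped_records(partial_load, full_load):
    print(record)
```

&lt;P&gt;A &lt;CODE&gt;Counter&lt;/CODE&gt; is used rather than a plain set so that duplicate rows in the CSV are still counted as dropped when only one copy was indexed.&lt;/P&gt;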

&lt;P&gt;Finally, I might set up two different sourcetypes, and set one to send any records before April 1 to the null queue, and the other to send any after March 31 to the null queue, and see whether they successfully loaded all the appropriate records. &lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;The &lt;CODE&gt;TRUNCATE&lt;/CODE&gt; setting in props.conf applies to each line, not to the file as a whole, so that's not relevant.&lt;/P&gt;

&lt;P&gt;See this answer for notes on the TRUNCATE setting:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://answers.splunk.com/answers/80146/splunk-search-of-indexed-csv-file-does-not-pull-out-all-the-fields.html"&gt;https://answers.splunk.com/answers/80146/splunk-search-of-indexed-csv-file-does-not-pull-out-all-the-fields.html&lt;/A&gt;&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;&lt;CODE&gt;max_mem_usage_mb&lt;/CODE&gt; in limits.conf affects searches, apparently not indexing, so that's probably not it.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Aug 2017 14:29:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-Splunk-can-t-index-very-large-csv-files/m-p/341177#M62887</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-08-02T14:29:54Z</dc:date>
    </item>
    <item>
      <title>Re: Why Splunk can't index very large csv files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-Splunk-can-t-index-very-large-csv-files/m-p/341178#M62888</link>
      <description>&lt;P&gt;My suspicion is that you have a malformed CSV (missing/extra commas, merged lines, etc.).  How are you sending this CSV to Splunk?  Why are you not using it as a &lt;CODE&gt;lookup&lt;/CODE&gt; instead (how often does it change)?&lt;/P&gt;</description>
      <pubDate>Wed, 02 Aug 2017 14:35:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-Splunk-can-t-index-very-large-csv-files/m-p/341178#M62888</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-08-02T14:35:06Z</dc:date>
    </item>
  </channel>
</rss>

