I am using a CSV file to load data into my local Splunk Enterprise instance.
The CSV file is very big, around 100 MB.
The data in my csv file contains the following count of events:
January: 36,055
February: 37,613
March: 41,521
April: 33,697
May : 39,980
June: 36,994
July: 31,963
After loading the data into Splunk, the data in Splunk contains the following count of events:
January: 29,416
February: 32,042
March: 37,516
April: 33,458
May : 39,975
June: 15,935
July: 22,766
Note: My index usage is only 243MB/488.28GB
I tried trimming my CSV file down to only the May, June, and July data and uploaded that to Splunk.
csv count:
May : 39,980
June: 36,994
July: 31,963
Splunk count:
May : 39,980
June: 36,994
July: 31,963
So this means I have no problem with the formatting of the timestamp in my csv file.
Could you help me find the configuration that causes this truncation?
Or at least help me figure out how to investigate it?
I will appreciate any response regarding the matter.
My suspicion is that you have a malformed CSV (missing/extra commas, merged lines, etc.). How are you sending this CSV to Splunk? Why are you not using it as a lookup instead (how often does it change)?
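One quick way to test the malformed-CSV theory outside of Splunk is to count the fields in every row and flag the ones that don't match the header. This is only a sketch: it assumes a comma-delimited file with a single header row, and `field_count_report` is a name made up for this example. The sample rows are invented, not taken from the real file.

```python
import csv
from collections import Counter

def field_count_report(lines, delimiter=","):
    """Return (expected_width, {width: row_count}, [bad_line_numbers]).

    `lines` is any iterable of text lines, e.g. an open file object.
    A row is "bad" when its field count differs from the header's.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    widths = Counter()
    bad_lines = []
    for row in reader:
        widths[len(row)] += 1
        if len(row) != expected:
            # line_num is the physical line on which this record ended
            bad_lines.append(reader.line_num)
    return expected, dict(widths), bad_lines

# Invented sample: one short row, one long row, one correctly quoted row
sample = ["a,b,c", "1,2,3", "4,5", "6,7,8,9", '"x,y",2,3']
expected, widths, bad_lines = field_count_report(sample)
print(expected, widths, bad_lines)
```

To run it against the real file, pass an open file instead of the list, e.g. `field_count_report(open("data.csv", newline=""))` (the file name is a placeholder). If the bad line numbers cluster in January through March or in June and July, that would point at the data rather than at a Splunk limit.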
Hmmm. Those May and June numbers are bizarrely out of whack with the rest. May got near 100% indexed, and June about 43%. That's probably NOT a clue, but I'd keep it in mind while looking at everything else.
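To put numbers on that, here is a small sketch that computes the missing events and the percentage indexed per month, using only the counts posted in the question. The function name `retention` is made up for the example.

```python
# Counts copied from the question (full-file load)
csv_counts = {
    "January": 36055, "February": 37613, "March": 41521, "April": 33697,
    "May": 39980, "June": 36994, "July": 31963,
}
splunk_counts = {
    "January": 29416, "February": 32042, "March": 37516, "April": 33458,
    "May": 39975, "June": 15935, "July": 22766,
}

def retention(source, indexed):
    """Map month -> (events missing, percent of source events indexed)."""
    return {
        month: (source[month] - indexed[month],
                100.0 * indexed[month] / source[month])
        for month in source
    }

report = retention(csv_counts, splunk_counts)
for month, (missing, pct) in report.items():
    print(f"{month:<9} {missing:>6} missing  {pct:6.2f}% indexed")
```

The loss is spread unevenly across the file rather than growing toward the end, which is worth keeping in mind when weighing a size or memory limit against a data problem.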
I'd run both loads again (the full file and the May-July file), putting the results into two different temporary indexes. If the load numbers for the full file are not identical to the first results, then I'd look at memory usage and so on.
Next, I'd diff the full results against the partial load results to see which records were dropped.
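A diff like that could be sketched as follows, assuming both result sets have been exported to CSV with the same column names and that some combination of columns identifies a record. The column names `time` and `id` and the function name `missing_records` are placeholders for this example, not fields from the real data.

```python
import csv
from collections import Counter

def missing_records(complete_lines, partial_lines, key_fields):
    """Return a Counter of record keys present in `complete_lines`
    more often than in `partial_lines`.

    Both inputs are iterables of CSV text lines with a header row.
    Counter subtraction keeps only positive counts, so duplicated
    records that were only partly dropped are still reported.
    """
    def keys(lines):
        return Counter(
            tuple(row[field] for field in key_fields)
            for row in csv.DictReader(lines)
        )
    return keys(complete_lines) - keys(partial_lines)

# Invented sample data: ("t2", "2") appears twice in the complete set
complete = ["time,id", "t1,1", "t2,2", "t2,2", "t3,3"]
partial = ["time,id", "t1,1", "t2,2"]
dropped = missing_records(complete, partial, key_fields=["time", "id"])
for key, count in dropped.items():
    print(key, "missing", count, "time(s)")
```

The same function works for comparing the original CSV file against an export of what Splunk indexed; looking at a handful of the dropped records side by side usually shows quickly whether they share a timestamp pattern or a formatting quirk.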
Finally, I might set up two different sourcetypes, and set one to send any records before April 1 to the null queue, and the other to send any after March 31 to the null queue, and see whether they successfully loaded all the appropriate records.
The TRUNCATE setting in props.conf applies to each line (it caps the length of an individual line, not the number of events), so that's not relevant.
Check this one here for the notes on the TRUNCATE setting.
max_mem_usage_mb in limits.conf affects searches, apparently not indexing, so that's probably not it.