Why can't Splunk index very large CSV files?

bonnlbbelandres
Path Finder

I am using a CSV file to load data into my local Splunk Enterprise instance.
The file is very large, around 100 MB.

My CSV file contains the following number of events per month:
January: 36,055
February: 37,613
March: 41,521
April: 33,697
May: 39,980
June: 36,994
July: 31,963

After loading the file, Splunk shows the following number of events per month:
January: 29,416
February: 32,042
March: 37,516
April: 33,458
May: 39,975
June: 15,935
July: 22,766

Note: My index usage is only 243 MB of 488.28 GB.

I tried cutting my CSV file down to only the May, June, and July data and uploaded that to Splunk.
CSV count:
May: 39,980
June: 36,994
July: 31,963

Splunk count:
May: 39,980
June: 36,994
July: 31,963

So this means there is no problem with the timestamp formatting in my CSV file.

Could you help me find the configuration that causes this truncation,
or at least help me work out how to investigate it?
I would appreciate any response on this.

woodcock
Esteemed Legend

My suspicion is that you have a malformed CSV (missing/extra commas, merged lines, etc.). How are you sending this CSV to Splunk? Why are you not using it as a lookup instead (how often does it change)?
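One way to test the malformed-CSV theory before touching any Splunk configuration is to scan the file for rows whose field count differs from the header. The following is only a minimal sketch using Python's standard csv module. The function name find_malformed_rows is made up for illustration, and it assumes the first line of the file is the header row.

```python
import csv
from typing import Iterable, List, Tuple


def find_malformed_rows(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for rows that do not match the header.

    Assumption: the first row is the header and defines the expected
    number of fields. Blank lines are reported with a field count of 0.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return []  # empty input, nothing to check
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the physical line where this row ended
            bad.append((reader.line_num, len(row)))
    return bad


# Small made-up sample: line 3 is missing a field, line 4 has an extra one.
sample = [
    "timestamp,host,message\n",
    "2017-01-01 00:00:00,web01,ok\n",
    "2017-01-02 00:00:00,web02\n",
    "2017-01-03 00:00:00,web03,ok,extra\n",
]
problems = find_malformed_rows(sample)
print(problems)
```

To run it against a real file, open the file with open(path, newline="") and pass the file object in place of sample. If the list comes back empty, the field counts are at least consistent, although that alone does not rule out merged lines that happen to have the right number of commas.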

DalJeanis
Legend

Hmmm. Those May and June numbers are bizarrely out of whack with the rest. May got near 100% indexed, and June about 43%. That's probably NOT a clue, but I'd keep it in mind while looking at everything else.
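For reference, the per-month loss can be worked out directly from the two sets of counts in the question. This is just the arithmetic as a short Python sketch. The numbers are copied from the original post, and the helper name load_summary is only illustrative.

```python
# Event counts per month, copied from the question.
csv_counts = {
    "January": 36055, "February": 37613, "March": 41521, "April": 33697,
    "May": 39980, "June": 36994, "July": 31963,
}
splunk_counts = {
    "January": 29416, "February": 32042, "March": 37516, "April": 33458,
    "May": 39975, "June": 15935, "July": 22766,
}


def load_summary(expected, indexed):
    """Map each month to (missing_events, fraction_indexed)."""
    summary = {}
    for month, total in expected.items():
        got = indexed[month]
        summary[month] = (total - got, got / total)
    return summary


summary = load_summary(csv_counts, splunk_counts)
for month, (missing, ratio) in summary.items():
    print(f"{month:<9}{missing:>7} missing  {ratio:6.1%} indexed")
```

The losses are not uniform: May is missing only 5 events while June is missing 21,059, which is why a simple size limit seems unlikely on its own.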

I'd do the same thing again, putting the results into two different temporary indexes. If the resultant load numbers for the full file are not identical to the first results, then I'd look at memory usage and so on.

Next, I'd diff the full results against the partial load results to see which records were dropped.
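A rough sketch of that diff step in Python, assuming you have exported the raw events from each load to a text file, one event per line (for example from a search over each temporary index). The function name dropped_records and the export step are assumptions, not anything Splunk provides under that name.

```python
from collections import Counter
from typing import Iterable, List


def dropped_records(reference: Iterable[str], candidate: Iterable[str]) -> List[str]:
    """Return records present in `reference` but absent from `candidate`.

    Duplicates are respected: if a record appears twice in the reference
    and once in the candidate, one copy is reported as dropped.
    """
    remaining = Counter(line.rstrip("\r\n") for line in candidate)
    dropped = []
    for line in reference:
        key = line.rstrip("\r\n")
        if remaining[key] > 0:
            remaining[key] -= 1  # matched one copy in the candidate
        else:
            dropped.append(key)
    return dropped


# Made-up example: the partial load has four events, the full load kept two.
partial_load = ["event a\n", "event b\n", "event b\n", "event c\n"]
full_load = ["event b\n", "event a\n"]
lost = dropped_records(partial_load, full_load)
print(lost)
```

Looking at what the dropped records have in common (position in the file, timestamp range, line length) should narrow down where the loss happens.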

Finally, I might set up two different sourcetypes, and set one to send any records before April 1 to the null queue, and the other to send any after March 31 to the null queue, and see whether they successfully loaded all the appropriate records.


The TRUNCATE setting in props.conf applies to each line, not to the file as a whole, so it is not relevant here.

See this thread for notes on the TRUNCATE setting:

https://answers.splunk.com/answers/80146/splunk-search-of-indexed-csv-file-does-not-pull-out-all-the...


max_mem_usage_mb in limits.conf affects searches and apparently not indexing, so that's probably not it either.
