Problem with CSV file monitoring importing corrupted data: end-of-line not respected

gedworksplunk
Engager

Hi,

I'm using Splunk 6.5.1, either with monitoring, indexing, and search all on a single machine,
or with a dedicated forwarder feeding the indexer/search head machine.

I've set up monitoring of a directory where a binary updates a CSV file all day long:
2017.07.06.jobs
Each line of that CSV file has 31 fields, like:

FIELDS: ID,PROJECT,USER,OSGROUP,DIR,ENV,TOOL,JOBNAME,PRIORITY,RESOURCES,SUBMITHOST,EXECHOST,SUBMITTIME,STARTTIME,ENDTIME

For the sourcetype, I'm using the built-in "csv" sourcetype, extended with TIMESTAMP_FIELDS = SUBMITTIME.
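
For reference, a minimal props.conf sketch of such a setup. The original post does not show the actual configuration, so this assumes a custom sourcetype (the name jobs_csv is hypothetical) that reproduces the CSV parsing through INDEXED_EXTRACTIONS instead of editing the built-in stanza.

```
# props.conf (sketch; the stanza name "jobs_csv" is hypothetical)
[jobs_csv]
# Parse the file as structured CSV
INDEXED_EXTRACTIONS = csv
# Take the event timestamp from the SUBMITTIME column
TIMESTAMP_FIELDS = SUBMITTIME
```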

The data loaded into my index is corrupted. Sometimes a line is only half-read, so only the first half of the fields is populated. The second half of the line is then treated as a new line, so its values land in the first half of the fields. For example, I see EXECHOST values in the PROJECT field.

I cannot find any warning of interest in the splunkd.log file,
apart maybe from:
07-06-2017 11:36:53.585 -0700 INFO WatchedFile - Resetting fd to re-extract header.

Any ideas?

richgalloway
SplunkTrust

Since the 'Resetting fd' message is info-level, it's probably not a big deal, but you may want to try putting a FIELD_NAMES attribute in props.conf to see if it keeps Splunk from re-reading the header.
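
A sketch of that suggestion in props.conf, using the FIELD_NAMES setting. The stanza name assumes the built-in csv sourcetype is being extended. The list contains only the 15 field names quoted in the question, so the remaining fields of the 31-field file would have to be added.

```
# props.conf (sketch; stanza name is an assumption)
[csv]
TIMESTAMP_FIELDS = SUBMITTIME
# Declare the field names explicitly instead of extracting them from a header line.
# Only the names quoted in the question are listed; the actual file has 31 fields.
FIELD_NAMES = ID,PROJECT,USER,OSGROUP,DIR,ENV,TOOL,JOBNAME,PRIORITY,RESOURCES,SUBMITHOST,EXECHOST,SUBMITTIME,STARTTIME,ENDTIME
```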

As for the partial events, I've seen that happen with multi-line events where the extra lines took a while to write. Increasing the time_before_close setting in inputs.conf usually helps with that. It's hard to believe it would take 3 seconds (the default) for your app to write a single line, though.
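
As an illustration, time_before_close is set per monitor stanza in inputs.conf. The path and the value below are placeholders, not taken from this thread.

```
# inputs.conf (sketch; path and value are hypothetical)
[monitor:///path/to/jobs]
sourcetype = csv
# Seconds to wait for further writes after reaching EOF before closing the file (default: 3)
time_before_close = 10
```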


gedworksplunk
Engager

Hi, setting FIELD_NAMES in props.conf did make that message disappear from splunkd.log.

I've also tried increasing time_before_close up to 65, and I am still seeing corrupted lines being read.
