Getting Data In

Problem with CSV file monitoring importing corrupted data: end-of-line not respected


Using Splunk 6.5.1, either with direct monitoring, indexing, and search on a single machine,
or with a dedicated forwarder feeding the indexer/search head machine.

I've set up monitoring of a directory where a binary updates a CSV file all day long.
That CSV file has 31 fields on each line.


For the sourcetype, I'm using the built-in "csv", complemented with TIMESTAMP_FIELDS = SUBMITTIME.
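For reference, the props.conf stanza described above would look roughly like this. This is a sketch: TIMESTAMP_FIELDS = SUBMITTIME is from the question, and INDEXED_EXTRACTIONS = csv is what the built-in csv sourcetype uses.

```ini
# props.conf -- sketch of the sourcetype setup described above
[csv]
INDEXED_EXTRACTIONS = csv
TIMESTAMP_FIELDS = SUBMITTIME
```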

The data loaded into my index is corrupted: sometimes a line is only half-read, so only the first half of the fields is populated. The second half of that line is then treated as a new line, with the first half of the fields populated by values from the second half of the fields. For example, I see EXECHOST values in the PROJECT field.
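As an aside, one way to tell whether the truncation exists in the file itself or is introduced at index time is to scan the raw CSV for rows with the wrong field count. A minimal sketch, assuming the 31-field schema from the question (the helper name is made up for illustration):

```python
import csv
import io

EXPECTED_FIELDS = 31  # the file described here has 31 fields per line

def find_short_rows(text, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for rows that don't have the
    expected number of fields -- a sign of partially written lines."""
    bad = []
    for i, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if row and len(row) != expected:
            bad.append((i, len(row)))
    return bad

# Demo with a 5-field schema: the second data line was split mid-write.
sample = "a,b,c,d,e\n1,2,3\n4,5\n"
print(find_short_rows(sample, expected=5))  # -> [(2, 3), (3, 2)]
```

If this script reports no short rows on the file on disk while the index still shows shifted fields, the file is fine and the problem is in how it is being read at index time.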

I cannot find any warning of interest in splunkd.log,
apart perhaps from this:
07-06-2017 11:36:53.585 -0700 INFO WatchedFile - Resetting fd to re-extract header.

Any ideas?



Since the 'Resetting fd' message is info-level, it's probably not a big deal, but you may want to try putting a FIELD_NAMES attribute in props.conf to see if it keeps Splunk from re-reading the header.
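In concrete terms, the suggestion is to list the column names explicitly so Splunk doesn't have to read them back from the file's header row. A hedged sketch — only SUBMITTIME, EXECHOST, and PROJECT are named in this thread, and the full 31-name list would go here:

```ini
# props.conf -- sketch; replace with the actual 31 field names in order
[csv]
INDEXED_EXTRACTIONS = csv
TIMESTAMP_FIELDS = SUBMITTIME
FIELD_NAMES = SUBMITTIME, EXECHOST, PROJECT
```

The idea is that with FIELD_NAMES set, Splunk no longer depends on re-extracting the header when the file is rewritten.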

As for the partial events, I've seen that happen with multi-line events where the extra lines took a while to be written. Adjusting the time_before_close setting in inputs.conf usually helps with that. It's hard to believe it would take 3 seconds (the default) for your app to write a single line, though.
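For the monitor input, time_before_close lives in inputs.conf. A sketch — the monitored path here is hypothetical, and the stanza simply raises the setting above its 3-second default:

```ini
# inputs.conf -- sketch; the monitored path is hypothetical
[monitor:///path/to/csv/dir]
sourcetype = csv
# Default is 3 seconds; raise it if the writer is slow to finish a line.
time_before_close = 10
```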

If this reply helps you, an upvote would be appreciated.

Hi, setting FIELD_NAMES = in props.conf did fix that message in splunkd.log.

I've also tried increasing time_before_close up to 65 seconds, and I am still seeing corrupted lines being read.
