Hi,
I'm using Splunk 6.5.1, either with direct monitoring, indexing, and search on a single machine,
or using a dedicated forwarder feeding the indexer/search head machine.
I've set up monitoring of a directory where a binary updates a CSV file all day long:
2017.07.06.jobs
That CSV file has 31 fields on each line, such as:
FIELDS: ID,PROJECT,USER,OSGROUP,DIR,ENV,TOOL,JOBNAME,PRIORITY,RESOURCES,SUBMITHOST,EXECHOST,SUBMITTIME,STARTTIME,ENDTIME
For the sourcetype, I'm using the built-in "csv" sourcetype with TIMESTAMP_FIELDS = SUBMITTIME added.
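For reference, the override amounts to roughly the following. This is only a sketch: I'm assuming it sits in a local props.conf and extends the built-in csv stanza. The TIMESTAMP_FIELDS line is the only setting I added.

```
# props.conf (location assumed, e.g. a local/ directory)
# Extends the built-in "csv" sourcetype; everything else is left at its default.
[csv]
TIMESTAMP_FIELDS = SUBMITTIME
```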
The data loaded into my index is corrupted. Sometimes a line is only half-read, so only the first half of the fields is populated. The second half of that line is then treated as a new line, so its values land in the first half of the fields. For example, I see EXECHOST values in the PROJECT field.
I cannot find any warning of interest in the splunkd.log file,
apart maybe from:
07-06-2017 11:36:53.585 -0700 INFO WatchedFile - Resetting fd to re-extract header.
Any ideas?