Hi Splunkers, a colleague team is facing some issues related to .csv file collection. Let me share the required context.

We have a .csv file that is sent to an SFTP server once per day: every day the file is written once and never modified. In addition, even though the file is a CSV, it has a .log extension. The Splunk UF is installed on this server and configured to read this daily file.

What currently happens is the following:

- The file is read many times: the internal logs contain multiple occurrences of an error message like:
  INFO WatchedFile [23227 tailreader0] - File too small to check seekcrc, probably truncated. Will re-read entire file=<file name here>
- The CSV header is treated as an event. For example, if the file contains 1000 events, a search in the assigned index returns 1000 + x events; each of these x extra events contains not a real event, but the CSV header line. In other words, we see the header indexed as an event.

For the first problem, I suggested that my team use the initCrcLength parameter, properly set. For the second one, I asked them to ensure that the following parameters are set:

INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
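For reference, here is a minimal sketch of how these settings could fit together on the forwarder. The monitor path and sourcetype name are assumptions for illustration, not taken from the actual deployment:

```
# inputs.conf on the Universal Forwarder
# (hypothetical path and sourcetype)
[monitor:///data/sftp/daily_report.log]
sourcetype = daily_csv
# Raise the CRC seed length so a file that starts small is not
# flagged as "too small to check seekcrc" and re-read in full
initCrcLength = 1024

# props.conf (hypothetical sourcetype stanza)
[daily_csv]
# Parse the file as structured CSV so the header row becomes
# field names instead of being indexed as an event
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
```

Note that INDEXED_EXTRACTIONS parsing happens on the forwarder itself, so the props.conf stanza must be deployed to the UF, not only to the indexers.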
In addition to this, I suggested that they avoid the default line breaker; the following one is set in the inputs.conf file: LINE_BREAKER = ([\r\n]+). That could be the root cause, or one of the causes, of the header being extracted as events. I don't know yet whether those changes have fixed the issue (they are still performing the required restarts), but I would like to ask whether any other possible fix should be applied. Thanks!