Hi Splunkers, a colleague's team is facing some issues with .csv file collection. Let me share the required context.
We have a .csv file that is sent to an SFTP server. The file is sent once per day: every day, the file is written once and never modified afterwards. In addition, even though the file is a CSV, it has a .log extension.
On this server, the Splunk UF is installed and configured to read this daily file.
What currently happens is the following:
1. This message appears in the internal logs:
INFO WatchedFile [23227 tailreader0] - File too small to check seekcrc, probably truncated. Will re-read entire file=<file name here>
2. The CSV header is indexed as an event. For example, if the file contains 1000 events, a search in the assigned index returns 1000 + x events; each of these x extra events does not contain a real event, but the CSV header line. So, we see the header indexed as an event.
For the first problem, I suggested that my team set the initCrcLength parameter appropriately.
For the second one, I asked them to ensure that the following parameters are set:
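For reference, a minimal inputs.conf sketch of that suggestion (the monitor path here is illustrative, not the team's actual one); note that the default initCrcLength is already 256 bytes, so the value has to be raised above that for the setting to have any effect:

```ini
# inputs.conf sketch -- path is illustrative
[monitor:///sftp/path/to/daily-*.log]
# Default is 256; read a longer prefix when computing the initial CRC,
# so files that share a common header are not mistaken for each other.
initCrcLength = 1024
```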
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
In addition, I suggested that they avoid relying on the default line breaker; the following one is currently set in the inputs.conf file:
LINE_BREAKER = ([\r\n]+)
That could be the root cause, or one of the causes, of the header being extracted as an event.
I don't know yet whether those changes have fixed the issue (they are still performing the required restarts), but I would like to ask whether any other possible fix should be applied.
Thanks!
Hi!
Here below is what you requested:
props.conf
[GDPR_ZUORA]
SHOULD_LINEMERGE=false
#LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type=true
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
#SHOULD_LINEMERGE = false
#FIELD_DELIMITER = ,
#FIELD_NAMES = date,hostname,app,action,ObjectName,user,operation,value_before,value_after,op_target,description
inputs.conf
[monitor:///sftp/Zuora/LOG-Zuora-*.log]
disabled = false
index = sftp_compliance
sourcetype = GDPR_ZUORA
source = GDPR_ZUORA
initCrcLength = 256
First 2 lines of the monitored file:
DataOra,ServerSorgente,Applicazione,TipoAzione,TipologiaOperazione,ServerDestinazione,UserID,UserName,OldValue,NewValue,Note
2025-06-05T23:22:01.157Z,,Zuora,Tenant Property,UPDATED,,3,ScheduledJobUser,2025-06-04T22:07:09.005473Z,2025-06-05T22:21:30.642092Z,BIN_DATA_UPDATE_FROM
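As a quick sanity check (my addition, not from Splunk itself), Python's csv module parses these two lines with the default comma delimiter into matching field counts, which suggests the default FIELD_DELIMITER should work:

```python
import csv

# Header and first data row, copied verbatim from the monitored file
header = ("DataOra,ServerSorgente,Applicazione,TipoAzione,TipologiaOperazione,"
          "ServerDestinazione,UserID,UserName,OldValue,NewValue,Note")
row = ("2025-06-05T23:22:01.157Z,,Zuora,Tenant Property,UPDATED,,3,"
       "ScheduledJobUser,2025-06-04T22:07:09.005473Z,"
       "2025-06-05T22:21:30.642092Z,BIN_DATA_UPDATE_FROM")

fields = next(csv.reader([header]))
values = next(csv.reader([row]))

print(len(fields))  # 11 field names
print(len(values))  # 11 values, matching the header (empty fields allowed)
```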
I think the message about re-reading the file shouldn't be an issue in your case.
You mentioned setting LINE_BREAKER in inputs.conf; however, this setting belongs in props.conf. That said, I think the default should be sufficient for your CSV file.
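As a sketch, using the GDPR_ZUORA sourcetype from your config, the line-breaking settings would sit in props.conf on the forwarder (where your INDEXED_EXTRACTIONS settings already are):

```ini
# props.conf on the forwarder -- sketch only
[GDPR_ZUORA]
# Break events on newlines; this matches the default behavior
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
```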
If you set HEADER_FIELD_LINE_NUMBER = 0 (the default), do you get the same results?
What does the first line with the headers look like? Is it a typical comma (,) separated list of headers, with no quotes, spaces, tabs, etc.? If so, the default FIELD_DELIMITER should suffice, but I want to check.
I'm not 100% sure I follow what you mean about the headers. Do you mean that for each event you also see the header indexed as a separate event?