Getting Data In

Issues with csv Splunk File Monitoring

SplunkExplorer
Contributor

Hi Splunkers, a colleague's team is facing some issues related to .csv file collection. Let me share the required context.

We have a .csv file that is sent to an SFTP server once per day: every day, the file is written once and never modified. In addition to this, even though it is a CSV file, it has a .log extension.

On this server, the Splunk UF is installed and configured to read this daily file.

What currently happens is the following:

  1. The file is read multiple times: the internal logs contain many occurrences of messages like:

    INFO  WatchedFile [23227 tailreader0] - File too small to check seekcrc, probably truncated.  Will re-read entire file=<file name here>

  2. The CSV header is indexed as an event. For example, if the file contains 1000 events, a search in the assigned index returns 1000 + x events; each of these x extra events contains not a real event but the CSV header line. So we see the header itself as an event.
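As a quick check on the first point, the re-reads can be counted from the forwarder's internal logs; here is a sketch of such a search (the component and message text are taken from the log line above):

index=_internal sourcetype=splunkd component=WatchedFile "Will re-read entire file"
| stats count by host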

For the first problem, I suggested that my team set the initCrcLength parameter appropriately.
For the second one, I asked them to ensure that the following parameters are set:

INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
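One caveat worth checking, based on how Splunk handles structured data: with INDEXED_EXTRACTIONS the file is parsed on the universal forwarder itself, so these settings must live in a props.conf deployed to the UF, not only on the indexers. Also, the props.conf docs mark CHECK_FOR_HEADER as deprecated; with INDEXED_EXTRACTIONS = csv, HEADER_FIELD_LINE_NUMBER should be enough. A minimal sketch (the stanza name below is a placeholder for the actual sourcetype):

[my_csv_sourcetype]
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
SHOULD_LINEMERGE = false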
 

In addition to this, I suggested that they avoid the default line breaker; the following one is currently set in the inputs.conf file:

 LINE_BREAKER = ([\r\n]+)

That could be the root cause, or one of the causes, of the header being extracted as events.

I don't know yet whether those changes have fixed the issue (they are still performing the required restarts), but I would like to ask whether any other possible fix should be applied.

Thanks!


isoutamo
SplunkTrust
Can you show the current inputs.conf and props.conf stanzas for this CSV file?
And an example (modified) of the first 2 lines (header + masked real events) from that file?

marsantamaria
New Member

Hi! 

Here below is what you requested:

props.conf
[GDPR_ZUORA]
SHOULD_LINEMERGE=false
#LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type=true
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
#SHOULD_LINEMERGE = false
#FIELD_DELIMITER = ,
#FIELD_NAMES = date,hostname,app,action,ObjectName,user,operation,value_before,value_after,op_target,description

inputs.conf
[monitor:///sftp/Zuora/LOG-Zuora-*.log]
disabled = false
index = sftp_compliance
sourcetype = GDPR_ZUORA
source = GDPR_ZUORA
initCrcLength = 256

First 2 lines of the file monitored:
DataOra,ServerSorgente,Applicazione,TipoAzione,TipologiaOperazione,ServerDestinazione,UserID,UserName,OldValue,NewValue,Note
2025-06-05T23:22:01.157Z,,Zuora,Tenant Property,UPDATED,,3,ScheduledJobUser,2025-06-04T22:07:09.005473Z,2025-06-05T22:21:30.642092Z,BIN_DATA_UPDATE_FROM

 


livehybrid
SplunkTrust

Hi @SplunkExplorer 

I think the message about re-reading the file shouldn't be an issue in your case.

You mentioned setting LINE_BREAKER in inputs.conf; however, this should be in props.conf. Having said that, I think the default should be sufficient for your CSV file.

If you set HEADER_FIELD_LINE_NUMBER=0 (default) do you get the same results?

What does the first line with the headers look like? Is it a typical comma (,) separated list of headers, with no quotes, spaces, tabs, etc.? If so, the default FIELD_DELIMITER should suffice, but I want to check.
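If it is, something minimal like the following should be enough (a sketch, not tested against your data; the timestamp setting assumes the DataOra field from your sample holds the event time):

[GDPR_ZUORA]
INDEXED_EXTRACTIONS = csv
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = DataOra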

I'm not 100% sure I follow what you mean about the headers, do you mean that for each event you also see the header printed?
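One way to check: if the header line itself was indexed as an event, searching for the literal header text should find it (a sketch, using the index and sourcetype from your inputs.conf above; this should return 0 once the configuration is right):

index=sftp_compliance sourcetype=GDPR_ZUORA "DataOra,ServerSorgente,Applicazione"
| stats count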

