Getting Data In

Issues with csv Splunk File Monitoring

SplunkExplorer
Contributor

Hi Splunkers, a colleague's team is facing some issues related to .csv file collection. Let me share the required context.

We have a .csv file that is sent to an SFTP server once per day: every day the file is written once and never modified afterwards. In addition, even though the file is a CSV, it has a .log extension.

On this server, the Splunk UF is installed and configured to read this daily file.

What currently happens is the following:

  1. The file is read many times: the internal logs contain multiple occurrences of a message like:

    INFO  WatchedFile [23227 tailreader0] - File too small to check seekcrc, probably truncated.  Will re-read entire file=<file name here>

  2. The CSV header is indexed as an event. For example, if the file contains 1000 events, a search on the assigned index returns 1000 + x events; each of these x extra events contains not real data but the CSV header line. So we see the header itself as an event.

For the first problem, I suggested that my team set the initCrcLength parameter appropriately.
For the second one, I told them to ensure that the following parameters are set:

INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
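For the first point, the initCrcLength change goes in inputs.conf on the UF. A minimal sketch (the monitor path, sourcetype, and value of 1024 are illustrative assumptions, not our actual config):

[monitor:///path/to/daily/file-*.log]
sourcetype = my_csv_sourcetype
# Splunk computes the initial CRC over the first 256 bytes by default.
# Raising it can help when files start with an identical prefix
# (e.g. the same CSV header every day).
initCrcLength = 1024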
 

In addition to this, I suggested that they avoid the default line breaker; the following is currently set in the inputs.conf file:

 LINE_BREAKER = ([\r\n]+)

That could be the root cause, or one of the causes, of the header being extracted as events.
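Since LINE_BREAKER is a props.conf setting rather than an inputs.conf one, a minimal sketch of the combined stanza might look like this (the sourcetype name is an assumption for illustration):

[my_csv_sourcetype]
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
# LINE_BREAKER belongs in props.conf, not inputs.conf
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false

Note that with INDEXED_EXTRACTIONS, structured parsing happens on the Universal Forwarder itself, so this props.conf must be deployed on the UF, not only on the indexers.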

I don't know yet whether those changes have fixed the issue (they are still performing the required restarts), but I would like to ask whether any other fix should be applied.

Thanks!


isoutamo
SplunkTrust
Can you show current inputs.conf and props.conf stanzas for this CSV file?
And an example (masked) of the first 2 lines from that file (header + one real event)?

marsantamaria
New Member

Hi!

Here is what you requested:

props.conf
[GDPR_ZUORA]
SHOULD_LINEMERGE=false
#LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type=true
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
#SHOULD_LINEMERGE = false
#FIELD_DELIMITER = ,
#FIELD_NAMES = date,hostname,app,action,ObjectName,user,operation,value_before,value_after,op_target,description

inputs.conf
[monitor:///sftp/Zuora/LOG-Zuora-*.log]
disabled = false
index = sftp_compliance
sourcetype = GDPR_ZUORA
source = GDPR_ZUORA
initCrcLength = 256

First 2 lines of the file monitored:
DataOra,ServerSorgente,Applicazione,TipoAzione,TipologiaOperazione,ServerDestinazione,UserID,UserName,OldValue,NewValue,Note
2025-06-05T23:22:01.157Z,,Zuora,Tenant Property,UPDATED,,3,ScheduledJobUser,2025-06-04T22:07:09.005473Z,2025-06-05T22:21:30.642092Z,BIN_DATA_UPDATE_FROM

 


livehybrid
Ultra Champion

Hi @SplunkExplorer 

I think the message about re-reading the file shouldn't be an issue in your case.

You mentioned setting LINE_BREAKER in inputs.conf; however, this should be in props.conf. That said, I think the default should be sufficient for your CSV file.

If you set HEADER_FIELD_LINE_NUMBER=0 (default) do you get the same results?

What does the first line with the headers look like? Is it a typical comma (,) separated list of headers, with no quotes, spaces, tabs, etc.? If so, the default FIELD_DELIMITER should suffice, but I want to check.
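If the delimiter did need to be set explicitly, it would be a single line in props.conf. A sketch, using the GDPR_ZUORA stanza shared above:

[GDPR_ZUORA]
INDEXED_EXTRACTIONS = csv
# Only needed if the header is not plain comma-separated;
# comma is already the default delimiter for csv extractions
FIELD_DELIMITER = ,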

I'm not 100% sure I follow what you mean about the headers, do you mean that for each event you also see the header printed?

