Getting Data In

Issues with csv Splunk File Monitoring

SplunkExplorer
Contributor

Hi Splunkers, a colleague team is facing some issues with .csv file collection. Let me share the required context.

We have a .csv file that is sent to an SFTP server once per day: each day the file is written once and never modified afterwards. In addition, although the file is a CSV, it has a .log extension.

On this server, the Splunk UF is installed and configured to read this daily file.

What currently happens is the following:

  1. The file is read many times. The internal logs contain multiple occurrences of messages like:

    INFO  WatchedFile [23227 tailreader0] - File too small to check seekcrc, probably truncated.  Will re-read entire file=<file name here>

  2. The CSV header is indexed as an event. For example, if the file contains 1000 events, a search in the assigned index returns 1000 + x events; each of these x events contains no real data, only the CSV header line. So we see the header as an event/log.

For the first problem, I suggested that my team use the initCrcLength parameter, properly set.
For the second one, I asked them to make sure the following parameters are set:

INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
 

In addition, I suggested that they avoid the default line breaker; the following one is set in the inputs.conf file:

 LINE_BREAKER = ([\r\n]+)

That could be the root cause, or one of the causes, of the header being indexed as events.
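
As a minimal sketch of where each of the settings above would live, assuming a placeholder monitor path and a placeholder sourcetype name (`my_csv_sourcetype`); the `initCrcLength` value of 1024 is also only illustrative (the documented range is 256 to 1048576, and 256 is the default). Note that LINE_BREAKER is a props.conf setting rather than an inputs.conf one, and that ([\r\n]+) is already its default value. With INDEXED_EXTRACTIONS, structured parsing is done on the forwarder, so the props.conf stanza would normally need to be deployed on the UF as well.

```ini
# inputs.conf (on the UF): path and value are placeholders, not the real ones
[monitor:///path/to/daily-file-*.log]
sourcetype = my_csv_sourcetype
# Bytes read to fingerprint the file; default 256, allowed range 256-1048576
initCrcLength = 1024

# props.conf (also on the UF, since INDEXED_EXTRACTIONS is applied there)
[my_csv_sourcetype]
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
SHOULD_LINEMERGE = false
# Default value, shown only to make the file placement explicit
LINE_BREAKER = ([\r\n]+)
```

Whether a larger initCrcLength actually stops the re-reads depends on the file contents, so this is something to test rather than a confirmed fix.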

I don't know yet whether these changes have fixed the issues (they are still performing the required restarts), but I would like to ask whether any other fix should be applied.

Thanks!


isoutamo
SplunkTrust
Can you show the current inputs.conf and props.conf stanzas for this CSV file?
And a (masked) example of the first 2 lines (header + a real event) from that file?

marsantamaria
New Member

Hi!

Here is what you requested:

props.conf
[GDPR_ZUORA]
SHOULD_LINEMERGE=false
#LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type=true
HEADER_FIELD_LINE_NUMBER = 1
CHECK_FOR_HEADER = true
#SHOULD_LINEMERGE = false
#FIELD_DELIMITER = ,
#FIELD_NAMES = date,hostname,app,action,ObjectName,user,operation,value_before,value_after,op_target,description

inputs.conf
[monitor:///sftp/Zuora/LOG-Zuora-*.log]
disabled = false
index = sftp_compliance
sourcetype = GDPR_ZUORA
source = GDPR_ZUORA
initCrcLength = 256

First 2 lines of the file monitored:
DataOra,ServerSorgente,Applicazione,TipoAzione,TipologiaOperazione,ServerDestinazione,UserID,UserName,OldValue,NewValue,Note
2025-06-05T23:22:01.157Z,,Zuora,Tenant Property,UPDATED,,3,ScheduledJobUser,2025-06-04T22:07:09.005473Z,2025-06-05T22:21:30.642092Z,BIN_DATA_UPDATE_FROM

 


livehybrid
Super Champion

Hi @SplunkExplorer 

I think the message about re-reading the file shouldn't be an issue in your case.

You mentioned setting LINE_BREAKER in inputs.conf; however, this should be in props.conf. Having said that, I think the default should be sufficient for your CSV file.

If you set HEADER_FIELD_LINE_NUMBER=0 (default) do you get the same results?

What does the first line with the headers look like? Is it a typical comma-separated (,) list of headers, with no quotes, spaces, tabs, etc.? If so, the default FIELD_DELIMITER should suffice, but I want to check.
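
A rough sketch of what such a test variant of the props.conf stanza might look like, reusing the GDPR_ZUORA sourcetype name from the thread. It is only meant for comparing results, not as a confirmed fix; the explicit FIELD_DELIMITER line is an assumption added for clarity and should be redundant for a plain comma-separated file.

```ini
# props.conf: test variant only, not a confirmed fix
[GDPR_ZUORA]
INDEXED_EXTRACTIONS = csv
# 0 is the default: Splunk tries to locate the header line automatically
HEADER_FIELD_LINE_NUMBER = 0
# Comma is already implied by INDEXED_EXTRACTIONS = csv; explicit here for clarity
FIELD_DELIMITER = ,
SHOULD_LINEMERGE = false
```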

I'm not 100% sure I follow what you mean about the headers. Do you mean that for each event you also see the header printed?
