Solved: Indexing CSV file that changes daily

dchodur · ‎08-05-2014

I have been searching everywhere for answers and trying various things, but I cannot get to the bottom of this issue.
I have a CSV that is generated by a backup reporting tool. One of the files lists failed backups. The data looks like this, with the dates changing each day:

    Started,Client,Server,Domain Name,Media Server,Group,Schedule,Job,Status,Error Code,Error Code Summary,Status Code,Status Code Summary,Level,Size Scanned (MB),Size (MB),Files,Num Files Not Backed Up,Queued,Finished,Duration (second),Throughput (MB/sec),Retention (week),Expires,Backup Application,Size Offset (B),Size Scanned Offset (B),Size Transferred (MB),Size Transferred Offset (B),Effective Path,Plugin Name,Bytes Modified Sent (B),Bytes Modified Not Sent (B)
    8/4/14 6:00 PM,sqlreport01.rainhail.com,avamar.rainhail.com,/database,avamar.rainhail.com,Database MSSQL File System Windows 2003,6pm Start 11 Hour Window,Windows File System-ALL,failed,"10,002",Command failed: Invalid command line flags,"30,999",Activity failed - client error(s).,Full,0,0,0,0,8/4/14 6:00 PM,8/4/14 6:00 PM,1 second,0,1 week 6 days,8/18/14 6:00 PM,avamar,0,0,0,0,/Windows MSSQL File System 2003,Windows File System,0,0

Each day the file is updated with the current failed jobs, but Splunk will not index it unless I restart the SplunkUF service. I have tried adjusting the CRC length and salt, which does not seem to make a difference. I also changed the check method to modtime, with no difference.

What is the best way to make Splunk index this file each time it changes, regardless of its content?

Here is the inputs.conf on the UF:

[monitor://c:\rhllcprocs\clientsfailedbackups.csv]
disabled = false
followtail= 0
sourcetype = CSVfileDPAFailedBackups
index = monitorit

Here is the props.conf on the UF:

[CSVfileDPAFailedBackups]
INDEXED_EXTRACTIONS=CSV
FIELD_DELIMITER=,
TIMESTAMP_FIELDS="Start Time"
HEADER_FIELD_LINE_NUMBER=1

[source::c:\\rhllcprocs\\clientsfailedbackups.csv]
CHECK_METHOD = modtime

Splunk log around file change:

08-05-2014 07:30:01.321 -0500 INFO  WatchedFile - Will begin reading at offset=0 for file='c:\rhllcprocs\clientsfailedbackups.csv'.
08-05-2014 07:30:01.322 -0500 INFO  WatchedFile - Resetting fd to re-extract header.

Thanks!

dchodur · ‎08-06-2014

I think initCrcLength would have worked all along, but I finally discovered some other props.conf settings in one of the files on my indexer that were overriding my configs. I think this has been my issue the whole time. I also changed this data to come in as plain text rather than as a CSV file, dropped the header, and added my own field extractions. I have had various issues with CSV files in the past, so I prefer to take the raw comma-separated data without the header and define the field extractions myself. Doing this removes the long header, which was surely running into the CRC length. Had I changed nothing else and not had the conflicting configs, the CRC length would probably have fixed it. Thanks.
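
For readers who want to try the header-less approach described above, a search-time extraction might look roughly like the sketch below. The sourcetype name `DPAFailedBackupsRaw` and the transform name `dpa_failed_backup_fields` are hypothetical, the field names are copied from the sample header in the question, and this is not necessarily the configuration the poster ended up with. It assumes the file arrives without its header row and that each line is one event.

```ini
# props.conf (sketch; the sourcetype name is hypothetical)
[DPAFailedBackupsRaw]
SHOULD_LINEMERGE = false
# Apply the delimiter-based extraction defined in transforms.conf at search time
REPORT-dpa_fields = dpa_failed_backup_fields

# transforms.conf (sketch; the stanza name is hypothetical)
[dpa_failed_backup_fields]
# Split each event on commas and name the values in order
DELIMS = ","
FIELDS = "Started","Client","Server","Domain Name","Media Server","Group","Schedule","Job","Status","Error Code","Error Code Summary","Status Code","Status Code Summary","Level","Size Scanned (MB)","Size (MB)","Files","Num Files Not Backed Up","Queued","Finished","Duration (second)","Throughput (MB/sec)","Retention (week)","Expires","Backup Application","Size Offset (B)","Size Scanned Offset (B)","Size Transferred (MB)","Size Transferred Offset (B)","Effective Path","Plugin Name","Bytes Modified Sent (B)","Bytes Modified Not Sent (B)"
```

Before relying on something like this, it is worth checking that quoted values containing commas, such as "10,002" in the sample row, are split the way you expect.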

derekarnold · ‎08-05-2014

Yes, initCrcLength is the correct answer:

initCrcLength = <integer>
* This setting adjusts how much of a file Splunk reads before trying to identify whether it is a file that has
already been seen. You may want to adjust this if you have many files with common headers (comment headers,
long CSV headers, etc) and recurring filenames.
* CAUTION: Improper use of this setting will cause data to be reindexed. You may wish to consult with Splunk
Support before adjusting this value - the default is fine for most installations.
* Defaults to 256 (bytes).
* Must be in the range 256-1048576.

somesoni2 · ‎08-05-2014

Try setting initCrcLength to a higher value, such as 500, in inputs.conf.
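
As a rough sketch of that suggestion, the monitor stanza from the question could carry the setting as shown below. The header row in the sample is well over 400 bytes, so a 256-byte CRC would only ever see header text that is identical from day to day. The value 1024 is an assumption, picked to leave more room past the header than 500 would; any value in the documented range that reaches into the data rows should behave similarly.

```ini
# inputs.conf on the universal forwarder (sketch; 1024 is an assumed value)
[monitor://c:\rhllcprocs\clientsfailedbackups.csv]
disabled = false
sourcetype = CSVfileDPAFailedBackups
index = monitorit
# Extend the initial CRC past the long CSV header so it also covers
# the first data rows, which change from day to day
initCrcLength = 1024
```

A forwarder restart is normally needed for a change to inputs.conf to take effect, and, as the caution in the spec says, changing this value can cause data to be reindexed.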
