Hello, everyone!
I planned to ingest *.csv files with a Universal Forwarder on Windows Server 2019 using a batch input.
It sounds trivial, but I've run into a problem.
When a new file appears, I see the new events from the search head, so I expect the Splunk UF to delete the file afterwards, but the file remains in place.
My first thought was a file-access problem, but I can't find any related errors in the logs of this UF instance.
So what could be the root cause of this behavior?
inputs.conf
[batch://C:\ProgramData\ScriptLog\spl_export_vmtools_status\vmtools_stats_*.csv]
disabled = false
index = vsi
crcSalt = <SOURCE>
move_policy = sinkhole
sourcetype = vsi_file_vmtools-stats
props.conf
[vsi_file_vmtools-stats]
ANNOTATE_PUNCT = false
BREAK_ONLY_BEFORE_DATE = true
INDEXED_EXTRACTIONS = CSV
HEADER_FIELD_LINE_NUMBER = 1
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = Time
From inputs.conf.spec:
time_before_close = <integer>
* The amount of time, in seconds, that the file monitor must wait for
modifications before closing a file after reaching an End-of-File
(EOF) marker.
* Tells the input not to close files that have been updated in the
past 'time_before_close' seconds.
* Default: 3
Is it possible that the file never reaches an EOF marker, or that something keeps writing to the file?
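Also, it might be worth confirming that the batch stanza is actually active on the UF with btool (a sketch, assuming the default install path; the "batch" argument just filters the output to stanzas starting with that prefix):
cd "C:\Program Files\SplunkUniversalForwarder\bin"
splunk btool inputs list batch --debug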
@dural_yyz You're quoting from the monitor input spec; @NoSpaces is asking about the batch input.
@NoSpaces There are two reasons a file might not get deleted even if it theoretically should:
1. Permissions - check that the user splunkd.exe runs as has proper permissions on the directory and the log files.
2. Locking - if the file stays open for writing, Splunk might not be able to remove it (see the sketch at the end of this reply).
Unfortunately, while the monitor input is typically fairly verbose about problems with reading files, I'm not sure how the batch input behaves in that regard.
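A quick way to check the locking theory from the affected host is Sysinternals handle.exe (a plain Windows troubleshooting tool, nothing Splunk-specific; run it elevated). Pointed at the directory from your inputs.conf, it lists any process holding a handle to files under that path:
handle.exe C:\ProgramData\ScriptLog\spl_export_vmtools_status
If a writer shows up there while Splunk is trying to sinkhole the file, that would explain both a premature EOF and a failed delete.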
@PickleRick wrote: @dural_yyz You're quoting from the monitor input spec; @NoSpaces is asking about the batch input.
Yes - admittedly I'm not as familiar with batch, so I checked the docs and found this under the batch section. I guess they did not write out the setting definitions twice, but just referenced back to how they work for monitor.
# The following settings work identically as for [monitor::] stanzas,
# documented previously
host_regex = <regular expression>
host_segment = <integer>
crcSalt = <string>
recursive = <boolean>
whitelist = <regular expression>
blacklist = <regular expression>
initCrcLength = <integer>
time_before_close = <integer>
@PickleRick's answers are a good place to start. You could also dig into the debug logging levels on the UF, but I wouldn't start with increasing logging until you've exhausted all other options first. If it does come to that, a sketch follows below.
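For reference, a minimal sketch of raising the log level for the file-reading components, assuming the usual $SPLUNK_HOME\etc\log-local.cfg override (create the file if it doesn't exist and restart the UF afterwards; the exact category names can be checked against etc\log.cfg):
[splunkd]
# TailReader handles the batch/sinkhole file reads and removals
category.TailReader=DEBUG
# WatchedFile logs per-file read activity
category.WatchedFile=DEBUG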
@dural_yyz You got me there. I was pretty sure this setting was for the monitor input only. But come to think of it, it makes sense in a batch context as well (you don't want to batch-read a file while it's still being, for example, rsynced from a remote host).
But that should not change much in terms of deleting files. I reckon it could only make Splunk stop reading prematurely. Combined with a lock held on the open file by another process, though, it could make the file undeletable (Windows has a different concurrent-access paradigm than Unix-like systems).
Thank you for your thoughts, colleagues.
I will check @dural_yyz's idea about the file never reaching an EOF marker.
@PickleRick, regarding permissions, I'm pretty sure that's not the case here, because about a month ago I found out that new Splunk UFs started to use "USE_LOCAL_SYSTEM = 0" by default during silent install.
Because of that, I was seeing errors like this on the affected UF instances:
10-27-2024 21:50:16.756 +0300 ERROR TailReader [3644 tailreader0] - Unable to remove sinkhole file: path=E:\path\file.xml, errno=Access is denied.
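For anyone checking the permissions angle on their own UFs: the checks I used back then were just the standard Windows tools, nothing Splunk-specific (assuming the default service name SplunkForwarder) - sc qc to see which account the service runs as, and icacls to see what rights that account has on the batch directory:
sc qc SplunkForwarder
icacls C:\ProgramData\ScriptLog\spl_export_vmtools_status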