All,
I am trying to figure out if there is a setting I may have missed somewhere or if this is just a Splunk problem. We have an application whose log the Splunk forwarder is monitoring. We needed to reclaim some disk space, so we restarted the application and deleted the current log, and the application rolled out a new log. However, according to lsof, Splunk did not release the deleted file until we restarted the forwarder. I saw one other question about this from a few years back, but it was never answered. Does anyone have any insight into this? Any help would be greatly appreciated.
The base behavior here is fundamental Unix: when any process calls the unlink() system call to remove a file, the blocks allocated to the file remain allocated until every process that has the file open has closed it.
Typically this is not a big problem for Splunk, as Splunk tends to keep a file open only for a short window (a few seconds) and closes it once a brief period of idleness is detected on the file. This idleness window is controlled by the time_before_close option in inputs.conf.
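For illustration, a minimal monitor stanza on the forwarder might look like the sketch below; the path, index, and sourcetype are placeholders for this example, and the time_before_close value just spells out the few-second default mentioned above:

    [monitor:///var/log/myapp/app.log]
    # hypothetical index and sourcetype for this example
    index = main
    sourcetype = myapp
    # close the file after 3 seconds with no new data
    time_before_close = 3

Keep in mind that time_before_close only controls how quickly an idle file is closed; it does not help if Splunk has not yet read to the end of the file.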
Splunk can keep a file open for a very long time in a few edge cases, for example when a large file is handed off to the BatchReader thread. Usually, I would say that the BatchReader case is the most likely problem here. There are a couple of things you can tune to help with this, like maxKBps in limits.conf (to allow the forwarder to send more data to the indexers at once) and parallelIngestionPipelines in server.conf (to allow more pipelines to process data in parallel).
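As a rough sketch of those two knobs on the forwarder (the values are illustrative, not recommendations for your environment):

    # limits.conf
    [thruput]
    # raise the forwarder's output throughput cap (KB per second); 0 means unlimited
    maxKBps = 0

    # server.conf
    [general]
    # run two ingestion pipeline sets instead of the default one
    parallelIngestionPipelines = 2

Both changes increase CPU, memory, and network usage on the forwarder, so test them before rolling them out broadly.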
I would expect that in most cases Splunk would fairly quickly (within minutes) read to the end of the deleted file and then close it, at which point the kernel would release all of the filesystem blocks allocated to the file. By stopping Splunk, you forced it to close the file early, which probably caused some events to be lost (which may be something you don't care about).
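If you want to confirm the forwarder has caught up on a file before deleting it, I believe you can ask the tailing processor directly from the forwarder's CLI (worth double-checking against your version's docs):

    $SPLUNK_HOME/bin/splunk list inputstatus

The output lists each monitored file with its current read position, so you can check that Splunk has reached the end of the log before removing it.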
Hello there,
When you say "... did not release the deleted file ...", do you mean you could not delete the file because it was open by another program, here the UF? Can you elaborate, and can you share the link to the other question?
Also, what insight are you looking for? I assume you are using [monitor://....] for the application log file; is that the case?
@adonio
Yes, we are using the [monitor:///] stanza to declare the input, and everything works great. After we delete the file with rm it is gone from the directory, but if you look in lsof you can still see that the file is open and the disk space is still being used. If we restart the forwarder, it then releases that file.
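For example, deleted-but-still-open files show up with lsof's +L1 option (files whose link count has dropped to zero); filtering on splunkd is just for illustration:

    # show unlinked files that splunkd still has open
    lsof +L1 | grep splunkd

The disk space only comes back once splunkd closes those file descriptors (or the process is restarted).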
We had a similar issue, but it turned out to be a problem with antivirus scanning the same file at the time.
Please ensure AV or other programs are not locking the file.