Getting Data In

On Windows, I sometimes get an error during log rotation when Splunk is monitoring the file. What's that about?

New Member

I rotate my SQL log frequently, and Splunk is monitoring my log files.
The SQL log file rotation process sometimes fails with an error.

Is it really the Splunk process that causes the issue?

Splunk Employee

It's hard to be certain based on the description provided, but this is PROBABLY a problem with creating a new logfile.

On Windows, if your program is writing to, say, sql_logfile.log, and Splunk is reading sql_logfile.log, a problem can arise during rotation. It goes like this:

  • Usually the writing app is told to close the logfile first, and does.
  • Some program, often the writing app but sometimes a log rotator, renames the file to sql_logfile.log.1.
      * This works even though Splunk has the file open, because we open the file with FILE_SHARE_DELETE, which permits the file to be "deleted" out from under us. Windows treats this kind of rename as a delete for file-open purposes.
  • At this point Splunk has not yet tried to read from the file again. When it does, it will close the file, and that will happen soon.
  • The writing program tries to create a new sql_logfile.log.
      * This fails, because Windows does not permit a new file to be created with the same name as the deleted logfile until all programs have closed the deleted file.

There is no solution to this short of rewriting Splunk's data collection as a system-call-hooking tool or a kernel driver, either of which would greatly increase the risk of harming the system while collecting information.

Generally speaking, the file semantics on Windows do not support rotating logs in this fashion. Renaming live files is problematic in a number of different ways. Thus most Windows-native apps follow a strategy of creating files named for the time or date, and never renaming them. If that's an option for you, I recommend it.
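To illustrate the date-named approach, here is a minimal sketch in Python. It assumes your logging code controls the file name. The function names (dated_log_path, append_line) and the "sql_logfile" base name are hypothetical, chosen only for the example. Each day's entries go to their own file, so nothing is ever renamed while a reader has it open.

```python
from datetime import date
from pathlib import Path


def dated_log_path(directory, base_name, day):
    """Build a per-day log path such as <directory>/sql_logfile-2024-01-31.log."""
    return Path(directory) / f"{base_name}-{day.isoformat()}.log"


def append_line(directory, base_name, line, day=None):
    """Append one line to the log file for the given day (today by default).

    A new day simply produces a new file name; no existing file is renamed.
    """
    day = day or date.today()
    path = dated_log_path(directory, base_name, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return path
```

A monitor input pointed at the directory would then pick up each new file as it appears, and old files can be deleted or archived once they are no longer being written.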

If it's not an option, you could try setting time_before_close = 0 on an input pointing at these files. That does not eliminate the problem, but it should drastically shrink the window during which Splunk has the files open, so the failure occurs less often.
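As a rough sketch, the setting goes in the monitor stanza for the file in inputs.conf. The path below is a placeholder, not taken from the question, so substitute the real location of your log:

```
# inputs.conf
# Hypothetical path; replace with the actual location of the rotated log.
[monitor://C:\path\to\sql_logfile.log]
# Close the file as soon as end-of-file is reached instead of waiting.
time_before_close = 0
```

Restart the forwarder or Splunk instance after changing the stanza so the new value takes effect.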

Contributor

@jrodman - Do you know if this is still relevant ~7 years later? We're currently trying to monitor a third-party system that rolls logs like this, and we seem to be running into the same issue. Is the workaround still the same?
