Getting Data In

High volume log rotation

jojosplunker
Explorer

Given this monitor stanza

[monitor://blah/var/log/blah.log]

What is the expected behavior during log rotation if the log rotates to log.1 before the tailing processor has reached the end of the file? Will the contents of the rolled file continue to be indexed until the pointer reaches EOF? In essence, if splunk is lagging behind the end of the file as it rotates and this stanza is only catching .log and not log.1, what can i expect?

Tags (1)
1 Solution

RicoSuave
Builder

The real question here is if a file rename affects the file descriptor splunk is using to read the file during log rotation. The answer is no.
If we have the file handle open and we are actively reading from it, we will continue until EOF unless the rotation results in the creation of a new file handle. In your case if we don't have a file handle open on that file the next time we see the file blah.log.1 it will be ignored because the stanza doesn't match it. Consider this, for example:

This behavior is actually due to the fact of how FD's will reference the file which will be linked to the inode of the file. And the inode of a file will usually only change if you are moving across filesystems.
So the scenario would really be something like this given stanza [monitor://blah/var/log/blah.log]

  • 1 -Tailing processor matches blah.log which (for the sake of this example, it has inode 777)
  • 2 -Opens the file via FD1 which is actually referring to inode 777
  • 3 -Tailing processor has not yet reached EOF, but blah.log rolls to blah.log.1 (usually implemented by the equivalent of mv blah.log to blah.log.1)
  • 4 -inode remains the same (777) for blah.log.1 which is being referenced by FD1 which was previously opened by splunk
  • 5 -new blah.log is created with inode (776) which is caught by stanza and splunk opens new file descriptor (FD2) which references inode 776
  • 6 -Tailing processor reaches EOF on blah.log.1, waits 3 seconds before closing FD 1 and will no longer monitor this file.

Why three seconds you ask? because this is controlled by the default value of time before close configured for that particular input.

time_before_close = <integer>
* Modtime delta required before Splunk can close a file on EOF.
* Tells the system not to close files that have been updated in past <integer> seconds.
* Defaults to 3.

After discussing this further with the devs it should also be noted that:
Of course step 5 could happen after step 6.
Or step 6 (close) could happen before step 3 (file rolls).

Additionally, there's no guarantee of inodes; not all filesystems have them.
However all unix filesystems (that are not insane -- I mean you can
access FAT32 on UNIX) have to offer some form of seperation of the
idea of the actual data stream from the filename, so there's always
some form of indirection, an inode being the most common
implementation.

Furthermore, keep in mind that on windows based systems there are no inodes but this is a whole other topic that would deal with MFT's.

So what is the main take away from this? If you are absolutely sure that no additional data will be appended to the file after it rolls to blah.log.1 then it is safe to configure such stanza to monitor .log only and not .log* However, if you suspect that more data will be appended to that rolled file in the future, then you should either monitor rolled files as well or increase time_before_close to a larger window.

View solution in original post

jrodman
Splunk Employee
Splunk Employee

On UNIX, if tailing has a file open when it is rotated, Splunk will keep reading data until EOF, which means it will keep reading any additional data added to the file until it is idle for time_before_close seconds.

On Windows, if tailling has a file open when it is rotated, the file handle becomes instantly invalid, so Splunk cannot acquire any more data from the file.

However, all of this is fairly irrelevant to best practice, as you can experience unexpected downtime, or need to restart Splunk, so you can always miss data. In high volume environments it is best practice to monitor a significant number of backups so that whatever downtime you wish to handle will still keep all the data you want indexed in the monitored window. In some environments this means to monitor all older copies of the logfile. (You may tart to see scalability issues around 10k files varying by activity and I/O (stat) performance).

RicoSuave
Builder

The real question here is if a file rename affects the file descriptor splunk is using to read the file during log rotation. The answer is no.
If we have the file handle open and we are actively reading from it, we will continue until EOF unless the rotation results in the creation of a new file handle. In your case if we don't have a file handle open on that file the next time we see the file blah.log.1 it will be ignored because the stanza doesn't match it. Consider this, for example:

This behavior is actually due to the fact of how FD's will reference the file which will be linked to the inode of the file. And the inode of a file will usually only change if you are moving across filesystems.
So the scenario would really be something like this given stanza [monitor://blah/var/log/blah.log]

  • 1 -Tailing processor matches blah.log which (for the sake of this example, it has inode 777)
  • 2 -Opens the file via FD1 which is actually referring to inode 777
  • 3 -Tailing processor has not yet reached EOF, but blah.log rolls to blah.log.1 (usually implemented by the equivalent of mv blah.log to blah.log.1)
  • 4 -inode remains the same (777) for blah.log.1 which is being referenced by FD1 which was previously opened by splunk
  • 5 -new blah.log is created with inode (776) which is caught by stanza and splunk opens new file descriptor (FD2) which references inode 776
  • 6 -Tailing processor reaches EOF on blah.log.1, waits 3 seconds before closing FD 1 and will no longer monitor this file.

Why three seconds you ask? because this is controlled by the default value of time before close configured for that particular input.

time_before_close = <integer>
* Modtime delta required before Splunk can close a file on EOF.
* Tells the system not to close files that have been updated in past <integer> seconds.
* Defaults to 3.

After discussing this further with the devs it should also be noted that:
Of course step 5 could happen after step 6.
Or step 6 (close) could happen before step 3 (file rolls).

Additionally, there's no guarantee of inodes; not all filesystems have them.
However all unix filesystems (that are not insane -- I mean you can
access FAT32 on UNIX) have to offer some form of seperation of the
idea of the actual data stream from the filename, so there's always
some form of indirection, an inode being the most common
implementation.

Furthermore, keep in mind that on windows based systems there are no inodes but this is a whole other topic that would deal with MFT's.

So what is the main take away from this? If you are absolutely sure that no additional data will be appended to the file after it rolls to blah.log.1 then it is safe to configure such stanza to monitor .log only and not .log* However, if you suspect that more data will be appended to that rolled file in the future, then you should either monitor rolled files as well or increase time_before_close to a larger window.

iamjvn
Explorer

I hope this valuable piece of implementation detail is still accurate. It's exactly the level of detail we need to truly understand (and thus trust) Splunk.

0 Karma

dwaddle
SplunkTrust
SplunkTrust

You may lose events. When you tell Splunk to monitor a single file name, it does exactly that. It won't know about the log.1 rotated copy. You could change your monitor stanza to something like:

[monitor:///blah/var/log/blah.log*]

in order to also catch any rolled log files.

RicoSuave
Builder

I like this simple answer as well.

0 Karma

jrodman
Splunk Employee
Splunk Employee

I like this simple answer.
We can't keep all files open forever. So eventually some will be closed, and if they're moidified they'll be reopened. Therefore there's a race between updates to the file and the rotation where Splunk may or may not get to the file first.

dwaddle
SplunkTrust
SplunkTrust

Not really no. The tailing processor does not necessarily keep open files handles on every file it's tailing. There is a limit to the number of file handles that the tailing processor can keep open at once, so it recycles them. When Splunk closes a file it is tailing, it makes a record of the last seek location into the file ( seekptr ), as well as a CRC of the head of the file and the tail of the file (256 bytes I think). When it swings back around to that file, it uses this information to know if it has changed and needs to be read further, or if it has been rotated.

0 Karma

jojosplunker
Explorer

This would imply that the tailing processor will just drop the open file on rename regardless of any open handles?

0 Karma

jojosplunker
Explorer

Will i loose events?

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...