Hi,
Splunk can monitor a file like a tail -f command.
I would like to know how Splunk actually detects that a file has changed,
how often it checks whether the file has changed,
and whether it actually opens a monitored file to check for updates, etc.
I have read the docs about the CRC of the first/last 256 bytes and about crcSalt,
and I have also looked at the wiki page.
However, I could not find the information I want to know.
I would like to understand the tailing mechanism so that I can judge how well Splunk handles log file changes under very heavy logging traffic.
I hope someone can answer my question.
Thank you in advance
The process responsible for monitoring files is the TailingProcessor component of Splunkd. The question you're asking isn't an easy one to answer because there are a large number of variables to take into consideration, such as the number of files being monitored, the aggregate size of those files, the configuration of the input stanza, etc.
If you were monitoring, say, fewer than 50k files and they were 'normal' syslog-style files, you might see a delay of about 5 seconds, because TailingProcessor should have threads available to read the new data. TailingProcessor has a number of threads responsible for reading files, and it keeps handing out work until they are all in use, at which point new work has to wait for a thread to be freed. If you've got 500k files that are 1GB each and roll every hour, the delay could be significantly longer, and you might notice that you're running behind in processing data.
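To make the "fixed number of reader threads" idea concrete, here is a minimal Python sketch of a bounded pool of readers. This is only an illustration of the general pattern, not Splunk's code: `read_size`, `read_all`, and `max_readers` are hypothetical names, and the real thread count and scheduling inside TailingProcessor are not described in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def read_size(path):
    """Read one file completely and return the number of bytes read."""
    with open(path, "rb") as f:
        return len(f.read())


def read_all(paths, max_readers=2):
    """Read every file using at most `max_readers` threads at a time.

    When all readers are busy, the remaining files wait in the pool's
    queue until a thread is free (hypothetical stand-in for the
    behaviour described above).
    """
    with ThreadPoolExecutor(max_workers=max_readers) as pool:
        return dict(zip(paths, pool.map(read_size, paths)))
```

With many large files and few readers, most files spend their time waiting in the queue, which is the kind of lag described above.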
TailingProcessor doesn't actually check to see whether the file has been updated; rather, it is told that the file has been updated and that it needs to read the new data, based on the checksum of the file changing, as you alluded to in your post.
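As a rough illustration of checksum-based tailing, the sketch below remembers a CRC of the beginning of each file plus the offset already read, and returns only the data added since the last call. It is a simplified model under my own assumptions, not Splunk's implementation: `read_new_data`, the `state` dictionary, and the use of `zlib.crc32` are all hypothetical, and only the 256-byte length comes from the docs mentioned in the question.

```python
import zlib


def read_new_data(path, state, crc_length=256):
    """Return bytes appended to `path` since the previous call.

    `state` maps path -> (head_crc, head_len, offset). If the start of
    the file no longer matches the stored CRC, the file is treated as
    new or rotated and is read again from the beginning.
    """
    prev = state.get(path)
    with open(path, "rb") as f:
        offset = 0
        if prev is not None:
            prev_crc, prev_len, prev_offset = prev
            head = f.read(prev_len)
            # Same leading bytes as before: resume where we left off.
            if len(head) == prev_len and zlib.crc32(head) == prev_crc:
                offset = prev_offset
        # Recompute the head CRC over up to `crc_length` bytes.
        f.seek(0)
        head = f.read(crc_length)
        f.seek(offset)
        data = f.read()
    state[path] = (zlib.crc32(head), len(head), offset + len(data))
    return data
```

Calling `read_new_data` repeatedly with the same `state` dictionary behaves roughly like `tail -f` for appended data, and starts over when a file is replaced by one with different leading bytes.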
This is a question my team usually asks, and as far as I know it is not documented in the docs or the wiki. As a Splunk user, I am curious about this type of thing.
Is there an underlying problem you're trying to solve or a specific concern about your environment that you could elaborate on?