Tailing Mechanism

melonman
Motivator

Hi,

Splunk can monitor a file much like the tail -f command does.
I would like to know how Splunk actually detects a file change,
how often Splunk checks whether the file has changed,
and whether Splunk actually opens a monitored file to check if it has been updated, etc.

I have read the docs about the CRC of the first/last 256 bytes and about crcSalt,
and I also looked at the Wiki page.
However, I could not find the information I am looking for.
I would like to understand the tailing mechanism so that I can judge how well Splunk handles log file changes under very heavy logging traffic.

I hope someone can answer my question.

Thank you in advance.

jbsplunk
Splunk Employee

The component responsible for monitoring files is the TailingProcessor in splunkd. The question you're asking isn't an easy one to answer, because there are a large number of variables to take into consideration: the number of files being monitored, the aggregate size of those files, the configuration of the input stanza, and so on.

If you were monitoring, say, fewer than 50k files and they were 'normal' syslog-style files, you might see a delay of about 5 seconds, because there should be threads available to read the new data. TailingProcessor has a number of threads responsible for reading files, and it will keep handing out work until they are all in use, at which point new work has to wait for TailingProcessor to free a thread. If you've got 500k files and they are 1GB each in size, rolling every hour, the delay could be significantly longer and you might notice that you're running behind in processing data.
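
The effect of a fixed pool of reader threads can be sketched with a small simulation. This is a toy model under stated assumptions, not a description of TailingProcessor internals: `start_delays` is a hypothetical helper, every file is assumed to arrive at the same moment and to take a fixed number of seconds to read, and each file goes to whichever reader frees up first.

```python
import heapq


def start_delays(read_seconds, num_readers):
    """Return how long each file waits before a reader picks it up.

    read_seconds: per-file read time in seconds, in arrival order (all arrive at t=0).
    num_readers: size of the hypothetical reader pool.
    """
    if num_readers < 1:
        raise ValueError("num_readers must be at least 1")
    # Min-heap of the times at which each reader next becomes free.
    free_at = [0.0] * num_readers
    heapq.heapify(free_at)
    delays = []
    for seconds in read_seconds:
        start = heapq.heappop(free_at)  # earliest available reader
        delays.append(start)
        heapq.heappush(free_at, start + seconds)
    return delays
```

With two readers and four files that each take 5 seconds, `start_delays([5, 5, 5, 5], 2)` gives `[0.0, 0.0, 5.0, 5.0]`: the third and fourth files sit idle until a reader is released, which is the "running behind" effect described above.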

TailingProcessor doesn't actually check to see whether the file has been updated. Rather, it is told that the file has been updated and that it needs to read the new data, based on the checksum of the file changing, as you alluded to in your post.
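
To make the checksum idea more concrete, here is a minimal Python sketch of a tailer that identifies a file by a CRC of its first 256 bytes and remembers how far it has read. It is an illustration only, not Splunk's implementation: `TailState` and `read_new_data` are hypothetical names, the 256-byte window is taken from the question above, and the real TailingProcessor tracks more state than this.

```python
import os
import zlib

CRC_WINDOW = 256  # bytes hashed from the start of the file (assumption from the question)


class TailState:
    """Hypothetical per-file record: head checksum, hashed length, read offset."""

    def __init__(self):
        self.crc = None
        self.head_len = 0
        self.offset = 0


def read_new_data(path, state):
    """Return bytes appended since the last call; start over if the head changed."""
    with open(path, "rb") as f:
        head = f.read(CRC_WINDOW)
        # Compare only the prefix hashed last time, so a short file that
        # has simply grown is still recognised as the same file.
        same_file = (
            state.head_len > 0
            and len(head) >= state.head_len
            and zlib.crc32(head[:state.head_len]) == state.crc
        )
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if not same_file or size < state.offset:
            # New, rotated, or truncated file: read it from the beginning.
            state.offset = 0
        f.seek(state.offset)
        data = f.read()
    state.crc = zlib.crc32(head)
    state.head_len = len(head)
    state.offset += len(data)
    return data
```

Calling `read_new_data` repeatedly with the same `TailState` returns only what was appended since the previous call, and returns the whole file again if its first bytes no longer match the stored checksum.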

melonman
Motivator

This is a question my team often asks, and as far as I know it is not documented in the docs or the Wiki. As a Splunk user, I am curious about this kind of thing.

jbsplunk
Splunk Employee

Is there an underlying problem you're trying to solve or a specific concern about your environment that you could elaborate on?
