I've installed Splunk build 4.1.2 and this upgrade has introduced the following in the splunkd.log:
05-24-2010 12:58:34.695 INFO TailingProcessor - File descriptor cache is full, trimming...
05-24-2010 12:58:37.695 INFO TailingProcessor - File descriptor cache is full, trimming...
05-24-2010 12:58:40.956 INFO TailingProcessor - File descriptor cache is full, trimming...
05-24-2010 12:58:43.753 INFO TailingProcessor - File descriptor cache is full, trimming...
05-24-2010 12:58:46.465 INFO TailingProcessor - File descriptor cache is full, trimming...
05-24-2010 12:58:48.832 INFO TailingProcessor - File descriptor cache is full, trimming...
05-24-2010 12:58:51.807 INFO TailingProcessor - File descriptor cache is full, trimming...
Now that we are using the "time_before_close" parameter in most of our input stanzas, we are leaving files open for longer. I imagine this puts more pressure on Splunk when it has to open and close files as required.
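For context, our monitor stanzas look roughly like this (the path and the value shown here are placeholders for illustration, not our exact config):

[monitor:///var/log/myapp]
# keep the file handle open for this many seconds after reaching EOF
time_before_close = 60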
As you can see from the log above, this is not an error, but it does indicate that we need to tune our Splunk instance to cope better with the large number of files Splunk has to monitor (around 800 per day, I would imagine).
Can you please look into this and advise me on how to better tune our Splunk instance?
I have also tested various OS values for the number of allowed open file descriptors (ulimit -n) and for Splunk's max_fd setting (in limits.conf), but they have not resolved the issue.
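For reference, this is roughly what I tried (a Linux host is assumed, with splunkd running as the "splunk" user; the numbers are examples, not recommendations):

# check the per-process open-file limit for the user running splunkd
ulimit -n

# raise the OS limit in /etc/security/limits.conf (Linux PAM syntax)
splunk soft nofile 8192
splunk hard nofile 8192

# and the Splunk-side cap in $SPLUNK_HOME/etc/system/local/limits.conf
[inputproc]
max_fd = 256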
So 4.1 has introduced the new tailing processor, which is now capable of handling many more files than the old version - wahoo!
The info message above just means that Splunk has more files to read but has already reached its limit of open FDs. 3.x versions of Splunk honoured the max_fd setting in limits.conf, so you could tune that overall number, but that setting has not yet been added to the 4.1.x code branch. It's coming, though, and will be in the official 4.1.3 release, which is next on the list.
For now, if you're monitoring directories that contain 800 files, it's not going to be much of an issue; it's just a case of Splunk cleaning up open FDs so it can move on to the next file. It will always come back and read your files again, so it shouldn't be missing out on any data.
Question:
I have one question: how do we arrive at a value for max_fd?
What is the rationale for choosing a value, or is it arbitrary, whatever feels right to us?
Some people say to start from the default and increase or double it incrementally until the problem goes away.
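One way to base the number on something other than guesswork, assuming a Linux host with lsof installed, is to compare how many descriptors splunkd actually holds against how many files sit in the monitored directories (the path below is just an example):

# count file descriptors currently held by the splunkd process
lsof -p $(pgrep -o splunkd) | wc -l

# count the files in a monitored directory
find /var/log/myapp -type f | wc -l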
Had the same events in splunkd.log; Mick's solution works (just make sure you set a limit that is lower than the splunk user's ulimit).
The syntax, added to limits.conf, is:
[inputproc]
max_fd = 256
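If it helps, one way to confirm the setting is actually picked up after a restart is to ask btool which value splunkd resolves for the stanza:

# restart so limits.conf is re-read
$SPLUNK_HOME/bin/splunk restart

# show the effective [inputproc] settings and which file they come from
$SPLUNK_HOME/bin/splunk btool limits list inputproc --debug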