When Splunk monitors hundreds or thousands of files, there seems to be a long lag between the time an event is generated and the time Splunk indexes the event and makes it searchable. In the worst cases, this lag can be 15 minutes or more. What can I do to increase indexing throughput in this scenario?
When installing Splunk, the default settings may not account for usage outside the norm. Monitoring many hundreds of active files falls into this category.
There are two settings you can adjust in limits.conf to increase indexing throughput when a large number of active files is involved:
[inputproc]

max_fd = <integer>
* Maximum number of file descriptors that Splunk can use in the Select Processor.
* The maximum value honored is half the current number of allowed file descriptors per process (ulimit -n / setrlimit NOFILES).
* If a value chosen is higher than the maximum allowed value, the maximum value is used instead.
* Defaults to 32.

time_before_close = <integer>
* Modtime delta required before Splunk can close a file on EOF.
* Tells the system not to close files that have been updated in past <integer> seconds.
* Defaults to 5.
For example, these settings can increase the number of files Splunk actively monitors while reducing the rate at which Splunk recycles file descriptors:
[inputproc]
max_fd = 256
time_before_close = 2
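Since the maximum max_fd honored is half the per-process file descriptor limit, it can be worth checking (and, if necessary, raising) that limit on the host before increasing max_fd. A quick check on Linux, for example (the 2048 value below is only an illustration):

# Show the current per-process open-file limit for the shell that starts Splunk
ulimit -n
# Raise it for the current shell before starting Splunk
ulimit -n 2048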
A more in-depth discussion on Splunk’s file monitoring system follows.
In order to understand Splunk file monitoring it is useful to know:
Splunk monitors files using a sliding window. At startup, Splunk creates the configured number of file descriptors to save the overhead of repeatedly opening and closing fds. From this pool of fds, Splunk begins monitoring the configured data inputs. When an fd reaches EOF, it is returned to the pool and immediately begins monitoring the next source in the queue.
In past versions, Splunk created one thread per source. The overhead of managing the threads and context switching defeated the performance gains of monitoring files in parallel. Ultimately, Splunk is still constrained by I/O. By using a single thread, Splunk avoids the context switching and can better maximize I/O throughput.
The number of file descriptors and per-descriptor throughput are inversely proportional: the higher the number of fds, the lower the throughput per file descriptor. Therefore, increasing max_fd beyond a certain point yields diminishing returns. We believe this point to be about 256.
Please Note: File monitoring improvements in Splunk 4.1 will deliver a significant performance increase. It is not clear if this tuning will be required in 4.1.
Also Note: This tuning does not affect indexing of gzip files, which Splunk handles sequentially. If you have many gzip files, consider uncompressing them first to take advantage of Splunk's multi-threaded file monitoring.
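If you do decide to uncompress an archive of gzip files up front, something like this works on most Unix-like systems (the path here is illustrative):

# Decompress every .gz file under the archive directory in place
find /var/log/archive -name '*.gz' -exec gunzip {} +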
In my case, we have about 1,600 actively written files for a syslog archive, about 30 GB/day to disk.
I think it may be considered a "bad" practice, but I avoid the extra disk read I/O and CPU overhead by sending that data to Splunk in a single TCP syslog pipe. I use a transform to extract the host from the standard message itself, and other transforms to assign sourcetypes as needed.
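For reference, here is a minimal sketch of what I mean; the port, stanza names, and regex are illustrative, not my exact config:

# inputs.conf -- one TCP syslog listener instead of thousands of monitored files
[tcp://1514]
sourcetype = syslog

# props.conf -- apply the host-override transform to everything arriving as syslog
[syslog]
TRANSFORMS-set_host = syslog_host_override

# transforms.conf -- pull the host out of the standard syslog message header
[syslog_host_override]
REGEX = ^\w{3}\s+\d+\s+\d+:\d+:\d+\s+(\S+)
FORMAT = host::$1
DEST_KEY = MetaData:Host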
I think there are two main caveats with this approach.
I don't mind assigning sourcetypes as needed because I got tired of the cryptic and inconsistent auto-sourcetype names for sources that had low log volumes. We still collect non-syslog files too.
And also in my environment, nobody cries if we miss a few events here or there.
I hope this answer also helps someone.
Thank you @hulahoop, yup it did answer my query 🙂
Hi @hulahoop ,
1) Is this update of limits.conf done on the forwarder or the indexer?
[inputproc]
max_fd = 256
time_before_close = 2
2) If on the forwarder, can this update of limits.conf be done via a deployment app, and will it override the value in /etc/system/default/limits.conf? Or should I update it in $SPLUNK_HOME/splunk/etc/system/local/limits.conf?
Remember that the finer elements of tuning could be best addressed with support. They can explore the larger context of what you are trying to achieve and provide the most targeted recommendation.
Hello, the config should be applied on the instance which is collecting the data. This is usually the forwarder.
Secondly, best practice is that no config should be updated or edited in the default folder. You can use the Deployment Server and propagate the setting to a local or app folder; a setting in an app or in system/local takes precedence over system/default, so it will override the default value.
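For example, a small deployment app carrying this tuning could look like this on the deployment server (the app name is just an example):

$SPLUNK_HOME/etc/deployment-apps/fd_tuning/local/limits.conf

containing:

[inputproc]
max_fd = 256
time_before_close = 2

Once pushed, it lands under $SPLUNK_HOME/etc/apps/fd_tuning/local/ on the forwarder.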
Does this answer your questions?
The "ignoreOlderThan" inputs.conf parameter introduced in 4.2 deserves a mention :
ignoreOlderThan =
See inputs.conf.spec for more.
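For example, a monitor stanza like the one below (the path and the 14-day window are illustrative) tells Splunk to stop checking files whose modification time is older than the window, which reduces the number of files competing for file descriptors:

[monitor:///var/log/syslog-archive]
ignoreOlderThan = 14d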