Solved: Re: How can I improve the performance of Splunk mo...

hulahoop · ‎01-29-2010

When Splunk monitors hundreds/thousands of files, there seems to be a long lag between the time the event is generated and the time Splunk indexes the event and makes it searchable. In the worst cases, this lag can be many minutes, 15 minutes or more. What can I do to increase indexing throughput in this scenario?

hulahoop · ‎01-29-2010

Splunk's default settings may not account for usage outside the norm. Monitoring hundreds or thousands of active files falls into this category.

There are 2 settings you can adjust in limits.conf to increase the indexing throughput when a large number of active files is involved:

[inputproc]

max_fd = <integer>
* Maximum number of file descriptors that Splunk can use in the Select Processor.
* The maximum value honored is half the current number of allowed file descriptors per process. (ulimit -n /setrlimit NOFILES)
* If a value chosen is higher than the maximum allowed value, the maximum value is used instead.
* Defaults to 32.

time_before_close = <integer>
* Modtime delta required before Splunk can close a file on EOF.
* Tells the system not to close files that have been updated in past <integer> seconds.
* Defaults to 5.

For example, these settings increase the number of files Splunk actively monitors and shorten the time Splunk waits before it can close an idle file and recycle its file descriptor:

[inputproc]
max_fd = 256
time_before_close = 2

A more in-depth discussion on Splunk’s file monitoring system follows.

In order to understand Splunk file monitoring it is useful to know:

Each Splunk instance has a single monitoring thread
One file descriptor is used per source
File descriptors are recycled once EOF is reached
The default number of file descriptors used by Splunk is 32 (in limits.conf: max_fd = 32)
On most Unix systems, the maximum number of fds allocated to a single process is 1024 by default
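
To tie the last two points to the max_fd rule quoted earlier (only half of the per-process limit is honored), here is a small worked example. It assumes the 1024 limit mentioned above; check your own value with ulimit -n before relying on it.

```ini
# limits.conf (sketch; assumes ulimit -n reports 1024)
[inputproc]
# Ceiling honored by Splunk: 1024 / 2 = 512.
# Anything above 512 would be silently reduced to 512,
# so the suggested value of 256 is honored as written.
max_fd = 256
```

If you need more than 512, the per-process descriptor limit itself has to be raised first.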

Splunk monitors files using a sliding window. At startup, Splunk creates the configured number of file descriptors to save some of the overhead of opening and closing fds. From this pool of fds, Splunk begins monitoring the configured data inputs. When an fd reaches EOF, it is returned to the pool and immediately begins monitoring the next source in the queue.

In past versions, Splunk created one thread per source. The overhead of managing the threads and context switching defeated the performance gains of monitoring files in parallel. Ultimately, Splunk is still constrained by I/O. By using a single thread, the context switching can be avoided and Splunk can better maximize the I/O throughput.

The number of file descriptors and the throughput per descriptor are inversely related: the higher the number of fds, the lower the throughput per file descriptor. Therefore, increasing max_fd beyond a certain point yields diminishing returns. We believe this point to be about 256.

Please Note: File monitoring improvements in Splunk 4.1 will deliver a significant performance increase. It is not clear if this tuning will be required in 4.1.

Also Note: This tuning does not affect indexing of gzip files, because Splunk handles gzip files sequentially. If you have many gzip files, consider uncompressing them first so they can be monitored concurrently like regular files.


gfriedmann · ‎04-20-2011

In my case, we have about 1600 actively written files for a syslog archive. About 30GB / day to disk.

It may be considered a "bad" practice, but I avoid the extra disk read I/O and CPU overhead by sending that data to Splunk over a single TCP syslog pipe. I use a transform to extract the host from the standard syslog message itself, and other transforms to assign sourcetypes as needed.
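
A minimal sketch of what such a setup could look like. The port, the stanza names, and both regexes are illustrative assumptions, not the actual configuration described in this post; adjust them to your own syslog format.

```ini
# inputs.conf: one TCP input instead of ~1600 monitored files
# (port 514 is an assumption)
[tcp://514]
sourcetype = syslog

# props.conf: apply index-time transforms to the incoming stream
[syslog]
TRANSFORMS-syslog_routing = example_set_host, example_set_sourcetype

# transforms.conf (stanza names and regexes are hypothetical)
[example_set_host]
# Capture the hostname that follows a "Mon DD HH:MM:SS" timestamp
REGEX = ^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+(\S+)\s
DEST_KEY = MetaData:Host
FORMAT = host::$1

[example_set_sourcetype]
# Re-tag sshd messages; add one stanza like this per sourcetype needed
REGEX = \ssshd\[\d+\]:
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::sshd
```

The host regex assumes each message starts with a plain timestamp; if your sender prepends a priority field, the pattern has to be adapted.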

I think there are two main caveats with this approach.

You lose ability to auto-sourcetype by individual file source (for syslog sources)
If Splunk goes down or restarts, you only have as much buffer as your syslog forwarder can handle. That may be no buffer at all, a queue in RAM, or some other automatic spooling mechanism.

I don't mind assigning sourcetypes as needed because I got tired of the cryptic and inconsistent auto-sourcetype names for sources that had low log volumes. We still collect non-syslog files too.

And also in my environment, nobody cries if we miss a few events here or there.

I hope this answer also helps someone.

saranya_fmr · ‎11-21-2016

Thank you @hulahoop, yes, it did answer my query 🙂

saranya_fmr · ‎11-16-2016

Hi @hulahoop ,

1) Is this update to limits.conf done on the forwarder or the indexer?

[inputproc]
max_fd = 256
time_before_close = 2

2) If on the forwarder, can this limits.conf update be done via a deployment app, and will it override the value in $SPLUNK_HOME/etc/system/default/limits.conf?
Or should I update it in $SPLUNK_HOME/etc/system/local/limits.conf?

sloshburch · ‎11-18-2016

Remember that the finer points of tuning are often best addressed with Splunk Support. They can explore the larger context of what you are trying to achieve and provide the most targeted recommendation.

hulahoop · ‎11-17-2016

Hello, the config should be applied on the instance which is collecting the data. This is usually the forwarder.

Secondly, best practice is never to update or edit any config in the default folder. Instead, put the change in the local folder or in an app folder; you can use Deployment Server to propagate it.
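
As a rough sketch of the Deployment Server route: the app name, server class name, and whitelist pattern below are hypothetical, so substitute your own.

```ini
# On the deployment server:
# $SPLUNK_HOME/etc/deployment-apps/example_monitor_tuning/local/limits.conf
[inputproc]
max_fd = 256
time_before_close = 2

# On the deployment server: serverclass.conf
[serverClass:example_busy_forwarders]
# Match the forwarders that monitor many files (pattern is an assumption)
whitelist.0 = syslog-fwd-*

[serverClass:example_busy_forwarders:app:example_monitor_tuning]
# Restart the forwarder after the app is deployed so the new limits load
restartSplunkd = true
```

On the forwarder the app lands under $SPLUNK_HOME/etc/apps/, where its settings take precedence over etc/system/default. A value set in etc/system/local would still win over the app, so avoid setting the same keys in both places.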

Does this answer your questions?

hexx · ‎05-18-2011

The "ignoreOlderThan" inputs.conf parameter introduced in 4.2 deserves a mention:

ignoreOlderThan =
- Causes the monitored input to stop checking files for updates if their modtime has passed this threshold. This improves the speed of file tracking operations when monitoring directory hierarchies with large numbers of historical files (for example, when active log files are colocated with old files that are no longer being written to).
- A file whose modtime falls outside this time window when seen for the first time will not be indexed at all.

See inputs.conf.spec for more.
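
For illustration, a monitor stanza using this parameter might look like the following. The directory and the 7-day window are assumptions chosen for the example; check inputs.conf.spec for the value format accepted by your version.

```ini
# inputs.conf (sketch; path and threshold are hypothetical)
[monitor:///var/log/syslog-archive]
# Stop checking files whose modtime is more than 7 days old
ignoreOlderThan = 7d
```

Keep the second point above in mind: a file that is already older than the window when Splunk first sees it is never indexed, so choose the threshold with any backfill in mind.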
