How to force Splunk to index new files quickly?

fcastano
Engager

How do I force Splunk to immediately index new files in a directory that is being monitored? Sometimes it takes a really long time for it to detect and index new files.

TIA,

fdo

gkanapathy
Splunk Employee

How many files are in the directory (and below it), how many are "actively" being written to, and what version of Splunk is the forwarder running (assuming there is a forwarder; otherwise, the component that reads the files)?

hulahoop
Splunk Employee

How many files are you monitoring in that directory? Is it a sub directory of a parent directory being monitored?

If there are lots (hundreds/thousands) of files in the directory, then Splunk has to cycle through all of them to detect changes. If it makes sense, applying a whitelist for more targeted monitoring may speed up change detection. Alternatively, configure a single input per file/source in the directory instead of monitoring the entire directory.
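
As a rough sketch of those two options (the /var/log/myapp path and the file names are made up for illustration, so adjust them to your environment), inputs.conf could look something like this. You would normally pick one option or the other rather than use both:

```
# inputs.conf (hypothetical paths, for illustration only)

# Option 1: monitor the directory, but only pick up files ending in .log
[monitor:///var/log/myapp]
whitelist = \.log$

# Option 2: monitor one specific file instead of the whole directory
[monitor:///var/log/myapp/app.log]
```

The whitelist value is a regular expression matched against the file path, so the narrower you can make it, the fewer files Splunk has to watch.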

A more drastic approach is to increase the number of file descriptors Splunk uses for monitoring inputs. The default is 32 FDs, which means Splunk uses a sliding window of 32 files to check for changes at any given time. Try doubling or tripling this value to see if it helps, but try the other options first.

This parameter is controlled in limits.conf:

[inputproc]
max_fd = <integer>
* Maximum number of file descriptors that Splunk can use in the Select Processor.
* The maximum value honored is half the current number of allowed file descriptors per process. (ulimit -n /setrlimit NOFILES)
* If a value chosen is higher than the maximum allowed value, the maximum value is used instead.
* Defaults to 32.
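
For example, doubling the default would look roughly like this in a local limits.conf. The value 64 is only illustrative, not a recommendation, and going by the spec text above it is only honored if the per-process limit reported by ulimit -n is at least 128:

```
# limits.conf (for example in $SPLUNK_HOME/etc/system/local/)
[inputproc]
# Illustrative value: double the default of 32
max_fd = 64
```

Splunk will most likely need a restart to pick up the change.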

gkanapathy
Splunk Employee

The 'cycling' behavior changed a lot in version 4.1+ (compared with 4.0.x and earlier). From 4.1 on, files that have not been updated recently are checked less and less frequently, so you can have many static or old rolled copies in the directory with minimal performance impact. Of course, this could work against you in perverse circumstances, but in practice it is fine.
