Is there a limit on the number of monitored files in one directory?

Path Finder

I want to know whether there is a limit on the number of files that can be monitored in one directory.

I am monitoring a directory that receives a lot of new .gz files every day. It has now reached 35,000 files, and the forwarder has stopped monitoring new files in this directory.

The forwarder is still running and works fine for other monitored files. How do I handle this situation?

Esteemed Legend

The problem is that you are processing .gz files. You must stop using Splunk to uncompress the files. You can either do it yourself with your own multi-threaded pre-processor or, assuming there is plenty of CPU left on the Forwarder (which will certainly be the case if this is a data dumping-ground waystation that exists only to be a Splunk Forwarder), stand up multiple concurrent Splunk Forwarder instances, each of which handles its own partitioned segment of the workload.

THE EXPLANATION: The Splunk Archive Queue (AKA aq and aeq) is single-threaded (yes, really), which means that if you are forwarding compressed files and these files are large or frequent (high data bandwidth), you will almost certainly overload the AQ and block on the Forwarder. If the number of incoming (or backlogged) files is as large as you indicate, a single instance of Splunk will never be able to dig out. If you check your logs, you will find many entries like this in your metrics.log:
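
For the do-it-yourself route, here is a minimal sketch of a multi-threaded pre-processor in Python. It is only an illustration, not anything Splunk ships: the function names, the `.part` suffix, and the idea of pointing the forwarder at the output directory are all assumptions.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def gunzip_one(src, out_dir):
    """Decompress one .gz file into out_dir and return the output path."""
    src = Path(src)
    target = Path(out_dir) / src.stem  # "app.log.gz" -> "app.log"
    # Hypothetical convention: write to a ".part" name first, so the final
    # name only ever appears once the file is complete.
    tmp = target.with_name(target.name + ".part")
    with gzip.open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    tmp.rename(target)
    return str(target)

def gunzip_all(in_dir, out_dir, workers=4):
    """Decompress every *.gz file in in_dir using a pool of worker threads."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    sources = sorted(str(p) for p in Path(in_dir).glob("*.gz"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gunzip_one, sources, [out_dir] * len(sources)))
```

You would then monitor the output directory instead of the .gz directory, and have the monitor skip the temporary `.part` names.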

 08-08-2014 20:01:03.681 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=63, smallest_size=0
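
If you want to count how often each queue reports as blocked, a small script can do it. This is a rough sketch that assumes the line format shown above, with `blocked=true` right after the queue name; the function name is made up for the example.

```python
import re
from collections import Counter

# Matches the queue name on a "group=queue" line that is flagged blocked=true.
QUEUE_LINE = re.compile(r"group=queue, name=(?P<name>[^,\s]+), blocked=true")

def count_blocked_queues(lines):
    """Return a Counter mapping queue name -> number of blocked=true entries."""
    counts = Counter()
    for line in lines:
        match = QUEUE_LINE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts

sample = [
    "08-08-2014 20:01:03.681 +0000 INFO  Metrics - group=queue, name=aeq, "
    "blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, "
    "largest_size=63, smallest_size=0",
]
print(count_blocked_queues(sample))
```

To run it against a real log, pass an open file handle for metrics.log (normally under $SPLUNK_HOME/var/log/splunk) instead of `sample`.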

Path Finder

I have a similar situation: my files are not compressed, yet I still see Splunk ignoring files from other monitored directories.

Esteemed Legend

Splunk will ignore files that have the same checksum, which is based on the first and last bytes of the file, so perhaps it is deliberately (and appropriately) skipping files that look identical to files that have already been processed.
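
One way to check whether this applies to you is to look for files that start with identical bytes. The sketch below groups files by a CRC of their first 256 bytes. Treat it as an approximation: the 256-byte length and `zlib.crc32` are stand-ins I chose for illustration, not Splunk's actual checksum, and the function name is hypothetical.

```python
import zlib
from collections import defaultdict
from pathlib import Path

def group_by_head_crc(directory, length=256):
    """Group files in directory by the CRC32 of their first `length` bytes."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with open(path, "rb") as fh:
                groups[zlib.crc32(fh.read(length))].append(path.name)
    # Keep only CRCs shared by more than one file: these are the look-alikes.
    return {crc: names for crc, names in groups.items() if len(names) > 1}
```

If this returns groups of files that all begin with the same long header, those files are candidates for being treated as already seen.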

Legend

The forwarder is trying to continue to monitor all 35,000 files, because it can't know which of the files might be updated next. Since the older logs will never be updated, this is a huge waste, but Splunk doesn't know that. Splunk just keeps cycling through all of the files, checking to see if files have been updated.

My personal experience says that it will be helpful to remove files from the directory after they are indexed. A simple archiving script that runs daily to remove files over a few days old would work nicely. If you can get the number of files in the directory down to a few thousand, the overall memory footprint of the forwarder will drop - and the performance will jump dramatically.

You also may have a second problem: Splunk has to unzip the .gz files before it indexes them. This is a sequential process; Splunk will not unzip a bunch of files at once (this is to avoid disk space problems). If you reduce the number of files in the directory, but you still have a performance problem, this might be it. Is there a way that you might add the files to the directory unzipped, and then have your archiving script zip the files as they are moved?
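
Here is a rough sketch of such an archiving script in Python. It moves files older than a cutoff out of the monitored directory and gzips any that are not already compressed. It assumes that anything older than the cutoff has already been indexed (it does not ask Splunk), and the function name, the three-day default, and the paths in the usage note are placeholders.

```python
import gzip
import shutil
import time
from pathlib import Path

def archive_old_files(monitored_dir, archive_dir, max_age_days=3, now=None):
    """Move files older than max_age_days to archive_dir, gzipping plain files.

    Returns the list of archive paths that were written.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(monitored_dir).iterdir()):
        if not path.is_file() or path.stat().st_mtime >= cutoff:
            continue  # skip subdirectories and files that are still recent
        if path.suffix == ".gz":
            target = archive / path.name
            shutil.move(str(path), str(target))  # already compressed: just move
        else:
            target = archive / (path.name + ".gz")
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()  # delete the original only after the gzip copy is written
        written.append(target)
    return written
```

Run daily from cron or a scheduler, for example `archive_old_files("/data/incoming", "/data/archive")`, this keeps the monitored directory down to a few days of files.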

Finally, you may want to increase the file descriptor limits in limits.conf - but increasing the FDs also increases memory use, so don't go crazy with it.

[inputproc]
max_fd = 10000

For more information about how Splunk does file monitoring, take a look at the Splunk wiki article Troubleshooting Monitor Inputs.

Path Finder

As an alternative to archiving old files in the directory, would the ignoreOlderThan option work? That way, I could keep all 35,000 files and not worry about Splunk cycling through the older ones.
Or would the ignoreOlderThan check still result in Splunk examining every file and then ignoring those that are older than the set limit?

Legend

Excellent point, Kristian.

Ultra Champion

You'd need to set your ulimit accordingly, right? With some extra room for other things. Otherwise Splunk could possibly try to use more resources than are available, resulting in a crash?
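
A quick way to check this on a Unix host is to compare the open-file limit with the max_fd value you plan to use. This is only a sketch: the headroom of 1024 is an arbitrary guess, the function name is made up, and it has to run as the same user and in the same environment that starts splunkd for the number to mean anything.

```python
import resource

def fd_limit_ok(max_fd, headroom=1024):
    """Return (ok, soft_limit): ok is True if the soft limit covers max_fd + headroom."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True, soft  # no limit set, so any max_fd fits
    return soft >= max_fd + headroom, soft

ok, soft = fd_limit_ok(10000)
print(f"soft nofile limit: {soft}, enough for max_fd=10000: {ok}")
```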