Getting Data In

Practical limit for monitor inputs? 20000+ directories?

Path Finder

We have a configuration that has been idling for over two days: instead of processing the locations the tailing processor has acknowledged, it keeps looping over previously processed locations and the internal logs.

Does this mean there's a practical/hard limit on the number of directories that can be absorbed? It seems other monitor inputs are being somewhat neglected. The tailing processor acknowledged these directories two days ago, but had not yet processed down to the bottommost level (the files themselves).

Are there any good commands to inspect what the tailing processor is up to? What's on the queues, etc.?

1 Solution

Splunk Employee

The 4.1.x implementation of file input leans heavily on stat() to find out which files should be opened and read. For network inputs, it's quite possible for the cumulative latency of a very large number of stat() calls to become unreasonably large.

I understand you're seeing that the NFS system is not overloaded, but what sort of latency are we seeing? I'm not familiar with troubleshooting via nfsstat; I would look at the I/O picture with iostat and/or use strace -p to measure the system-call time of the stat()s, and then extrapolate.
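As a rough sketch of how to extrapolate, you can time a batch of stat() calls against the monitored tree and divide. The script below is a hypothetical probe (GNU date/stat on Linux assumed); it creates its own files under a temp directory just so the sketch is self-contained - point `dir` at the monitored NFS tree to measure real network round-trips.

```shell
# Hypothetical stat() latency probe: create 100 files, stat them all,
# and report the total wall-clock time. Replace the mktemp dir with the
# monitored NFS directory to measure actual per-call round-trip cost.
dir=$(mktemp -d)
for i in $(seq 1 100); do : > "$dir/file$i"; done
n=$(find "$dir" -type f | wc -l)
start=$(date +%s%N)
find "$dir" -type f -exec stat -c '%n' {} + > /dev/null
end=$(date +%s%N)
echo "stat'ed $n files in $(( (end - start) / 1000000 )) ms"
rm -r "$dir"
```

Dividing the total by the file count gives a per-stat figure you can multiply out to the full monitored hierarchy.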

What is your total number of files in the monitored hierarchies? Essentially, a number something like this:

find /the /directories /you /monitor |wc -l

The specifics of the monitor lines might be useful, as well as any information about subdirectories of these you might not actually be interested in.


Path Finder

The key to the final resolution is that our use case involves a lot of small files, and we have a notable latency (~5ms) between filer and indexer.

Splunk uses stat() and access() a fair bit during its various uptake cycles. With lots of small files (as opposed to a few big ones), Splunk spends expensive, uncached IOPS to stat() the files as it traverses the inputs.

Had the situation been reversed (a few big files), the readahead cache would have kicked in, and the effect of the latency would have been negligible.

To mitigate this a little, we added forwarders closer to the source (<1ms) to take advantage of the lower RTT on the non-cached IOPS. Curiously, we've observed NFS caching being drastically less effective on access() calls at higher latencies, but we're still investigating some of these interesting side effects.


Path Finder

If we could, we would; unfortunately, we can't aggregate all this data in one place, and that's part of the problem we're using Splunk to solve. On the other note about caching: yes, these operations are cacheable (even in v3), but short-lived, as I understand it, for the reasons you mentioned. Certain things like the attribute cache can be disabled for security. If the read pattern can't benefit much from caching, perhaps it's worth determining which operations are faster without caching effects, and conservatively using those when network filesystems are in use.


Splunk Employee

My suggestion, if you want to monitor a very large number of files, is to do so locally to the files, where the performance can be much, much higher.


Splunk Employee

Maybe NFSv4 has gained some advanced black magic, but you usually can't really cache stat() calls, because the files are changing, so the answers are always changing. The goal for splunkd is to keep up to date on every file in the monitor window, and NFS doesn't have any features like dnotify/inotify/FAM/Gamin to acquire this information in a more aggregate or server-generated fashion. This I/O pattern simply isn't one commonly followed by most apps, so there isn't an optimized set of system calls for it.


Splunk Employee

There is a bug in 4.1.3 and previous builds that is triggered by the slow NFS stat() calls; that much we know for sure. If your Splunk instance just stops indexing data, or stops picking up new files, then it's highly likely you are hitting this bug. Splunk should never stop indexing data from monitored files and discovering new files as long as new data is available.

Increasing the max_fd setting increases the number of FDs we can keep open for files we've finished reading. Those files remain open until time_before_close seconds have elapsed. This doesn't mean Splunk will run faster if you increase it to a huge number; it just means files stay open longer, so we may pick up new data more quickly. If files aren't updated once read, it's of little use to you.
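As a sketch, the two settings mentioned live in different configuration files (stanza names per the standard layout; the monitor path and the values below are illustrative examples, not recommendations):

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
# Global cap on file descriptors the tailing processor keeps open.
[inputproc]
max_fd = 256

# $SPLUNK_HOME/etc/system/local/inputs.conf
# Per-monitor stanza; seconds an idle file stays open after the last read.
[monitor:///var/log/myapp]
time_before_close = 5
```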

Theoretically, there should be no limit to the number of files that Splunk can monitor. We depend on the OS to tell us which files have changed, so if the OS knows then Splunk will know too. An important distinction here is a 'live' file vs a 'static' file. Live files are currently being updated/written to, and Splunk will pick up new data from here as long as it keeps being added. Static files should be indexed and then ignored indefinitely.

The concept of 'real-time' can also play a major role here. How quickly do you want Splunk to pick up data once it's written to a file? If your requirement is that Splunk display the data as quickly as possible, then you want to limit the number of live files Splunk is monitoring. If you have 20,000 live files, Splunk will not be able to keep up to date with every single file at the same time. However, if you have 20,000 files and only 50 of them are live, then Splunk should easily be able to keep up - are we starting to make sense yet?

Our testing has shown that when tailing 10,000 live files, you can expect somewhere between 30 seconds and 1 minute of lag. Those timings should increase linearly as you increase the number of files, so at 20,000 files you can expect a 1-2 minute delay.

There's no hard-and-fast rule here; the speed of Splunk depends on the hardware resources available and how the instance is tuned: number of CPU cores, number of indexing threads, number of FDs, speed of the disk Splunk is writing to, data segmentation, etc. The faster Splunk can write to the index, the faster it can pull new data from files. Testing is the only true way to gauge the expected performance of your hardware, with your data.

Communicator

... At least, that is what I've observed so far. If you increase max_fd too far, a Splunk "lightweight" forwarder starts consuming huge amounts of system resources, and still seems to get "jammed up" even on reasonably fast filesystems. (While jammed, it still consumes lots of resources.) Without changing the inputs, but throttling back max_fd, you get a more stable forwarder with a smaller footprint ... at the expense of not being able to process as many files simultaneously.


Communicator

If Splunk keeps a file open for the default of 5 seconds, and it can only keep the default of 64 (?) files open at a time, then parsing 10,000 files means about 157 batches, and 157 × 5 = 785 seconds, or roughly 13 minutes, just to look at each file once. If any of those files change within that 5-second window, it will take even longer. If you have tens of thousands of files, you might not get back to look at something again for days unless you tune max_fd up. And if you have 64 files changing every 5 seconds, you'll never read the 65th file.
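The back-of-envelope arithmetic above can be sketched directly (the 64-FD limit and 5-second hold time are the commenter's guessed defaults; treat them as illustrative):

```shell
files=10000    # monitored files
max_fd=64      # assumed default open-file limit
hold=5         # assumed seconds a finished file stays open
# Ceiling division: how many batches of max_fd files one full pass takes.
batches=$(( (files + max_fd - 1) / max_fd ))
seconds=$(( batches * hold ))
echo "$batches batches, ~$seconds s (~$(( seconds / 60 )) min) per full pass"
```

The same formula shows why raising max_fd (or splitting the load across forwarders) shortens the revisit interval linearly.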


Communicator

I have had a similar problem (4.1.1, 4.1.2 and 4.1.3), although I wasn't on a particularly slow disk. I had tuned the max_fd for the lightweight forwarder up as high as the system would let me in order to pick up as many files as possible. (On that server, which has a 32,000 max, I could get the forwarder to 16,000.)

The forwarder would run for anywhere from 15 minutes to several hours before it would stop indexing anything. Sometimes it would continue to index one active file, but as soon as that file was quiet for longer than the time_before_close value, it would stop indexing that too.

The forwarder would continue to consume a lot of CPU and memory; it just didn't seem to be doing anything.

I throttled max_fd back to 1024 last night, and now it seems to be keeping up just fine. Last week I had cut back the number of files it was traversing, so I didn't really need max_fd = 16000, but that didn't seem to help the stability or latency.

I suspect there is a practical limit to the number of threads a forwarder can juggle internally. It is somewhere between 1024 and 16000 (at least on Solaris 10).

I have some forwarders with max_fd = 8192 and they seem to be running ok. (I need to look at them more closely now that I have something to study.) The instability threshold may actually be between 8192 and 16000. If I had a lab, and some time I could probably pin down the threshold more precisely.

My experience thus far is that if you need to scan more than 8,000 files, you definitely need another forwarder (on the same system), regardless of how fast your disk is. In fact, I'd be inclined to recommend a forwarder for every 2,000-3,000 files. There was another thread on this here: http://answers.splunk.com/questions/3727/performance-of-forwarder-in-high-volume-environment/3742#37... . I think we are stuck figuring out a rule of thumb by trial and error and experience.

Communicator

The forwarders with max_fd = 8192 appear to be doing OK; however, the most files any one of them has open is only 6,775. The rest are between 2K and 5K files. That means I'd put the stable upper bound for what I've been able to test so far at 6,775.


Splunk Employee

We've seen a number of customer issues recently with large numbers of files on relatively slow storage (NFS, UNC, anything non-local). This appears to have been caused by a subtle bug in the 4.1.x line of monitor:// code, in which large numbers of slow stat() calls end up starving out readdir() calls - meaning new files don't get picked up.

The next scheduled maintenance release will resolve this issue.


Path Finder

Do you think it leans too heavily on stat(), access() and friends? In our traces we saw mixtures of both. Is every call justified? You're not wrong about the latency; it turned out to be a major part of the problem - I've given a quick overview above...


Path Finder

Apologies for the lack of info:

  • 4.1.2, Linux x64, 16GB, 4-way dual-core Xeon (2.33GHz).
  • Files are being read off fast NetApp filers (nfsstat shows no bottlenecks).

Splunk Employee

What version of Splunk are you running (where you are collecting the inputs / where the monitor is configured)?
