splunkd - hardcoded pause on directory traversal?

parallaxed
Path Finder

The tailing processor that was rewritten in 4.1 seems much better overall than previous incarnations, but it appears to introduce a hardcoded delay on directory traversal.

There are consistent gaps in our debug output. We have these set:

category.TailingProcessor=DEBUG
category.WatchedFile=DEBUG
category.BatchReader=DEBUG
category.FileTracker=DEBUG 

The gaps we see are always around 250-300 ms, and always occur when traversing into directories.

Prior versions had similar problems, but these went away somewhat with the tailing_proc_speed option.

In the worst case (0.3s), for 10000 distinct directories, this equates to ~50 minutes of idle time introduced by the tailing engine.
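For reference, the arithmetic behind that estimate can be sketched as below. It assumes one pause per directory and nothing else contributing; the function name is made up for illustration.

```python
def idle_minutes(directories: int, pause_seconds: float) -> float:
    """Total idle time in minutes, assuming exactly one pause per directory."""
    return directories * pause_seconds / 60


# Observed gap range from the debug output: 250-300 ms per directory.
for pause in (0.25, 0.3):
    print(f"{pause:.2f} s x 10000 dirs = {idle_minutes(10000, pause):.1f} min")
```

For the observed range this works out to roughly 42 to 50 minutes per full pass over 10000 directories.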

A few questions:

Is this really a hardcoded pause? If so, what's the reasoning?

Also, is there a way to tune / remove it?

1 Solution

Stephen_Sorkin
Splunk Employee

There is no hardcoded pause in the new tailing processor for 4.1. The only limit we have is that any given file or directory is checked for changes at most once per second.

It may be worthwhile to correlate the log messages in splunkd with output from the strace command to see exactly what splunkd is doing during those 250-300 ms. It could be checking files within a component that doesn't log every system call. Another possibility is that data is being read and put on a queue for processing.
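One way to do that correlation is to capture strace output with per-call timings and filter for slow file system calls. The sketch below assumes strace was run with `-tt` (wall-clock timestamps with microseconds) and `-T` (time spent in each call, printed as `<seconds>` at the end of the line). The 0.1 s threshold, the list of syscalls, and the sample lines are illustrative assumptions, not real splunkd output.

```python
import re
from typing import Iterable, List, NamedTuple


class SlowCall(NamedTuple):
    timestamp: str   # wall-clock time from strace -tt
    syscall: str     # syscall name
    seconds: float   # time spent in the call, from strace -T


# Matches e.g. "12:00:01.000400 getdents64(8, ...) = 1280 <0.271003>".
# A leading pid column, if present, is ignored. Split
# "<unfinished ...>" / "<... resumed>" lines are skipped, not stitched.
LINE_RE = re.compile(
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<name>\w+)\(.*"
    r"<(?P<secs>\d+\.\d+)>\s*$"
)

# Assumed set of calls worth watching during directory traversal.
FS_CALLS = ("getdents", "getdents64", "stat", "lstat", "open", "openat", "read")


def slow_calls(lines: Iterable[str], threshold: float = 0.1,
               names: Iterable[str] = FS_CALLS) -> List[SlowCall]:
    """Return the listed syscalls that took at least `threshold` seconds."""
    wanted = set(names)
    found = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None or match.group("name") not in wanted:
            continue
        seconds = float(match.group("secs"))
        if seconds >= threshold:
            found.append(SlowCall(match.group("ts"), match.group("name"), seconds))
    return found


# Hypothetical sample lines in strace -tt -T format.
SAMPLE = [
    "12:00:01.000100 getdents64(7, /* 12 entries */, 32768) = 384 <0.000212>",
    "12:00:01.000400 getdents64(8, /* 40 entries */, 32768) = 1280 <0.271003>",
    '12:00:01.271500 write(2, "x", 1) = 1 <0.300000>',
    "12:00:01.600000 +++ exited with 0 +++",
]

for call in slow_calls(SAMPLE):
    print(call.timestamp, call.syscall, f"{call.seconds:.3f}s")
```

The timestamps of any slow calls can then be lined up against the gaps in the splunkd debug log. If no slow calls show up during a gap, the time is being spent somewhere other than in those system calls.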

parallaxed
Path Finder

We concluded it must have been another component taking over while splunkd was in iowait. We found our high iowait was due to the distance to the NFS filer (>3 ms).

amrit
Splunk Employee

...any luck yet?

amrit
Splunk Employee

You mentioned in another question that the files are on network storage. Did you check how long the readdir/getdents calls were taking in the strace output?
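Independently of strace, a rough check of directory listing latency on the network mount can be made from outside splunkd. The sketch below times a single listing with Python's `os.scandir`. It only approximates what splunkd sees, since caching and the exact calls differ, and the function name is made up for illustration.

```python
import os
import sys
import time
from typing import Tuple


def time_listing(path: str) -> Tuple[int, float]:
    """List one directory and return (entry count, elapsed seconds)."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Pass one or more monitored directories on the command line.
    for directory in sys.argv[1:]:
        count, elapsed = time_listing(directory)
        print(f"{directory}: {count} entries in {elapsed * 1000:.1f} ms")
```

Running it against a few of the monitored directories, and against a local directory for comparison, gives a quick sense of how much of the gap could be storage latency.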

parallaxed
Path Finder

I haven't yet correlated with any pauses in strace, so that's promising. I'm guessing it's another component doing heavy lifting, but at the same time we have a lot of inputs to digest. Is there any way to give priority to the various input processors over other components?
