Large number of subfolders: Splunk is calling hundreds of statx() on folders not meant to be monitored

abonuccelli_spl
Splunk Employee

Hi,

4.2.3 UF on AIX

I have a folder structure like

/inputs/b/1/2/34/...
/inputs/b/1/2/3
/inputs/b/1/2/35
/inputs/b/1/2/36
/inputs/b/1/2/36/file092.log
/inputs/b/1/2/36/file123.log
/inputs/c/1/2/37/
/inputs/c/1/2/34/...
/inputs/c/1/2/3
/inputs/c/1/2/35
/inputs/c/1/2/36
/inputs/c/1/2/36/file092.log
/inputs/c/1/2/36/file123.log
/inputs/c/1/2/37/
...
/inputs/d/1/2/34/...
/inputs/d/1/2/3
/inputs/d/1/2/35
/inputs/d/1/2/36
/inputs/d/1/2/36/file092.log
/inputs/d/1/2/36/file123.log
/inputs/d/1/2/37/.../
...

The dots stand for further folders and subfolders, up to thousands of them, nested.
Every folder at every level contains many files.
This layout is repeated on a few hundred servers.

How do I specify a monitor:// stanza that does not cause Splunk to scan every folder? I am seeing lots of statx() calls and high CPU usage when I use the stanza below:

[monitor:///inputs/*/1/2/36/file[0-9]{3}.log]
index = main
crcSalt =

So I've tried to force pure regex with

[monitor:///inpu*/[a-zA-Z0-9\/]+file[0-9]+.log]
index = main
crcSalt =

Getting

DEBUG TailingProcessor - Adding implicit whitelist '^/input[^/]*/[a-zA-Z0-9\/]+file[0-9]+.log$' on path 'monitor://'.

According to the DEBUG log this seems to do the job (it is not hitting all the unwanted folders), but my files are not picked up. What am I missing, and is it possible to achieve the desired result on 4.2.3 on AIX?

Thanks in advance

yannK
Splunk Employee

Antonio

With [monitor:///inputs/*/1/2/36/file[0-9]{3}.log]

The problem is that Splunk will have to scan all the files and folders in "/inputs/*/"
in order to apply the whitelist/blacklist regex on "/1/2/36/file[0-9]{3}.log".

The only way I know to avoid it is to have a monitor for each specific path:

[monitor:///inputs/a/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/b/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/c/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/d/1/2/36/file[0-9]{3}.log]
[monitor:///inputs/e/1/2/36/file[0-9]{3}.log]
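With a few hundred servers, writing these stanzas by hand gets tedious, so here is a minimal sketch of how they could be generated. Everything in it is an assumption for illustration: the function name build_monitor_stanzas, the default subdir and file_pattern values (taken from the example above), and the idea of running it once per host and reviewing the output before deploying it.

```python
from pathlib import Path


def build_monitor_stanzas(root, subdir="1/2/36",
                          file_pattern="file[0-9]{3}.log", index="main"):
    """Return inputs.conf text with one monitor stanza per matching directory.

    Hypothetical helper: it lists only the first level under `root`, so the
    deep tree is never walked, and keeps the children that really contain
    `subdir`.
    """
    stanzas = []
    for child in sorted(Path(root).iterdir()):
        if not child.is_dir():
            continue  # plain files directly under root are ignored
        if not (child / subdir).is_dir():
            continue  # no target directory here, so no stanza is needed
        # child.as_posix() starts with "/", which gives the usual "monitor:///..."
        stanzas.append(
            f"[monitor://{child.as_posix()}/{subdir}/{file_pattern}]\n"
            f"index = {index}\n"
        )
    return "\n".join(stanzas)
```

Calling build_monitor_stanzas("/inputs") on a host would return the text for that host's inputs.conf. The list is static, so it would need to be regenerated whenever a new first-level folder appears.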

jrodman
Splunk Employee

This code is not specific to AIX at all; it is pretty much identical on all UNIX types.

For an input like

[monitor:///inputs/*/1/2/36/file[0-9]{3}.log]

We have to set up a watch on /inputs, because everything beyond this point is a pattern match.
We do use PCRE partial-match testing, so if we reach a directory that PCRE can tell us will never match, no matter how much additional text is appended, we can skip it.

Thus, for example if we find a dir such as

/inputs/q/2

we should be able to skip over this, because no matter how much additional text is added, the 2 will never match the 1 in the regex.

However, I'm a little unclear about the case of

/inputs/q/1/2/3

I think we try to force this to fail to match here by adding a slash after the directory name, but I'm not certain. We might descend into this directory. I would recommend testing locally in a simple setup.
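To make the pruning idea concrete, here is a rough sketch of it. This is not Splunk's code and it does not use real PCRE partial matching (Python's standard re module has none). As a stand-in it compares each directory name against the corresponding segment of the pattern, which amounts to assuming the "slash after the directory name" behaviour described above. SEGMENTS and could_still_match are hypothetical names.

```python
import re

# Assumed per-segment form of /inputs/*/1/2/36/file[0-9]{3}.log, with "*"
# translated to [^/]* the way the implicit whitelist in the DEBUG log shows.
SEGMENTS = ["inputs", "[^/]*", "1", "2", "36", r"file[0-9]{3}\.log"]


def could_still_match(directory, segments=SEGMENTS):
    """Return True if some file below `directory` could still match the pattern."""
    parts = [p for p in directory.split("/") if p]
    # The last segment is the file name, so a directory at or beyond that
    # depth can never contain a match.
    if len(parts) >= len(segments):
        return False
    # Every directory name must fully match its segment; the first mismatch
    # means the whole subtree can be skipped.
    return all(re.fullmatch(seg, part) for seg, part in zip(segments, parts))
```

Under these assumptions could_still_match("/inputs/q/2") and could_still_match("/inputs/q/1/2/3") are both False, while could_still_match("/inputs/q/1/2/36") is True. Whether the real tailing code also prunes /inputs/q/1/2/3 is exactly the open question, so this does not replace a local test.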

Yann's answer to specify only the exact dirs you want observed will certainly work.

As for your attempted workaround, I think it's a little sketchy to ask Splunk to monitor /. However, your regex, which gets built out as ^/input[^/]*/[a-zA-Z0-9/]+file[0-9]+.log$, is very permissive. It allows any sequence of directory names that contain only ASCII alphanumerics, followed by a numbered filename. This regex should allow tailing to look at every single file in the hierarchy.
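A quick way to see how permissive that whitelist is would be to run it against a few paths. The sketch below uses Python's re module instead of PCRE, which behaves the same for this simple pattern. The candidate paths are made up for illustration.

```python
import re

# Implicit whitelist as quoted above; note the unescaped dot before "log".
WHITELIST = re.compile(r"^/input[^/]*/[a-zA-Z0-9/]+file[0-9]+.log$")

candidates = [
    "/inputs/b/1/2/36/file092.log",   # the file that is actually wanted
    "/inputs/q/9/9/99/file7.log",     # unrelated directory, still accepted
    "/inputs/b/1/2/36/file092xlog",   # accepted because "." matches any character
    "/inputs/b/1/2/36/other.log",     # rejected: name lacks "file" plus digits
]

# Map each path to whether the whitelist accepts it.
matches = {path: bool(WHITELIST.match(path)) for path in candidates}

for path, accepted in matches.items():
    print(f"{'match   ' if accepted else 'no match'}  {path}")
```

Only the last path is rejected. Since any alphanumeric directory chain is accepted, no directory can be ruled out early, which fits the observation that every folder ends up being visited.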
