Why is my wildcard descending into directories?

pheezy
Explorer

According to this document: Specify input paths with wildcards

The asterisk wildcard matches anything
in that specific directory path
segment.

Unlike "...", "*" doesn't recurse
through any subdirectories.

However, this doesn't seem to be the case.

For instance, I have many inputs like this:

[monitor:///usr/local/vnd/*/server/logs/stdout.log]
disabled=false
sourcetype=log4j
blacklist=data

I would think that this would only look at the first level of directories but the output of /opt/splunk/bin/splunk list monitor shows that there are thousands upon thousands of monitored directories. This seems to cause forwarding agents to use up to 1GB of memory. Am I doing something wrong? How do I limit the directory depth when using a wildcard?

Monitored Directories:
    $SPLUNK_HOME/etc/apps/sample_app/logs
            /opt/splunk/etc/apps/sample_app/logs/maillog
            /opt/splunk/etc/apps/sample_app/logs/maillog.1
    $SPLUNK_HOME/var/log/splunk
...
    /usr/local/vnd/*/server/logs/stdout.log
            /usr/local/vnd/application1
            /usr/local/vnd/application1/java
            /usr/local/vnd/application1/java/bin
            /usr/local/vnd/application1/java/db
            /usr/local/vnd/application1/java/demo
            /usr/local/vnd/application1/java/include
            /usr/local/vnd/application1/java/jre
            /usr/local/vnd/application1/java/lib
            /usr/local/vnd/application1/java/man
            /usr/local/vnd/application1/java/sample
            /usr/local/vnd/application1/logs
            /usr/local/vnd/application1/resin
            /usr/local/vnd/application1/resin/automake
            /usr/local/vnd/application1/resin/bin
            /usr/local/vnd/application1/resin/conf
            /usr/local/vnd/application1/resin/contrib
            /usr/local/vnd/application1/resin/lib
            /usr/local/vnd/application1/resin/modules
            /usr/local/vnd/application1/resin/php
            /usr/local/vnd/application1/resin/webapps
            /usr/local/vnd/application1/resin/win32
            /usr/local/vnd/application1/server
...
1 Solution

gkanapathy
Splunk Employee

This is a result of how wildcarding of monitored directories is implemented in Splunk. Splunk will descend to the directory of the longest non-wildcarded path from root, then enumerate all files below that, and filter out those that do not match the wildcard. In your case, for example, the files in /usr/local/vnd/application1/java will be enumerated because they are under /usr/local/vnd/ (the longest leading path without a wildcard), but they will be excluded from being read because they won't match the full wildcard.

The result is that it should still only get the correct files, but it will be slower and use more resources than you'd expect to do so.
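
To make the behavior described above concrete, here is a rough Python sketch of "walk everything under the longest literal prefix, then filter". It is only an illustration of the description in this answer, not Splunk's actual code. The function names are made up, and the translation of "*" to [^/]* and "..." to .* is an assumption based on how the wildcard docs describe them.

```python
import os
import re

def longest_literal_prefix(pattern):
    """Return the longest leading path that contains no wildcard segment."""
    literal = []
    for part in pattern.split("/"):
        if "*" in part or "..." in part:
            break
        literal.append(part)
    return "/".join(literal) or "/"

def wildcard_to_regex(pattern):
    """Assumed translation: '*' stays within one segment, '...' spans any depth."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

def enumerate_and_filter(pattern):
    """Walk the whole tree under the literal prefix, keep only matching files."""
    root = longest_literal_prefix(pattern)
    regex = wildcard_to_regex(pattern)
    visited, matched = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        visited.append(dirpath)  # every directory is touched, matching or not
        for name in filenames:
            path = os.path.join(dirpath, name)
            if regex.match(path):
                matched.append(path)
    return visited, matched
```

For a pattern like /usr/local/vnd/*/server/logs/stdout.log, the walk starts at /usr/local/vnd, so `visited` grows with every subdirectory under it, while `matched` holds only the stdout.log files. That gap is what shows up as thousands of monitored directories.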

While there actually is a reason this is implemented this way (to do with allowing full PCRE regex on wildcard paths), this method for handling wildcards does indeed suck in cases like yours, and I encourage you to file a bug/ER with Splunk.


Update:

There are a couple of ways to try to work around this:

  • If you know the names of the individual subdirectories represented by * specifically, or can reasonably enumerate all the possible ones, create a stanza for each one. This deals with the problem by removing wildcards completely.
  • If you don't know them ahead of time, periodically run a separate script that looks for the directories you want and creates symbolic links to them. Put these links in another dedicated location, and monitor that location instead. This deals with the problem by removing the non-matching directories from Splunk's path so it doesn't see them. Your script does that work instead.
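
As a rough sketch of both workarounds, something like the following Python could be run from cron. Everything here is hypothetical: the function names, the link directory, and the assumption that each application lives exactly one level under the base directory. It also does not clean up links whose targets have gone away.

```python
import os

def matching_app_dirs(base, suffix):
    """Return first-level directories under base that contain suffix as a file."""
    found = []
    for entry in sorted(os.listdir(base)):
        app_dir = os.path.join(base, entry)
        if os.path.isfile(os.path.join(app_dir, suffix)):
            found.append(app_dir)
    return found

def explicit_stanzas(base, suffix, sourcetype):
    """Workaround 1: build one wildcard-free monitor stanza per directory."""
    lines = []
    for app_dir in matching_app_dirs(base, suffix):
        lines.append("[monitor://%s]" % os.path.join(app_dir, suffix))
        lines.append("disabled=false")
        lines.append("sourcetype=%s" % sourcetype)
        lines.append("")
    return "\n".join(lines)

def refresh_links(base, suffix, link_root):
    """Workaround 2: symlink each matching log directory into link_root."""
    os.makedirs(link_root, exist_ok=True)
    created = []
    for app_dir in matching_app_dirs(base, suffix):
        target = os.path.dirname(os.path.join(app_dir, suffix))
        link = os.path.join(link_root, os.path.basename(app_dir))
        if not os.path.islink(link):
            os.symlink(target, link)
            created.append(link)
    return created
```

For example, refresh_links("/usr/local/vnd", "server/logs/stdout.log", "/opt/splunk_links") would leave one link per application under the made-up /opt/splunk_links directory, and that small directory is what you would monitor. explicit_stanzas only returns text, so you can review it before adding it to inputs.conf.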

gkanapathy
Splunk Employee

It would run exactly the same, and do the exact same thing. Updating the answer with more suggestions.

pheezy
Explorer

That's odd, because I actually don't get the correct files. I'm deploying apps that actually have a lot of monitor inputs like the one listed in the OP, so maybe there is some kind of overlap? Can I use full regex support then? Would something like this work?
[monitor:///usr/local/vnd/[^\]+/server/logs/stdout.log]

I would think that would be faster and use fewer resources as well, no? If, of course, it's possible.
