Is it better to have many specific monitors, or one monitor with wildcards and blacklists?

We are generating inputs.conf configs programmatically with Puppet or Chef.

We have a directory tree with a bunch of files we want monitored, and a bunch we don’t.

/var/
  poseur/
    apps/
      goodapp_1/
        foo.log
        bar.log
      goodapp_2/
        foo.log
        bar.log
      badapp_1/
        foo.log
        bar.log
      badapp_2/
        foo.log
        bar.log

More details. In this example:

  • we want to store foo.log from the good apps, but not bar.log
  • we must not store either log from the bad apps under any circumstances

In real life:

  • each app actually has more than 2 log files
  • there are actually many more than 2 good apps, named in no pattern
  • there are actually many more than 2 bad apps, named in no pattern
  • the names of the files in good apps are the same as the ones in bad apps

Our options as I see them:

Option A: many specific monitors for good apps

[monitor:///var/poseur/apps/goodapp_1/foo.log]
[monitor:///var/poseur/apps/goodapp_2/foo.log]

QUESTION 1a: if there are 40 of these monitors on a given machine, will it be very slow? What about 80, or 100?
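Since the configs are generated programmatically anyway, Option A could be rendered from a list of good apps. The following is only a rough sketch in Python rather than a Puppet or Chef template; the function name `option_a_stanzas` and its parameters are made up for illustration, and the app names are the ones from the example tree.

```python
def option_a_stanzas(good_apps, log_name="foo.log", base="/var/poseur/apps"):
    """Return inputs.conf text with one monitor stanza per good app.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # base starts with "/", so "monitor://" + base gives "monitor:///var/..."
    stanzas = [f"[monitor://{base}/{app}/{log_name}]" for app in sorted(good_apps)]
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(option_a_stanzas(["goodapp_2", "goodapp_1"]))
```

With this approach a bad app can never be picked up by accident, because only explicitly listed directories get a stanza.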

Option B: one monitor with a wildcard and a blacklist

[monitor:///var/poseur/apps/*/foo.log]
blacklist = (badapp_1|badapp_2)

QUESTION 1b: with a long (10-20 elements) blacklist regex, and the wildcard describing 40+ directories, is this going to be very slow?
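Because the blacklist regex is matched against the full file path, it may be worth checking a generated pattern against sample paths before deploying it. The sketch below builds the alternation from a list of bad apps and tests it with Python's `re` module, which is assumed here to behave like Splunk's regex engine for a plain alternation like this one. The helper names are hypothetical, and wrapping the names in slashes is my own choice so that `badapp_1` does not also exclude a directory such as `badapp_10`.

```python
import re


def blacklist_regex(bad_apps):
    """Build a blacklist pattern that matches whole directory names only.

    Hypothetical helper; the slash anchoring is an assumption, not part
    of the original Option B.
    """
    names = "|".join(re.escape(app) for app in sorted(bad_apps))
    return f"/({names})/"


def option_b_stanza(bad_apps, log_name="foo.log", base="/var/poseur/apps"):
    """Return one wildcard monitor stanza plus its blacklist line."""
    return f"[monitor://{base}/*/{log_name}]\nblacklist = {blacklist_regex(bad_apps)}\n"


if __name__ == "__main__":
    bad = ["badapp_1", "badapp_2"]
    print(option_b_stanza(bad))
    pattern = re.compile(blacklist_regex(bad))
    for path in ("/var/poseur/apps/goodapp_1/foo.log",
                 "/var/poseur/apps/badapp_1/foo.log"):
        print(path, "excluded" if pattern.search(path) else "kept")
```

Note that any app missing from the list is monitored by default, so the list of bad apps has to be complete on every machine.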


Splunk Employee

It is hard to tell. The two options should perform similarly, but I have no tests or measurements to validate that.

If you expect to have an ever-growing list of folders, I think the whitelist/blacklist approach is easier to maintain.
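As an illustration of the whitelist side of that approach, here is a minimal sketch that monitors the apps directory once and whitelists only foo.log inside the good apps. It is an untested assumption on my part that this fits your setup; the helper name `whitelist_stanza` is hypothetical, and Python's `re` module is used only to sanity-check the pattern against the example paths.

```python
import re


def whitelist_stanza(good_apps, log_name="foo.log", base="/var/poseur/apps"):
    """Return (stanza_text, pattern) for a single directory monitor.

    Hypothetical helper: only paths ending in /<good app>/<log_name>
    match the whitelist; everything else under base is skipped.
    """
    names = "|".join(re.escape(app) for app in sorted(good_apps))
    pattern = f"/({names})/{re.escape(log_name)}$"
    return f"[monitor://{base}]\nwhitelist = {pattern}\n", pattern


if __name__ == "__main__":
    stanza, pattern = whitelist_stanza(["goodapp_1", "goodapp_2"])
    print(stanza)
    for path in ("/var/poseur/apps/goodapp_1/foo.log",
                 "/var/poseur/apps/goodapp_1/bar.log",
                 "/var/poseur/apps/badapp_1/foo.log"):
        print(path, "kept" if re.search(pattern, path) else "skipped")
```

A whitelist fails closed: a new app that nobody has classified yet is not indexed until it is added to the list, which may matter given that the bad apps must never be stored.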
