I have a logging share on the Splunk server itself, to which a number of web servers write their logs. The structure looks roughly like this:
(There are a number of "web" servers, such as web-02, web-03, etc.)
/opt/var/log/httpd/web-01.domain.com/access.log
/opt/var/log/httpd/web-01.domain.com/access.log.1.gz
/opt/var/log/httpd/web-01.domain.com/access.log.2.gz
/opt/var/log/httpd/web-01.domain.com/tmp.dmp
(There are a number of "app" servers, such as app-02, app-03, etc.)
/opt/var/log/httpd/app-01.domain.com/access.log
/opt/var/log/httpd/app-01.domain.com/access.log.1.gz
/opt/var/log/httpd/app-01.domain.com/access.log.2.gz
/opt/var/log/httpd/app-01.domain.com/tmp.dmp
Note: the logs I want to index in each directory are access.log, error.log, rewrite.log, etc. Basically, anything that is a current *.log file.
I want to index only the "web" servers, and only the uncompressed *.log files. The app server directories should be ignored.
I set up an input with the following:
Host: regex on path
Hostname (regex): /opt/var/log/httpd/([^/]+)/
Advanced Options, Whitelist: I've tried the following (and all variations of them without escaping the /):
\/web-\d+\.domain\.com\/\w+\.log$
web-*\/*.log$
web-\d+\.*\/\w+\.log$
web-*\.log$
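One way to sanity-check the host regex and the whitelist candidates outside Splunk is to run them against the sample paths from this post. The sketch below is only an approximation: it uses Python's `re` module rather than the PCRE engine Splunk uses, and it assumes the whitelist is applied as an unanchored search against the full file path. The helper names (`extract_host`, `whitelisted`) are made up for illustration.

```python
import re

# Host regex and whitelist candidates copied from the input settings above.
HOST_REGEX = r"/opt/var/log/httpd/([^/]+)/"
WHITELIST_CANDIDATES = [
    r"\/web-\d+\.domain\.com\/\w+\.log$",
    r"web-*\/*.log$",
    r"web-\d+\.*\/\w+\.log$",
    r"web-*\.log$",
]

# Sample paths taken from the directory listing in the post.
SAMPLE_PATHS = [
    "/opt/var/log/httpd/web-01.domain.com/access.log",
    "/opt/var/log/httpd/web-01.domain.com/access.log.1.gz",
    "/opt/var/log/httpd/web-01.domain.com/tmp.dmp",
    "/opt/var/log/httpd/app-01.domain.com/access.log",
    "/opt/var/log/httpd/app-01.domain.com/access.log.1.gz",
]


def extract_host(path):
    """Return the first capture group of HOST_REGEX, or None if no match."""
    match = re.search(HOST_REGEX, path)
    return match.group(1) if match else None


def whitelisted(pattern, paths):
    """Return the paths that the pattern matches anywhere (unanchored search)."""
    compiled = re.compile(pattern)
    return [p for p in paths if compiled.search(p)]


results = {pat: whitelisted(pat, SAMPLE_PATHS) for pat in WHITELIST_CANDIDATES}

print("host:", extract_host(SAMPLE_PATHS[0]))
for pat, hits in results.items():
    print(pat, "->", hits)
```

Under these assumptions, only the first candidate selects web-01's access.log; the other three match none of the sample paths, because `*` in a regex repeats the preceding character rather than acting as a shell-style wildcard.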
After I save the input and wait a minute or so, the "Number of files" on the Input summary page shows 180, when only 24 files should be indexed.
Is that number on the Input summary page accurate? Is that the actual number of files that are being indexed, even after whitelisting?
If so, what am I missing here?
Sorry. As you've noted, the traditional ways of listing monitor inputs are a bit buggy in recent versions. The REST endpoint provides a much clearer view. You can get a summarized, near-real-time view of the endpoint via the script at http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/
I found another article that pointed me to this page: http://SPLUNKSERVER:8089/services/admin/inputstatus/TailingProcessor:FileStatus which gives a good indication of what is being processed. I suppose the "Number of files" field on the inputs summary page does not take filtered items into account.
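For anyone who would rather script that check than open the page in a browser, here is a minimal Python sketch that builds an authenticated request for the same endpoint. Several things here are assumptions rather than anything stated above: the `https` scheme (the URL above was written with `http`), HTTP basic authentication, the placeholder credentials, and the decision to skip certificate verification for a self-signed certificate. The function names are hypothetical, and the response is returned as raw text rather than parsed.

```python
import base64
import ssl
import urllib.request

# Endpoint path taken from the URL quoted above.
FILE_STATUS_PATH = "/services/admin/inputstatus/TailingProcessor:FileStatus"


def build_file_status_request(server, user, password, scheme="https", port=8089):
    """Build a request for the FileStatus endpoint with a basic-auth header."""
    url = f"{scheme}://{server}:{port}{FILE_STATUS_PATH}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_file_status(request):
    """Send the request and return the raw response body as text.

    Assumption: the server presents a self-signed certificate, so
    verification is disabled here. Do not do this on an untrusted network.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8", errors="replace")


# Placeholder host and credentials; replace with real values before fetching.
req = build_file_status_request("SPLUNKSERVER", "admin", "changeme")
print(req.full_url)
# body = fetch_file_status(req)  # uncomment to actually query the server
```

Searching the returned text for a path you expect to be excluded is a quick way to see how the tailing processor is treating it.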
Using the pattern: .-web-\d+./*.log$ seemed to work just fine. It was just driving me nuts thinking I was indexing all these other files when I wasn't.