
Monitoring a directory and using whitelisting

Explorer

I have a logging share on the Splunk server itself, to which a number of web servers write a few logs. The structure looks more or less like this:

(There are a number of "web" servers, such as web-02, web-03, etc.)

/opt/var/log/httpd/web-01.domain.com/access.log
/opt/var/log/httpd/web-01.domain.com/access.log.1.gz
/opt/var/log/httpd/web-01.domain.com/access.log.2.gz
/opt/var/log/httpd/web-01.domain.com/tmp.dmp

(There are a number of "app" servers, such as app-02, app-03, etc.)

/opt/var/log/httpd/app-01.domain.com/access.log
/opt/var/log/httpd/app-01.domain.com/access.log.1.gz
/opt/var/log/httpd/app-01.domain.com/access.log.2.gz
/opt/var/log/httpd/app-01.domain.com/tmp.dmp

Note: the logs I want to index in each directory are access.log, error.log, rewrite.log, etc. Basically, anything that is a current *.log file.

What I want to do is only index the "web" servers and only the uncompressed *.log files. It should ignore the app server directories.

I set up an input with the following:

Host: regex on path

Hostname (regex): /opt/var/log/httpd/([^/]+)/
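As a quick sanity check of that host regex outside Splunk, here is a minimal Python sketch. It assumes the expression behaves the same under Python's `re` module as it does in Splunk, which should hold for a pattern this simple. The helper name `host_from_path` is hypothetical and only used for illustration.

```python
import re

# Host regex copied from the input settings above.
HOST_REGEX = re.compile(r"/opt/var/log/httpd/([^/]+)/")


def host_from_path(path):
    """Return the first capture group (the per-server directory name), or None."""
    match = HOST_REGEX.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in (
        "/opt/var/log/httpd/web-01.domain.com/access.log",
        "/opt/var/log/httpd/app-01.domain.com/access.log",
    ):
        print(sample, "->", host_from_path(sample))
```

The capture group picks up the directory name for both web and app servers, so the host regex on its own does not exclude anything; the filtering has to come from the whitelist.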

Advanced Options, Whitelist: I've tried each of the following, as well as variations of each without escaping the /:

\/web-\d+\.domain\.com\/\w+\.log$

web-*\/*.log$

web-\d+\.*\/\w+\.log$

web-*\.log$

After I save the input and wait a minute or so, the "Number of files" on the Input summary page shows 180, when only 24 files should be indexed.

Is that number on the Input summary page accurate? Is that the actual number of files that are being indexed, even after whitelisting?

If so, what am I missing here?

0 Karma

Splunk Employee

Sorry - as you've noted, the traditional ways of listing monitor inputs are a bit buggy in recent versions. The REST endpoint provides a much clearer view. You can get a summarized/realtime-ish view of the endpoint via the script at http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/

0 Karma

Explorer

I found another article that shed light on this. The page http://SPLUNKSERVER:8089/services/admin/inputstatus/TailingProcessor:FileStatus gives a good indication of what is being processed. I suppose the "Number of files" field on the inputs summary page does not take filtered items into account.
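For anyone who would rather pull that page from a script than a browser, a rough Python sketch follows. Only the endpoint path and port come from the URL above. The `https` scheme, the use of HTTP basic authentication, and the function names `build_request` and `fetch_file_status` are assumptions made for illustration, so adjust them to match how your management port is actually configured.

```python
import base64
import urllib.request

# Endpoint path as quoted above; everything else here is an assumption.
FILE_STATUS_PATH = "/services/admin/inputstatus/TailingProcessor:FileStatus"


def build_request(host, user, password, port=8089, scheme="https"):
    """Build (but do not send) a GET request for the file status page."""
    url = f"{scheme}://{host}:{port}{FILE_STATUS_PATH}"
    # Assumed: the management port accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_file_status(request, timeout=10):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_file_status(build_request("SPLUNKSERVER", "user", "password"))` would return the raw page body. If the server presents a self-signed certificate, `urlopen` will refuse the connection unless it is given a suitable SSL context, which this sketch leaves out.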

Using the pattern: .-web-\d+./*.log$ seemed to work just fine. It was just driving me nuts thinking I was indexing all these other files when I wasn't.
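The four whitelist attempts from the original question can also be checked offline against the sample paths. This is only a sketch under two assumptions: that the whitelist is applied as an unanchored regex against the full file path, and that Python's `re` module treats these particular patterns the same way Splunk's regex engine does. The helper name `matched_paths` is hypothetical.

```python
import re

# Sample paths taken from the directory listing in the question.
SAMPLE_PATHS = [
    "/opt/var/log/httpd/web-01.domain.com/access.log",
    "/opt/var/log/httpd/web-01.domain.com/access.log.1.gz",
    "/opt/var/log/httpd/web-01.domain.com/access.log.2.gz",
    "/opt/var/log/httpd/web-01.domain.com/tmp.dmp",
    "/opt/var/log/httpd/app-01.domain.com/access.log",
    "/opt/var/log/httpd/app-01.domain.com/access.log.1.gz",
    "/opt/var/log/httpd/app-01.domain.com/access.log.2.gz",
    "/opt/var/log/httpd/app-01.domain.com/tmp.dmp",
]

# The four whitelist attempts, copied verbatim from the question.
WHITELIST_ATTEMPTS = [
    r"\/web-\d+\.domain\.com\/\w+\.log$",
    r"web-*\/*.log$",
    r"web-\d+\.*\/\w+\.log$",
    r"web-*\.log$",
]


def matched_paths(pattern, paths):
    """Return the paths that the pattern matches anywhere (unanchored search)."""
    regex = re.compile(pattern)
    return [path for path in paths if regex.search(path)]


if __name__ == "__main__":
    for pattern in WHITELIST_ATTEMPTS:
        print(pattern, "->", matched_paths(pattern, SAMPLE_PATHS))
```

Under those assumptions, the first attempt selects only the web server's access.log and the other three select nothing from this sample. In the last three, `*` acts as a regex quantifier on the preceding character rather than as a shell-style wildcard, which is why they never get past the host name.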

0 Karma

Splunk Employee

What is the path/pattern you entered to be monitored? Did you enter each individual directory, or specify /opt/var/log/httpd/web-*/*.log, or something else?

0 Karma