
Monitor a directory using whitelisting

jeffwarn
Explorer

I have a logging share on the Splunk server itself, to which a number of web servers write a few logs. The structure looks more or less like this:

(There are a number of "web" servers, such as web-02, web-03, etc.)

/opt/var/log/httpd/web-01.domain.com/access.log
/opt/var/log/httpd/web-01.domain.com/access.log.1.gz
/opt/var/log/httpd/web-01.domain.com/access.log.2.gz
/opt/var/log/httpd/web-01.domain.com/tmp.dmp

(There are a number of "app" servers, such as app-02, app-03, etc.)

/opt/var/log/httpd/app-01.domain.com/access.log
/opt/var/log/httpd/app-01.domain.com/access.log.1.gz
/opt/var/log/httpd/app-01.domain.com/access.log.2.gz
/opt/var/log/httpd/app-01.domain.com/tmp.dmp

Note: the logs I want to index in each directory are access.log, error.log, rewrite.log, etc. Basically, anything that is a current *.log file.

What I want to do is index only the "web" servers, and only their uncompressed *.log files. The app server directories should be ignored.

I set up an input with the following:

Host: regex on path. Hostname (regex): /opt/var/log/httpd/([^/]+)/

Advanced Options, Whitelist: I've tried the following, as well as variations of each without escaping the /:

\/web-\d+\.domain\.com\/\w+\.log$

web-*\/*.log$

web-\d+\.*\/\w+\.log$

web-*\.log$

When I save the input and wait a minute or so, the "Number of files" field on the Input summary page shows 180, when only 24 files should be indexed.

Is that number on the Input summary page accurate? Is that the actual number of files that are being indexed, even after whitelisting?

If so, what am I missing here?
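
One way to sanity-check the candidate patterns independently of the summary page is to run them against a few sample paths. The sketch below is in Python and assumes the whitelist is applied as an unanchored regular expression against the full file path; Python's `re` module stands in for Splunk's regex engine, which may differ in edge cases. The sample paths are modelled on the layout above, and the helper name `whitelisted` is purely illustrative.

```python
import re

# Sample paths modelled on the directory layout described in the question.
SAMPLE_PATHS = [
    "/opt/var/log/httpd/web-01.domain.com/access.log",
    "/opt/var/log/httpd/web-01.domain.com/access.log.1.gz",
    "/opt/var/log/httpd/web-01.domain.com/tmp.dmp",
    "/opt/var/log/httpd/app-01.domain.com/access.log",
    "/opt/var/log/httpd/app-01.domain.com/access.log.1.gz",
]

# The four whitelist candidates from the question, copied unchanged.
CANDIDATES = [
    r"\/web-\d+\.domain\.com\/\w+\.log$",
    r"web-*\/*.log$",
    r"web-\d+\.*\/\w+\.log$",
    r"web-*\.log$",
]


def whitelisted(paths, pattern):
    """Return the paths accepted by an unanchored regex search.

    Assumption: this mirrors how the whitelist is applied to each
    full file path; it is not taken from Splunk's own code.
    """
    regex = re.compile(pattern)
    return [path for path in paths if regex.search(path)]


if __name__ == "__main__":
    for pattern in CANDIDATES:
        matches = whitelisted(SAMPLE_PATHS, pattern)
        print(f"{pattern!r}: {len(matches)} match(es)")
        for path in matches:
            print(f"    {path}")
```

With these sample paths, only the first candidate accepts the web server's access.log. In the glob-style candidates, `*` repeats the preceding character instead of acting as a wildcard, so they do not behave the way a shell pattern would.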

amrit
Splunk Employee

Sorry - as you've noted, the traditional ways of listing monitor inputs are a bit buggy in recent versions. The REST endpoint provides a much clearer view. You can get a summarized, realtime-ish view of the endpoint via the script at http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/


jeffwarn
Explorer

I found another article that pointed me to this page: http://SPLUNKSERVER:8089/services/admin/inputstatus/TailingProcessor:FileStatus, which gives a good indication of what is actually being processed. I suppose the "Number of files" field on the Input summary page does not take filtered items into account.
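
For anyone who wants to pull that page from a script instead of a browser, here is a minimal Python sketch. It rests on several assumptions that are not from this thread: that the management port is served over HTTPS with a self-signed certificate (hence verification is disabled), that HTTP basic authentication is accepted, and that the host name and credentials passed in are your own. The function names `file_status_url` and `fetch_file_status` are hypothetical helpers; only the endpoint path comes from the URL above.

```python
import base64
import ssl
import urllib.request

# Endpoint path as given in the post above.
ENDPOINT = "/services/admin/inputstatus/TailingProcessor:FileStatus"


def file_status_url(host, port=8089, scheme="https"):
    """Build the FileStatus URL. The https default is an assumption."""
    return f"{scheme}://{host}:{port}{ENDPOINT}"


def fetch_file_status(host, username, password, port=8089):
    """Fetch the raw FileStatus response body as text.

    Assumptions: basic auth is accepted and the certificate is
    self-signed, so certificate checks are switched off here.
    Do not disable verification on an untrusted network.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(
        file_status_url(host, port),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; call fetch_file_status() with real credentials.
    print(file_status_url("SPLUNKSERVER"))
```

The response body is returned unparsed, since its exact layout is not described in this thread.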

Using the pattern: .-web-\d+./*.log$ seemed to work just fine. It was just driving me nuts thinking I was indexing all these other files when I wasn't.


gkanapathy
Splunk Employee

What is the path/pattern you entered to be monitored? Did you enter each individual directory, specify /opt/var/log/httpd/web-*/*.log, or something else?
