Hello all,
I am really sorry to be posing this question, as I see that many variants of it have already been answered, but I just can't seem to crack my version of it, and it's a Friday and my brain is broken. It also looks like this may be a bit easier in 4.1, but for now my LW forwarders are all 4.0.11.
Problem:
I have a lot of collisions in the namespace of files I want to index. I have externalized sourcetypes etc. to props.conf, so I do not need to worry about that in my inputs.
Files I want to monitor
[monitor:///opt/logs/*.prd/*-access.log]
[monitor:///opt/logs/*.prd/*/*EndAudit.csv]
[monitor:///opt/logs/*.prd/*/*NPS-*.log]
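To sanity-check which paths these three stanzas cover, here is a minimal Python sketch that turns a monitor path into a regex. It assumes that `*` matches anything within a single path segment (never across a `/`), which is how I read the wildcard documentation linked further down, and it does not handle the recursive `...` wildcard. `monitor_path_to_regex` and the sample paths are hypothetical, not anything Splunk ships.

```python
import re


def monitor_path_to_regex(pattern: str) -> "re.Pattern[str]":
    """Hypothetical helper: translate a monitor path with '*' wildcards into a regex.

    Assumption: '*' matches any run of characters inside one path segment,
    so it becomes [^/]*; every other character is matched literally.
    The recursive '...' wildcard is not handled.
    """
    parts = []
    for ch in pattern:
        # '*' stays within one directory level; everything else is literal.
        parts.append("[^/]*" if ch == "*" else re.escape(ch))
    # Anchor both ends so the whole path has to match the pattern.
    return re.compile("^" + "".join(parts) + "$")


# The three monitor paths from the question.
STANZAS = [
    "/opt/logs/*.prd/*-access.log",
    "/opt/logs/*.prd/*/*EndAudit.csv",
    "/opt/logs/*.prd/*/*NPS-*.log",
]
REGEXES = [monitor_path_to_regex(p) for p in STANZAS]


def monitored(path: str) -> bool:
    """True if any of the three stanzas would pick up this path."""
    return any(r.match(path) for r in REGEXES)


for path in [
    "/opt/logs/web1.prd/site-access.log",            # True
    "/opt/logs/web1.prd/site-access.log.1",          # False
    "/opt/logs/web1.prd/inst1/server-NPS-2010.log",  # True
    "/opt/logs/web1.prd/inst1/server-foo-2010.log",  # False
]:
    print(path, monitored(path))
```

Under that assumption, the explicit stanzas already skip rotated logs and the non-NPS files one level down. Only the 10.05 exclusion would still need a blacklist.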
Files I do not want to monitor
i) All the other data under /opt/logs including rotated logs e.g. /opt/logs/.prd/-access.log.1
ii) Everything in the lower subdirectory where NPS is stored that is not NPS-.log e.g. /opt/logs/.prd//*foo-.log
iii) Any file with 10.05 in it, as that version had some issues that made it useless in Splunk
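If you fall back to one broad monitor stanza plus a blacklist, exclusions (i) and (iii) fit in a fairly small regex. Exclusion (ii) is more naturally handled by a whitelist that only admits NPS-*.log. Here is a hedged Python sketch. It assumes the blacklist is searched anywhere in the full path, and `BLACKLIST`, `is_blacklisted` and the sample paths are hypothetical. Note the escaped dot in `10\.05` and the absence of a `$` anchor, because the version string can appear anywhere in the path.

```python
import re

# Hypothetical blacklist for exclusions (i) and (iii).
# Assumption: the pattern is searched anywhere in the full file path.
BLACKLIST = re.compile(
    r"\.log\.\d+$"  # (i) rotated logs such as site-access.log.1
    r"|10\.05"      # (iii) the literal text 10.05 anywhere in the path
)


def is_blacklisted(path: str) -> bool:
    """True if the path matches either exclusion."""
    return BLACKLIST.search(path) is not None


for path in [
    "/opt/logs/web1.prd/site-access.log",              # False
    "/opt/logs/web1.prd/site-access.log.1",            # True
    "/opt/logs/web1.prd/inst1/server-NPS-10.05.log",   # True
    "/opt/logs/web1.prd/inst1/server-NPS-2010.log",    # False
]:
    print(path, is_blacklisted(path))
```

Without the backslash, `10.05` would also match names like `10x05`, because an unescaped `.` matches any character.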
Two versions of my broken config
[monitor:///opt/logs/*.prd/]
whitelist = (NPS*\.log|*EndAudit\.csv)
disabled = false
crcSalt = <SOURCE>
time_before_close = 8
blacklist = \10.05$
[monitor:///opt/logs/*.prd/*/*]
whitelist = (^\/NPS-*.log$|^\/*EndAudit.csv$)
disabled = false
crcSalt = <SOURCE>
time_before_close = 8
blacklist = \10.05$
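To see how the regexes in these two stanzas behave on their own, here is a small Python check. Python's `re` is not Splunk's regex engine, so treat this as an approximation. A `*` with nothing in front of it is an error in common engines. Python reads `\10` as a reference to a nonexistent group 10, while other engines may read it as an octal escape; either way it does not mean a literal "10". The sample path is made up, and the check assumes the whitelist is tested against the full path. That is why the `^` anchors in the second version never line up with `/opt/...`.

```python
import re

# Values copied from the two broken stanzas above.
CANDIDATES = {
    "whitelist v1": r"(NPS*\.log|*EndAudit\.csv)",
    "whitelist v2": r"(^\/NPS-*.log$|^\/*EndAudit.csv$)",
    "blacklist": r"\10.05$",
}

SAMPLE = "/opt/logs/web1.prd/inst1/server-NPS-2010.log"  # hypothetical path


def check(pattern: str, path: str) -> str:
    """Report whether Python's re compiles the pattern and whether it matches."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return f"does not compile: {exc}"
    return "matches" if compiled.search(path) else "no match"


for name, pattern in CANDIDATES.items():
    print(f"{name}: {check(pattern, SAMPLE)}")
```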
What is the problem?
You regex gurus may have looked at this and immediately seen my mistake, but my problem is that, in addition to the things I intended to have indexed, everything under /opt/logs/.prd/ or /opt/logs/.prd/*/ is being indexed (I did not verify that it was everything, but hundreds of other files ending in .log are being indexed).
I have read http://www.splunk.com/base/Documentation/latest/Data/Specifyinputpathswithwildcards more times than I wish to count
If I choose not to wildcard the directories and instead get explicit about what to index, everything works fine. But as you have likely guessed, we run more than one instance of the server per host across many dozens of hosts, so hard-coding all the permutations and combinations of directory paths seems tedious at best.
I know this is largely a duplicate, and I apologize for that. On the upside, if your answer works, I will mark you as correct and lend what weak credibility I have to your name 🙂
Brian
The parameter name whitelist doesn't exist in version 4.0.x and down. The correct name is _whitelist. That is deprecated and overridden from 4.1.x and up by a new parameter, whitelist. You should need just:
[monitor:///opt/logs/*.prd/]
_whitelist=^/opt/logs/[^/]*\.prd/(?:[^/]*-access\.log|[^/]*/[^/]/(?:[^/]*EndAudit\.csv|[^/]*NPS-[^/]*\.log))
Or more simply but possibly less accurately:
[monitor:///opt/logs/*.prd/]
_whitelist=[^/]*(?:-access\.log|EndAudit\.csv|NPS-[^/]*\.log)$
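Here is a quick Python comparison of the two _whitelist values above against made-up paths that follow the question's layout, assuming each value is searched against the full path. The detailed version has no closing `$`, so it also admits the rotated `-access.log.1`. Its `[^/]*/[^/]/` segment requires two directory levels below *.prd, the second with a one-character name, so it misses files one level down. That may be part of why it "didn't quite work". `DETAILED_VARIANT` is a hypothetical adjustment with exactly one subdirectory level and a closing `$`, not something from the original answer.

```python
import re

# The two _whitelist values from the answer above, copied verbatim.
DETAILED = re.compile(
    r"^/opt/logs/[^/]*\.prd/(?:[^/]*-access\.log|[^/]*/[^/]/"
    r"(?:[^/]*EndAudit\.csv|[^/]*NPS-[^/]*\.log))"
)
SIMPLE = re.compile(r"[^/]*(?:-access\.log|EndAudit\.csv|NPS-[^/]*\.log)$")

# Hypothetical variant: exactly one subdirectory ([^/]*/) and a closing $.
DETAILED_VARIANT = re.compile(
    r"^/opt/logs/[^/]*\.prd/(?:[^/]*-access\.log|[^/]*/"
    r"(?:[^/]*EndAudit\.csv|[^/]*NPS-[^/]*\.log))$"
)

SAMPLES = [  # made-up paths following the question's layout
    "/opt/logs/web1.prd/site-access.log",
    "/opt/logs/web1.prd/site-access.log.1",
    "/opt/logs/web1.prd/inst1/server-NPS-2010.log",
    "/opt/logs/web1.prd/inst1/app-EndAudit.csv",
    "/opt/logs/web1.prd/inst1/server-foo-2010.log",
]

for name, regex in [("detailed", DETAILED), ("simple", SIMPLE), ("variant", DETAILED_VARIANT)]:
    hits = [p for p in SAMPLES if regex.search(p)]
    print(name, hits)
```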
The first one didn't quite work, and when I tried to debug it, it got a bit too involved. The less accurate one worked nicely, and I tweaked it a little with some blacklisting.
Very good catch re: _whitelist vs. whitelist. I should have used the current version of the manual, and I missed that the first time I read your suggestion.
[monitor:///opt/logs]
_blacklist = (.*adm-access|.*\.gz|.*10\.05|\.\d+\.csv)
_whitelist = (.*\.csv|.*NPS-.*\.log)
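Here is a minimal Python sketch to check which files this final stanza would pick up. It assumes both values are searched anywhere in the full path and that a blacklist match overrides a whitelist match. `would_index` and the sample paths are hypothetical.

```python
import re

# The final _blacklist and _whitelist values, copied verbatim.
BLACKLIST = re.compile(r"(.*adm-access|.*\.gz|.*10\.05|\.\d+\.csv)")
WHITELIST = re.compile(r"(.*\.csv|.*NPS-.*\.log)")


def would_index(path: str) -> bool:
    """Assumed semantics: search the full path; the blacklist overrides the whitelist."""
    if BLACKLIST.search(path):
        return False
    return WHITELIST.search(path) is not None


for path in [
    "/opt/logs/web1.prd/inst1/app-EndAudit.csv",           # True
    "/opt/logs/web1.prd/inst1/app-EndAudit.2.csv",         # False: \.\d+\.csv
    "/opt/logs/web1.prd/inst1/server-NPS-2010.log",        # True
    "/opt/logs/web1.prd/inst1/server-NPS-10.05.log",       # False: 10\.05
    "/opt/logs/web1.prd/inst1/server-NPS-2010.log.gz",     # False: \.gz
    "/opt/logs/web1.prd/site-access.log",                  # False: not whitelisted
]:
    print(path, would_index(path))
```

As written, this whitelist does not mention `-access.log`. Under these assumptions, access logs would not be picked up by this stanza, so they would need another input or an extra alternative such as `-access\.log$`.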