inputs.conf overlapping regex whitelist help

brianirwin
Path Finder

Hello all,

I am really sorry to be posing this question, as I see that many variants of it have already been answered, but I just can't seem to crack my version of it, and it's a Friday and my brain is broken. It also looks like this may be a bit easier in 4.1, but for now my LW forwarders are all 4.0.11.

Problem:
I have a lot of collisions in the namespace of files I want to index. I have externalized sourcetypes etc. to props.conf, so I do not need to worry about that in my inputs.

Files I want to monitor

[monitor:///opt/logs/*.prd/*-access.log]
[monitor:///opt/logs/*.prd/*/*EndAudit.csv]
[monitor:///opt/logs/*.prd/*/*NPS-*.log]

Files I do not want to monitor

i) All the other data under /opt/logs, including rotated logs, e.g. /opt/logs/*.prd/*-access.log.1

ii) Everything in the lower subdirectory where NPS is stored that is not NPS-*.log, e.g. /opt/logs/*.prd/*/*foo-*.log

iii) Any file with 10.05 in it, as that version had some issues that made it useless in Splunk
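To pin down exactly what should and should not be picked up, the three globs and the exclusion rules above can be written as a small Python predicate. This is a sketch, not Splunk's matcher: it assumes that * in a monitor path matches within a single path segment (never crossing /), and the helper names glob_to_regex and wanted, like the sample paths, are hypothetical.

```python
import re

# Hypothetical helper: turn a monitor-style glob into an anchored regex,
# ASSUMING "*" matches any run of characters inside one path segment.
def glob_to_regex(glob: str) -> re.Pattern:
    parts = (re.escape(p) for p in glob.split("*"))
    return re.compile("^" + "[^/]*".join(parts) + "$")

# The three paths the poster wants indexed.
WANTED_GLOBS = [
    "/opt/logs/*.prd/*-access.log",
    "/opt/logs/*.prd/*/*EndAudit.csv",
    "/opt/logs/*.prd/*/*NPS-*.log",
]
WANTED = [glob_to_regex(g) for g in WANTED_GLOBS]

def wanted(path: str) -> bool:
    """True if path matches one of the globs and does not contain '10.05'."""
    if "10.05" in path:  # rule iii: anything from the 10.05 release is out
        return False
    return any(rx.match(path) for rx in WANTED)  # rules i and ii

if __name__ == "__main__":
    for p in (
        "/opt/logs/app1.prd/web-access.log",
        "/opt/logs/app1.prd/web-access.log.1",
        "/opt/logs/app1.prd/inst1/xEndAudit.csv",
        "/opt/logs/app1.prd/inst1/svc-NPS-2.log",
        "/opt/logs/app1.prd/inst1/svc-foo-2.log",
        "/opt/logs/app1.prd/inst1/svc-NPS-10.05.log",
    ):
        print(wanted(p), p)
```

A reference predicate like this makes it easy to check any candidate whitelist/blacklist pair against a list of real paths before pushing it out to the forwarders.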

Two versions of my broken config

[monitor:///opt/logs/*.prd/]
whitelist = (NPS*\.log|*EndAudit\.csv)
disabled = false
crcSalt = <SOURCE>
time_before_close = 8
blacklist = \10.05$

[monitor:///opt/logs/*.prd/*/*]
whitelist = (^\/NPS-*.log$|^\/*EndAudit.csv$)
disabled = false
crcSalt = <SOURCE>
time_before_close = 8
blacklist = \10.05$
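Independently of which parameter name 4.0.x recognizes, both whitelist regexes above have problems of their own. The sketch below checks them with Python's re module, using hypothetical sample paths. That is an assumption: Splunk uses its own regex engine, so error messages and edge cases may differ. It also treats each pattern as being tested against the full file path, which is how the answer below anchors its own pattern (^/opt/logs/...). FIXED and VERSION_BLACKLIST are illustrative, not taken from the thread.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# First whitelist: the "*" right after "|" has nothing to repeat.
FIRST = r"(NPS*\.log|*EndAudit\.csv)"

# Second whitelist: "^\/NPS" demands that the WHOLE path start with "/NPS",
# and "NPS-*" means "NPS" followed by zero or more hyphens.
SECOND = re.compile(r"(^\/NPS-*.log$|^\/*EndAudit.csv$)")

# Hypothetical paths in the poster's layout.
NPS = "/opt/logs/app1.prd/inst1/svc-NPS-2.log"
AUDIT = "/opt/logs/app1.prd/inst1/xEndAudit.csv"

# Illustrative alternative: anchor on the last path segment instead.
FIXED = re.compile(r"/[^/]*(?:NPS-[^/]*\.log|EndAudit\.csv)$")

# "\10" in the original blacklist is read as a back-reference or an octal
# escape (engine-dependent), not as the text "10"; escape the dot instead.
VERSION_BLACKLIST = re.compile(r"10\.05")

print(compiles(FIRST))          # False
print(SECOND.search(NPS))       # None
print(SECOND.search(AUDIT))     # None
print(bool(FIXED.search(NPS)))  # True
```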

What is the problem?

You regex gurus may have looked at this and immediately seen my mistake, but my problem is that, in addition to the things I intended to have indexed, everything under /opt/logs/*.prd/ or /opt/logs/*.prd/*/ is being indexed. (I did not verify that it was everything, but hundreds of other files ending in .log are being indexed.)

I have read http://www.splunk.com/base/Documentation/latest/Data/Specifyinputpathswithwildcards more times than I care to count.

If I choose not to regex the directories and get explicit about what to index, everything works fine. But, as you have likely guessed, we run more than one instance of the server per host across many dozens of hosts, so hard-coding every permutation of directory paths seems tedious at best.

I know this is largely a duplicate and I apologize for that. On the upside, if your answer works I will mark you as correct and lend what weak credibility I have to your name 🙂

Brian

gkanapathy
Splunk Employee

The parameter name whitelist doesn't exist in versions 4.0.x and earlier; there the correct name is _whitelist. From 4.1.x onward, _whitelist is deprecated and overridden by a new parameter, whitelist. You should need just:

[monitor:///opt/logs/*.prd/]
_whitelist=^/opt/logs/[^/]*\.prd/(?:[^/]*-access\.log|[^/]*/[^/]/(?:[^/]*EndAudit\.csv|[^/]*NPS-[^/]*\.log))
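One way to see why this first pattern is hard to get working is to run it against sample paths. The sketch below uses Python's re.match (assumed close enough to Splunk's engine for these simple constructs) with hypothetical paths. As written, the [^/]/ segment requires a second directory level whose name is exactly one character, so a file one level below *.prd is never matched, and the missing $ lets rotated files such as -access.log.1 through. VARIANT is an illustrative correction, not something proposed in the thread.

```python
import re

# The answer's first pattern, copied verbatim (split only for line length).
STRICT = re.compile(
    r"^/opt/logs/[^/]*\.prd/(?:[^/]*-access\.log|[^/]*/[^/]/"
    r"(?:[^/]*EndAudit\.csv|[^/]*NPS-[^/]*\.log))"
)

# Illustrative variant: one subdirectory level ([^/]*/) and a trailing $.
VARIANT = re.compile(
    r"^/opt/logs/[^/]*\.prd/(?:[^/]*-access\.log|[^/]*/"
    r"(?:[^/]*EndAudit\.csv|[^/]*NPS-[^/]*\.log))$"
)

# Hypothetical sample paths.
ACCESS = "/opt/logs/app1.prd/web-access.log"
AUDIT = "/opt/logs/app1.prd/inst1/xEndAudit.csv"
NPS = "/opt/logs/app1.prd/inst1/svc-NPS-2.log"
FOO = "/opt/logs/app1.prd/inst1/svc-foo-2.log"

for name, rx in (("STRICT", STRICT), ("VARIANT", VARIANT)):
    for path in (ACCESS, ACCESS + ".1", AUDIT, NPS, FOO):
        print(name, bool(rx.match(path)), path)
```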

Or more simply but possibly less accurately:

[monitor:///opt/logs/*.prd/]
_whitelist=[^/]*(?:-access\.log|EndAudit\.csv|NPS-[^/]*\.log)$
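The simpler pattern only looks at the end of the path, which is why it is easier to get right and also why it is "less accurate". The sketch below (Python's re.search, hypothetical paths) shows that it rejects rotated files and non-NPS logs, but it does not check the *.prd directory and has no rule for 10.05 or adm-access files. Those gaps are what the blacklist in the follow-up reply covers.

```python
import re

# The answer's simpler pattern, copied verbatim. re.search applies it
# anywhere in the path, so only the trailing $ anchors it.
LOOSE = re.compile(r"[^/]*(?:-access\.log|EndAudit\.csv|NPS-[^/]*\.log)$")

# Hypothetical sample paths and whether LOOSE accepts them.
for path in (
    "/opt/logs/app1.prd/web-access.log",           # wanted: accepted
    "/opt/logs/app1.prd/web-access.log.1",         # rotated: rejected
    "/opt/logs/app1.prd/inst1/svc-foo-2.log",      # not NPS: rejected
    "/opt/logs/other/web-access.log",              # not under *.prd: accepted
    "/opt/logs/app1.prd/inst1/svc-NPS-10.05.log",  # 10.05: accepted
    "/opt/logs/app1.prd/adm-access.log",           # adm access log: accepted
):
    print(bool(LOOSE.search(path)), path)
```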

brianirwin
Path Finder

The first one didn't quite work, and when I tried to debug it, it was a bit too involved. The less accurate one worked nicely, and I tweaked it a little with some blacklisting.

Very good catch re: _whitelist vs. whitelist. I should have used the correct version of the manual, and I missed that the first time I read your suggestion.

[monitor:///opt/logs]
_blacklist = (.*adm-access|.*\.gz|.*10\.05|\.\d+\.csv)
_whitelist = (.*\.csv|.*NPS-.*\.log)
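To check what this final configuration selects, the sketch below combines the two patterns in Python. It assumes that a file is monitored when the whitelist matches somewhere in its full path and the blacklist does not, with the blacklist taking precedence; the helper name selected and the paths are hypothetical. Run as written, it suggests two things worth checking: no -access.log file passes the whitelist, and because the whitelist has no trailing $, a rotated NPS log such as svc-NPS-2.log.1 is still accepted.

```python
import re

# The poster's final patterns, copied verbatim.
WHITELIST = re.compile(r"(.*\.csv|.*NPS-.*\.log)")
BLACKLIST = re.compile(r"(.*adm-access|.*\.gz|.*10\.05|\.\d+\.csv)")

def selected(path: str) -> bool:
    """Assumed rule: whitelist must match, blacklist must not (blacklist wins)."""
    return bool(WHITELIST.search(path)) and not BLACKLIST.search(path)

# Hypothetical sample paths.
for path in (
    "/opt/logs/app1.prd/inst1/xEndAudit.csv",      # selected
    "/opt/logs/app1.prd/inst1/xEndAudit.1.csv",    # rotated csv: blacklisted
    "/opt/logs/app1.prd/inst1/svc-NPS-2.log",      # selected
    "/opt/logs/app1.prd/inst1/svc-NPS-2.log.gz",   # compressed: blacklisted
    "/opt/logs/app1.prd/inst1/svc-NPS-10.05.log",  # 10.05: blacklisted
    "/opt/logs/app1.prd/inst1/svc-NPS-2.log.1",    # no "$": still selected
    "/opt/logs/app1.prd/web-access.log",           # not whitelisted
):
    print(selected(path), path)
```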
