Description:
I am using a Splunk Heavy Forwarder (HF) to forward logs to an indexer cluster. I need to configure props.conf and transforms.conf on the HF to drop all logs that originate from a specific directory and any of its subdirectories, without modifying the configuration each time a new subdirectory is created.
Scenario:
The logs I want to discard are located under /var/log/apple/. This directory contains dynamically created subdirectories, such as:
/var/log/apple/nginx/
/var/log/apple/db/intro/
/var/log/apple/some/other/depth/
New subdirectories are added frequently, and I cannot manually update the configuration every time.
Attempted Solution: I configured props.conf as follows:
[source::/var/log/apple(/.*)?]
TRANSFORMS-null=discard_apple_logs
And in transforms.conf:
[discard_apple_logs]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
However, this does not seem to work: logs from the subdirectories are still being forwarded to the indexers.
Question:
What is the correct way to configure props.conf and transforms.conf to drop all logs under /var/log/apple/, including those from any newly created subdirectories? How can I ensure that this rule applies recursively without explicitly listing multiple wildcard patterns? Any guidance would be greatly appreciated!
1. I suppose the easiest solution would be to just blacklist the directory within a specific inputs.conf stanza. (As others already pointed out)
2. Do your events come from monitor inputs on this HF or are they forwarded from other hosts? From HFs or UFs?
3. Ingest actions?
Yes, you're right, and I described why I can't in my other answer about it.
Yes, that was my suspicion.
Your general idea seems OK (provided that your transform definition contains separate lines which just got squished into one on copy-paste).
Additional question: aren't you by any chance using indexed extractions?
If you are, the data is sent already parsed and is not processed by transforms further down the pipeline.
The only thing happening on this Heavy Forwarder is collecting logs, assigning an index based on the source using transforms.conf and props.conf, and then forwarding them to the indexer cluster.
OK. So your transform assigning index based on source does work for the same data?
Yes, it's working correctly. For example, I am reindexing /var/log/syslog to index=os_logs, and it applies as expected.
Well... then it should work.
One thing you could change in your spec is dropping the conditionality at the end (you should never have the directory itself specified as the source, only files from below this directory), but that's not the issue here.
I noticed one thing though - a similar case to one we had not long ago in another thread - your transform class is named "null". That is a fairly common name, so it might be getting overridden somewhere else in your configs. Check the btool output to see whether it is.
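For reference, btool can print the merged, effective configuration along with the file each setting comes from. A sketch (paths assume a default $SPLUNK_HOME install; the transform name matches the one used in this thread):

```shell
# Show every effective props.conf setting, annotated with its source file,
# filtered to the stanzas relevant here
$SPLUNK_HOME/bin/splunk btool props list --debug | grep -i apple

# Show the effective definition of the transform itself
$SPLUNK_HOME/bin/splunk btool transforms list discard_apple_logs --debug
```

If another app defines a TRANSFORMS-null class with higher precedence, the --debug column will show which file wins.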
I have already checked, and the transform configuration is correct with no conflicts in other Splunk settings.
Currently, to filter out sources properly, I have to explicitly define each depth of subdirectories using patterns like:
[source::/var/log/apple/*]
TRANSFORMS-null=discard_apple_logs
[source::/var/log/apple/*/*]
TRANSFORMS-null=discard_apple_logs
This ensures that logs from different levels of subdirectories are included in the filtering process.
It's quite strange that Splunk can't handle this scenario, if that's the case. Use cases like mine should be fairly common, so I would expect a more straightforward way to handle this.
Well, to be quite precise, it's not a raw regex. The docs say:
Match expressions must match the entire name, not just a substring. Match expressions are based on a full implementation of Perl-compatible regular expressions (PCRE) with the translation of "...", "*", and ".". Thus, "." matches a period, "*" matches non-directory separators, and "..." matches any number of any characters.
So in case of wildcards it can get tricky.
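As an illustration only (a toy translation written for this thread, not Splunk's actual matcher), the docs' wildcard rules can be sketched in Python to show why a single "*" stops at the first directory separator while "..." matches any depth:

```python
import re

def splunk_source_to_regex(spec: str) -> str:
    """Toy translation of a source:: match expression, per the documented
    rules: '...' matches anything, '*' matches non-directory separators,
    everything else (including '.') is taken literally."""
    out = []
    i = 0
    while i < len(spec):
        if spec.startswith("...", i):
            out.append(".*")        # "..." crosses directory boundaries
            i += 3
        elif spec[i] == "*":
            out.append("[^/]*")     # "*" stays within one path segment
            i += 1
        else:
            out.append(re.escape(spec[i]))  # "." matches a literal period
            i += 1
    return "^" + "".join(out) + "$"

deep = "/var/log/apple/db/intro/error.log"
# "*" only covers one level, so the deeply nested path does not match:
assert not re.match(splunk_source_to_regex("/var/log/apple/*"), deep)
# "..." covers any depth, so it matches:
assert re.match(splunk_source_to_regex("/var/log/apple/..."), deep)
```

This is why the per-depth workaround with repeated `/*` stanzas was needed, and why a single `...` pattern should replace it.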
I'd try
[source::/var/log/apple/...]
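Putting it together, a sketch of the full HF-side configuration (the transform class suffix "discard_apple" is just an example name, chosen to avoid the common "null" suffix mentioned earlier):

```ini
# props.conf (on the Heavy Forwarder)
# "..." matches any number of any characters, so every subdirectory depth is covered
[source::/var/log/apple/...]
TRANSFORMS-discard_apple = discard_apple_logs

# transforms.conf
[discard_apple_logs]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
```

Note that REGEX, DEST_KEY, and FORMAT must each be on their own line.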
@ParsaIsHash you can use inputs.conf with a blacklist to prevent unwanted files from being forwarded at the source level (on the Heavy Forwarder). This approach stops logs from even being read, which is more efficient than filtering them in props.conf and transforms.conf.
blacklist = <regular expression>
* If set, files from this input are NOT monitored if their path matches the specified regex.
* Takes precedence over the deprecated '_blacklist' setting, which functions the same way.
* If a file matches the regexes in both the deny list and allow list settings, the file is NOT monitored. Deny lists take precedence over allow lists.
* No default.
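For completeness, a sketch of what that would look like in inputs.conf on the monitoring host (the monitor stanza path is an assumption; adjust it to the actual input):

```ini
# inputs.conf - prevents the files from ever being read
[monitor:///var/log]
blacklist = ^/var/log/apple/
```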
I described why I cannot access the inputs file on the UF:
it's because we do not have permission to access the host.
The reason I'm not using inputs.conf with a blacklist is that the hosts sending these logs are managed by another company. They control the Universal Forwarders (UFs) and their input configurations, so we don't have access to modify them. However, we still need to mask and drop these logs on our end.