I have a bunch of .tgz files that are regularly uploaded to a directory, and I'd like to index only a subset of the files inside each archive.
Example archive files:
tar tzvf archive.1.2.tgz
-rw-r--r-- 0 wdh wdh 948 Jan 10 09:24 app1.log
-rw-r--r-- 0 wdh wdh 414 Jan 10 09:24 foo.log
-rw-r--r-- 0 wdh wdh 770 Jan 10 09:24 splat.log
tar tzvf archive.5.8.tgz
-rw-r--r-- 0 wdh wdh 148 Jan 10 09:24 app3.log
-rw-r--r-- 0 wdh wdh 216 Jan 10 09:24 bad.log
-rw-r--r-- 0 wdh wdh 789 Jan 10 09:24 splat.log
From the example above, I'd like only the "splat.log" file inside archive.*.tgz to be indexed. It appears to me that the whitelist/blacklist settings for an inputs.conf stanza only apply to the archive file name, not to files inside the archive.
While I know I can have some external batch process run and pull the 'splat.log' files out, is there any way I can use whitelist/blacklist, or some other Splunk configuration mechanism to filter based on the internal filenames inside the archive files?
Hi,
Did you ever find a way to do this? 🙂
Is this an issue with 4.3 as well? I've been beating my head on this one too.
Not quite what you're looking for, but if nothing else you could route the events from the unwanted files to nullQueue, which discards them at index time.
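Something along these lines might work as a starting point. It is only a sketch and rests on a few assumptions: that the archive input is assigned a sourcetype (the name archive_logs here is made up), that events read from inside an archive carry a source value ending in the internal file name, and that the MetaData:Source key is prefixed with "source::". The stanza name drop_non_splat is also just a placeholder. Check the source values on a few indexed events before relying on the regex.

```
# props.conf
# Assumption: inputs.conf assigns sourcetype = archive_logs (hypothetical name)
# to the monitored directory containing the .tgz files.
[archive_logs]
TRANSFORMS-drop_unwanted = drop_non_splat

# transforms.conf
# Match on the event's source rather than its raw text. The negative lookahead
# matches every source that does NOT end in "/splat.log" or ":splat.log",
# and those events are sent to nullQueue (discarded before indexing).
[drop_non_splat]
SOURCE_KEY = MetaData:Source
REGEX = ^(?!.*[/:]splat\.log$)
DEST_KEY = queue
FORMAT = nullQueue
```

Splunk still has to unpack and read every file in the archive with this approach; the filtering only happens afterwards, so it saves index volume rather than processing time.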
I've just run into this issue myself and have been beating my head against the wall trying to figure it out. It's odd that Splunk supports using the name of a file inside a .tgz with a regex to set the host name, but it can't look inside the tarball when applying the blacklist. Very frustrating!