Monitor directory containing zip files

gelica
Communicator

Hi,

I'm trying to monitor a directory which contains zip files. The zip files contain different file types, and I'm only interested in indexing the txt files.
My path would be something like: dir\something.zip\file.txt

I have tried some different monitor approaches, but either nothing gets indexed or all of the files in my zip file are indexed. Here are a few examples of what I have tried in inputs.conf:

[monitor://C:\Users\angeliga\Filer\...]
disabled = false
followTail = 0
sourcetype = my_type
whitelist=*.txt

[monitor://C:\Users\angeliga\Filer\...\*.txt]
disabled = false
followTail = 0
sourcetype = my_type

Does anybody have any idea of what I'm doing wrong?
Thanks!

dantimola
Communicator

splunkd.log

my inputs.conf
[monitor:///home/administrator/Pictures]
disabled = false
host = pgwlogs
index = pgw_logsource
sourcetype = pgw

./splunk list monitor
$SPLUNK_HOME/var/spool/splunk/...stash_new
/home/administrator/Pictures/
/home/administrator/Pictures/OK_USCDB_1_20161108050001.tar.gz
Monitored Files:
$SPLUNK_HOME/etc/splunk.version

grijhwani
Motivator

First let me stress I have not done this, and I am not even completely confident of the file syntax, but I suspect the path you want is something along the lines of:

$SPLUNK_HOME/etc/system/local/inputs.conf

[monitor://C:\Users\angeliga\Filer\...]
disabled = false
whitelist=*.txt
followTail = 0
sourcetype = my_type

$SPLUNK_HOME/etc/system/local/props.conf

[source::C:\Users\angeliga\Filer\...]
TRANSFORMS-set=droprecord,userecord

$SPLUNK_HOME/etc/system/local/transforms.conf

[droprecord]
REGEX=.
DEST_KEY=queue
FORMAT=nullQueue

[userecord]
REGEX={targetmatch}
DEST_KEY=queue
FORMAT=indexQueue

This assumes that, rather than targeting the .txt files within the .zip file, you have a record structure you can target with the "userecord" regex. Certainly, if I were to investigate, this is where I would begin, but I could be entirely and utterly wrong. It is at best an educated guess.

I will be watching with interest to see if there is, in fact, a direct solution to what you want to do.

grijhwani
Motivator

I think you're missing the point. The regex is a pattern match to target the format of the records within the text files, not the text file names. I am assuming that the text files follow some regular format.

The regex matching means that yes the files get processed, but only the matching records will actually be indexed.
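
To make the ordering of the droprecord/userecord pair concrete, here is a toy Python model of how I understand the routing to behave: transforms run in the listed order, and each one that matches overwrites the queue, so the last match wins. This is only an illustration, not Splunk code. The function name `route_event` and the `^ERROR` pattern are made up for the example and stand in for whatever {targetmatch} would be.

```python
import re

def route_event(event, transforms, default_queue="indexQueue"):
    """Return the queue an event ends up in (toy model, not Splunk itself).

    transforms is an ordered list of (regex, dest_queue) pairs. Every
    matching transform overwrites the queue, so the last match wins.
    """
    queue = default_queue
    for regex, dest_queue in transforms:
        if re.search(regex, event):
            queue = dest_queue
    return queue

# Same order as TRANSFORMS-set=droprecord,userecord
TRANSFORMS = [
    (r".", "nullQueue"),        # droprecord: matches any non-empty record
    (r"^ERROR", "indexQueue"),  # userecord: hypothetical stand-in for {targetmatch}
]
```

With this order, `route_event("ERROR disk full", TRANSFORMS)` ends up in indexQueue and `route_event("DEBUG all fine", TRANSFORMS)` ends up in nullQueue. If the two entries are swapped, droprecord runs last and everything is dropped, which is why the order in TRANSFORMS-set matters.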

arunsundarm
Engager

Yes, that's right. In some cases we write a regex for host names that likewise scans the records and assigns the host name to the events.

gelica
Communicator

I appreciate your help, but unfortunately I didn't get it to work.

I tried some different options, including setting my "keep-regex" to a specific file name that is in the compressed file. I also tried excluding the whitelist parameter, or sending both droprecord and userecord to nullQueue.

I still get non-txt files indexed. It seems like Splunk doesn't like this approach, and maybe I have to extract the zip files beforehand.
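
If extracting beforehand turns out to be the way to go, a minimal Python sketch of that step could look like the following. It copies only the .txt members of each zip into a separate staging directory, which a plain monitor stanza could then watch. The function name `extract_txt_members` and the directory layout are assumptions for illustration, and I have not run this against a Splunk instance.

```python
import zipfile
from pathlib import Path

def extract_txt_members(zip_dir, out_dir):
    """Copy only the .txt members of every .zip in zip_dir into out_dir.

    Each archive gets its own subdirectory (named after the zip file) so
    that files from different archives do not overwrite each other.
    Returns the list of paths that were written.
    """
    zip_dir, out_dir = Path(zip_dir), Path(out_dir)
    extracted = []
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                # Skip directories and anything that is not a .txt file
                if info.is_dir() or not info.filename.lower().endswith(".txt"):
                    continue
                # Keep only the base name: flattens nested folders and
                # avoids writing outside out_dir via odd member paths
                target = out_dir / archive.stem / Path(info.filename).name
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(zf.read(info))
                extracted.append(target)
    return extracted
```

Something like `extract_txt_members(r"C:\Users\angeliga\Filer", r"C:\Users\angeliga\Filer_txt")` could run on a schedule, with the monitor stanza pointed at the second directory (a hypothetical path) instead of the one holding the zips. Note that two members with the same base name inside one archive would overwrite each other in this sketch.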

gelica
Communicator

Thanks for your suggestion, I will try it and hope it works. 🙂

But I wonder: does this mean that all of the files get indexed first and the unwanted files are then filtered out? Or will this in fact index only the files that I want?
