Monitor directory containing zip files

gelica
Communicator

Hi,

I'm trying to monitor a directory which contains zip files. The zip files contain different file types, and I'm only interested in indexing the txt files.
My path would be something like: dir\something.zip\file.txt

I have tried some different monitor approaches, but either nothing gets indexed or all of the files in my zip file are indexed. Here are a few examples of what I have tried in inputs.conf:

[monitor://C:\Users\angeliga\Filer\...]
disabled = false
followTail = 0
sourcetype = my_type
whitelist=*.txt

[monitor://C:\Users\angeliga\Filer\...\*.txt]
disabled = false
followTail = 0
sourcetype = my_type

Does anybody have any idea of what I'm doing wrong?
Thanks!
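One likely problem with the first stanza: in inputs.conf, `whitelist` is a regular expression matched against the file path, not a shell-style wildcard. That makes `*.txt` an invalid pattern, because the leading `*` has nothing to repeat. A pattern such as `\.txt$` expresses "path ends in .txt".

The Python sketch below only illustrates those regex semantics. It uses Python's `re` module, which is close to, but not identical to, the PCRE flavor Splunk uses, and the sample paths are hypothetical. It also suggests a possible reason an extension filter indexes nothing. Assuming the whitelist is evaluated against the path of the archive itself (`something.zip`) rather than the members inside it, a `\.txt$` whitelist would skip the archive entirely.

```python
import re

# Hypothetical paths, modeled on the layout described in the question.
paths = [
    r"C:\Users\angeliga\Filer\dir\something.zip",
    r"C:\Users\angeliga\Filer\dir\notes.txt",
    r"C:\Users\angeliga\Filer\dir\image.png",
]

def is_valid_regex(pattern: str) -> bool:
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

def whitelisted(path: str, pattern: str) -> bool:
    """Mimic a regex whitelist: keep the path if the pattern matches anywhere in it."""
    return re.search(pattern, path) is not None

# A glob-style pattern is not a valid regex: the leading '*' has nothing to repeat.
print(is_valid_regex("*.txt"))  # False

# Assumption: the whitelist sees the archive's own path, so the .zip is filtered out.
for p in paths:
    print(p, whitelisted(p, r"\.txt$"))
```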

dantimola
Communicator

(screenshot of splunkd.log)

My inputs.conf:
[monitor:///home/administrator/Pictures]
disabled = false
host = pgwlogs
index = pgw_logsource
sourcetype = pgw

./splunk list monitor
$SPLUNK_HOME/var/spool/splunk/...stash_new
/home/administrator/Pictures/
/home/administrator/Pictures/OK_USCDB_1_20161108050001.tar.gz
Monitored Files:
$SPLUNK_HOME/etc/splunk.version

grijhwani
Motivator

First let me stress I have not done this, and I am not even completely confident of the file syntax, but I suspect the path you want is something along the lines of:

$SPLUNK_HOME/etc/system/local/inputs.conf

[monitor://C:\Users\angeliga\Filer\...]
disabled = false
whitelist=*.txt
followTail = 0
sourcetype = my_type

$SPLUNK_HOME/etc/system/local/props.conf

[source::C:\Users\angeliga\Filer\...]
TRANSFORMS-set=droprecord,userecord

$SPLUNK_HOME/etc/system/local/transforms.conf

[droprecord]
REGEX=.
DEST_KEY=queue
FORMAT=nullQueue

[userecord]
REGEX={targetmatch}
DEST_KEY=queue
FORMAT=indexQueue

This assumes that, rather than targeting the .txt files within the .zip file, you have a record structure you can target with the "userecord" regex. Certainly, if I were to investigate, this is where I would begin, but I could be entirely and utterly wrong. It is at best an educated guess.
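The props/transforms pair above follows the common "discard everything, then keep what matches" pattern. Both transforms run on each event in the order listed in `TRANSFORMS-set`. `droprecord` (REGEX=.) routes every non-empty event to `nullQueue`, then `userecord` routes events matching its regex back to `indexQueue`, so the last matching transform decides the queue.

The Python below is only a simulation of that ordering, not Splunk code. The sample events and the `ERROR` pattern standing in for `{targetmatch}` are hypothetical.

```python
import re

# Transforms in the order given by TRANSFORMS-set = droprecord,userecord.
# Each entry: (name, regex, queue the event is routed to when the regex matches).
TRANSFORMS = [
    ("droprecord", re.compile(r"."), "nullQueue"),
    # Hypothetical stand-in for {targetmatch}.
    ("userecord", re.compile(r"ERROR"), "indexQueue"),
]

def route(event: str, default_queue: str = "indexQueue") -> str:
    """Apply each transform in order; a later match overwrites an earlier queue."""
    queue = default_queue
    for _name, regex, dest in TRANSFORMS:
        if regex.search(event):
            queue = dest
    return queue

# Hypothetical sample events.
events = [
    "2016-11-08 05:00:01 ERROR disk full",
    "2016-11-08 05:00:02 INFO heartbeat",
]

kept = [e for e in events if route(e) == "indexQueue"]
print(kept)  # ['2016-11-08 05:00:01 ERROR disk full']
```

Swapping the order to `userecord,droprecord` would send everything to `nullQueue`, which is why the order in `TRANSFORMS-set` matters.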

I will be watching with interest to see if there is, in fact, a direct solution to what you want to do.

grijhwani
Motivator

I think you're missing the point. The regex is a pattern match to target the format of the records within the text files, not the text file names. I am assuming that the text files follow some regular format.

The regex matching means that, yes, the files still get processed, but only the matching records will actually be indexed.

arunsundarm
Engager

Yes, that's right. In some cases we write a regex for hostnames that likewise scans the records and assigns the host name to the events.
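A host override of this kind is usually a transform whose REGEX captures the hostname from the raw event and writes it to the host field. In transforms.conf that is `DEST_KEY = MetaData:Host` with `FORMAT = host::$1`. The Python sketch below only mimics the capture-and-assign step. The `host=<name>` record layout and the `default-host` fallback are hypothetical examples.

```python
import re

# Hypothetical record layout: the hostname appears as "host=<name>".
HOST_REGEX = re.compile(r"host=(\S+)")

def extract_host(event: str, default_host: str) -> str:
    """Return the captured hostname, or the default when the regex does not match."""
    match = HOST_REGEX.search(event)
    return match.group(1) if match else default_host

print(extract_host("2016-11-08 host=web01 status=ok", "default-host"))  # web01
print(extract_host("no host field here", "default-host"))  # default-host
```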

gelica
Communicator

I appreciate your help, but unfortunately I didn't get it to work.

I tried some different options, including setting my "keep-regex" to a specific file name that is in the compressed file. I also tried excluding the whitelist parameter, or sending both droprecord and userecord to nullQueue.

I still get non-txt files indexed. It seems like Splunk doesn't like this approach, and maybe I have to extract the zip files beforehand.
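The extract-first workaround could be scripted outside Splunk. The Python sketch below uses the standard `zipfile` module to copy only the `.txt` members of each archive into a staging directory. A monitor stanza, for example with a `\.txt$` whitelist, would then point at that directory instead of at the zip files.

The staging layout (one subdirectory per archive) and the function name `extract_txt_members` are hypothetical choices, not part of Splunk.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

def extract_txt_members(source_dir: Path, staging_dir: Path) -> list[Path]:
    """Extract only the .txt members of every .zip in source_dir into staging_dir.

    Hypothetical layout: each archive gets its own subdirectory, so members with
    the same name in different archives do not overwrite each other.
    """
    extracted = []
    for archive in sorted(source_dir.glob("*.zip")):
        target = staging_dir / archive.stem
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                # Directory entries end in "/", so they never pass this check.
                if member.lower().endswith(".txt"):
                    # ZipFile.extract strips absolute paths and ".." components
                    # and creates any missing parent directories.
                    extracted.append(Path(zf.extract(member, path=target)))
    return extracted
```

For example, `extract_txt_members(Path(r"C:\Users\angeliga\Filer"), Path(r"C:\splunk_staging"))` would stage `something.zip\file.txt` as `C:\splunk_staging\something\file.txt`. The staging path is only an illustration.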

gelica
Communicator

Thanks for your suggestion, I will try it and hope it works. 🙂

But I wonder if this means that all of the files get indexed first and the unwanted files are then sorted out? Or will this in fact only index the files that I want?
