I have a folder on a UNC path and I would like for Splunk to simply index the filenames within the folder (the files are JPEGS). What would be the best way to do this?
I am currently monitoring the folder as a data input; however, nothing is being indexed. In the log files I am seeing the error:
TailReader - Ignoring file '\\UNCPATH\FOLDER\PHOTO_NAME.jpg' due to: binary
I'm not familiar enough with the back-end of Splunk to make changes myself, so I'd be grateful for any guidance.
Thank you for your help!
Let me understand: do you want to monitor only the filenames? In other words, only the list of filenames, possibly with file attributes (owner, size, etc.)? Is that correct?
If this is your need, you could create a script that lists the files (e.g. using the Windows "dir" command) and capture its output in Splunk.
To do this, create a script (called e.g. monitor_jpeg_files.bat) in the bin folder of your TA ($SPLUNK_HOME/etc/apps/myapp/bin) containing the dir command:
dir \\UNCPATH\FOLDER\*.jpg
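A minimal sketch of what the batch script could contain (UNCPATH and FOLDER are placeholders from the question; the /b switch is optional and prints bare filenames only, which keeps the indexed events clean):

```bat
@echo off
REM monitor_jpeg_files.bat - list the JPEG files on the UNC share
REM /b prints bare file names only; drop it if you also want size and date
dir /b \\UNCPATH\FOLDER\*.jpg
```

Without /b, each line also carries the modification date and file size, which you could then extract as fields at search time.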
And then create a stanza in the inputs.conf of your TA like the following:
###### Scripted Input to monitor jpeg files
[script://.\bin\monitor_jpeg_files.bat]
disabled = 0
## Run once per hour
interval = 3600
sourcetype = Script:jpeg_files
index = my_index
After you deploy this TA on the target server and restart the Universal Forwarder, every hour you'll have the list of JPEG files in your directory.
Hi Giuseppe, thanks again for your answer. I have a follow up question if you do not mind?
Splunk is indexing the data once per hour; however, it is duplicating it: after 24 hours each event appears 24 times. Do you know of a way to stop this from happening?
Thanks for your help!
I'm afraid it is not possible to avoid this: I had to do something similar myself and didn't find a way around it.
You have to manage duplicates at search time.
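One search-time option is to collapse identical events with dedup. This is a sketch assuming the index and sourcetype from the scripted input above (adjust them to match your own inputs.conf):

```
index=my_index sourcetype=Script:jpeg_files
| dedup _raw
```

dedup _raw keeps only the first occurrence of each identical raw event, so every filename appears once in the results regardless of how many hourly runs were indexed.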
I can suggest a workaround: schedule the script with cron (or the Windows Task Scheduler), save its output to a file, and have Splunk monitor that file.
So whenever the script is executed, it will rewrite the file.
It will not totally stop the duplicates, but:
- if there are no changes to the file names, the same names will not be indexed multiple times;
- the file names are listed in alphabetical order, so only when a file is renamed, removed, or added will Splunk reindex the entire file.
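A sketch of the workaround, assuming a hypothetical output path (C:\splunk_output\jpeg_files.log is only an example; use any location the forwarder can read). The scheduled script overwrites the file with a sorted bare list:

```bat
@echo off
REM monitor_jpeg_files.bat - write the sorted file list to a log Splunk monitors
REM /b = bare names, /o:n = sort by name; > overwrites the file on each run
dir /b /o:n \\UNCPATH\FOLDER\*.jpg > C:\splunk_output\jpeg_files.log
```

And instead of the scripted-input stanza, the inputs.conf on the forwarder would use a monitor stanza pointing at that file:

```
[monitor://C:\splunk_output\jpeg_files.log]
sourcetype = Script:jpeg_files
index = my_index
```

Because the monitor input only reindexes the file when its contents change, an unchanged file list produces no new events, and sorting by name keeps the content stable across runs.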