Getting Data In

Data Input: Monitor a directory for new files and delete when indexed

Engager

I'm trying to use "Monitor Files & Directories" as a data input. I have two data input sources:

  1. One is a script that runs every 10 minutes and puts a data file on the Splunk file system (/opt/splunk/var/ps_search/).
  2. The second is the "Monitor Files & Directories" input, which is supposed to look under the /opt/splunk/var/ps_search directory and index all incoming files.

The incoming files are CSVs with unique file names (a timestamp in the file name). I see only the first CSV file getting indexed, not the subsequent ones generated by the script. I've read http://answers.splunk.com/questions/4103/directory-monitoring-not-picking-up-new-files and http://www.splunk.com/base/Documentation/latest/Admin/Monitorfilesanddirectories, but I'm not sure what else I need to do. A few questions:

  1. The documentation says the monitor only checks for new files every 24 hours - is that right? How else can I make it continuously look for new files in the directory? Do I need to use crawl?

  2. Is it possible to use monitor for the above and delete each file once it has been indexed (similar to using a sinkhole)?

In my case, once a file is copied into the directory it's not changed, so I basically just want to delete it once Splunk has indexed it.

Splunk Employee
  1. No, if the docs say that, they need to be corrected. Splunk's monitor actually checks directories every second (unless it's backed up, which might make it slightly less frequent).
  2. Yes, you can do one of the following:
    • Use `[batch://]` instead of `[monitor://]`
    • Save your script output to /opt/splunk/var/spool/splunk, which acts as a sinkhole: Splunk indexes everything and then deletes the files.
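As a sketch of the first option (the stanza path is taken from this thread; exact attribute behavior may vary by Splunk version), an `inputs.conf` batch stanza could look like:

```ini
# inputs.conf -- batch input: index each file once, then delete it.
# The path below is the drop directory described in this thread.
[batch:///opt/splunk/var/ps_search]
# "sinkhole" tells Splunk to delete each file after it has been indexed.
move_policy = sinkhole
sourcetype = csv
disabled = false
```

With `move_policy = sinkhole`, files are consumed destructively, so only point this at a directory whose contents you genuinely want deleted after indexing.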

However, I think you should look into why your monitor is not reading the additional CSV files being created. Check your `splunkd.log` for related entries. Perhaps the files are too similar and you are hitting a CRC check issue: the CRCs of the files match, so Splunk doesn't index them because it thinks they are the same file (basically, the first 256 bytes of the files are identical). In that case, look at `crcSalt` in `inputs.conf`.
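If the CRC collision turns out to be the cause, a minimal `inputs.conf` monitor stanza (path again taken from the thread) would be:

```ini
# inputs.conf -- monitor stanza with crcSalt to distinguish files
# whose first bytes (a shared CSV header row) are identical.
[monitor:///opt/splunk/var/ps_search]
# <SOURCE> mixes the full file path into the CRC calculation, so files
# with identical headers but different names are treated as new files.
crcSalt = <SOURCE>
sourcetype = csv
```

Note that `<SOURCE>` is the literal string to use, not a placeholder to substitute.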

Please read inputs.conf.spec for more information on `[batch://]` and `crcSalt`.

Splunk Employee

The spool directory is like Big Brother, always watching for files being dropped there. Once it reads a file, it deletes it; think of it as a sinkhole. You can always try it out: dump some files there and you'll experience it first hand. The only issue with this method, I think, is that you can't really specify the source, sourcetype, or which index the data should go to. I believe 4.2 will have some improvements in this area.


Engager

Thanks. I added "crcSalt = " in inputs.conf and it started indexing the other CSV files. I guess the problem (as you suggested) was that all my CSV files start with a header row, and that first row (with almost 10 column names) is the same in every file. I'm not sure whether Splunk should treat CSVs differently, since they can legitimately share a header row. Anyway, that part works for now.

My other issue is how to delete the indexed files while still keeping continuous inputs. The "batch" option seems to work only one time. Would writing to the spool directory continuously pick up new files, just like monitor?
