I've got a local directory configured in my inputs.conf as so:
[monitor:///Volumes/A/b/c/dir]
disabled = false
followTail = 0
host = fubar
And this seemed to be working fine, but for No Reason I Can See, new files added to "dir" have ceased being indexed. The inputs manager says there should be 23 files in the directory (there are 22), and each file is named after the date it was created, such as 20110410.txt and so on.
However, nothing new shows up for the last few days in the Splunk index, and no searches yield results for the data that is clearly in files which should be monitored in that directory.
What's the best/right way to fix this? I found other posts here with similar issues where people suggested using | delete and re-adding specific files as a one-off, but I'm not sure that's a long-term fix. I've also tried deleting and re-adding the data source, but to no avail.
When I do a CLI check for the source:
PROPERTIES OF /Volumes/A/b/c/dir/20110422.txt
Attr:ANNOTATE_PUNCT True
Attr:BREAK_ONLY_BEFORE
Attr:BREAK_ONLY_BEFORE_DATE True
Attr:CHARSET UTF-8
Attr:DATETIME_CONFIG /etc/datetime.xml
Attr:HEADER_MODE
Attr:LEARN_SOURCETYPE true
Attr:LINE_BREAKER_LOOKBEHIND 100
Attr:MAX_DAYS_AGO 2000
Attr:MAX_DAYS_HENCE 2
Attr:MAX_DIFF_SECS_AGO 3600
Attr:MAX_DIFF_SECS_HENCE 604800
Attr:MAX_EVENTS 256
Attr:MAX_TIMESTAMP_LOOKAHEAD 128
Attr:MUST_BREAK_AFTER
Attr:MUST_NOT_BREAK_AFTER
Attr:MUST_NOT_BREAK_BEFORE
Attr:PREFIX_SOURCETYPE True
Attr:SEGMENTATION indexing
Attr:SEGMENTATION-all full
Attr:SEGMENTATION-inner inner
Attr:SEGMENTATION-outer outer
Attr:SEGMENTATION-raw none
Attr:SEGMENTATION-standard standard
Attr:SHOULD_LINEMERGE False
Attr:TRANSFORMS
Attr:TRUNCATE 10000
Attr:is_valid True
Attr:maxDist 9999
Attr:sourcetype 20110422-too_small
As far as I can tell, that looks right.
Thanks!!
First off, take a look in splunkd.log and see if it tells you if/why it is ignoring your files.
My guess is that each day's files begin and end substantially the same.
Splunk uses a CRC to try to avoid re-indexing files when they roll.
Unfortunately, similar, non-rolling files can fool it into thinking it has already seen the file, so it ignores it.
If you are creating new files each day with a new name, you could add:
crcSalt = <SOURCE>
to your inputs.conf stanza.
See How Splunk recognizes log rotation for more info on this topic.
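For reference, here is a sketch of what the stanza from the question would look like with the salt added; the monitor path and settings are copied from above, and only the crcSalt line is new:

[monitor:///Volumes/A/b/c/dir]
disabled = false
followTail = 0
host = fubar
crcSalt = <SOURCE>

With <SOURCE> as the salt, the full path of each file is mixed into the initial CRC, so daily files whose first bytes happen to match are still treated as distinct sources. If you edit inputs.conf on disk, you'll typically need to restart the instance doing the monitoring for the change to take effect.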
To find out exactly why a given file or directory was ignored by the tailing processor, I recommend looking at the file input status endpoint of splunkd, which you can usually find at https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus. Note that you need to check this on the instance doing the file monitoring, whether that's the indexer or a forwarder.
For friendlier, real-time readability, you can use a Python script written by one of our developers: http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/
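If you just want a quick, ad-hoc look at that endpoint without the blog script, a minimal sketch along these lines works; it assumes Python with the requests library, the default management port, and placeholder credentials ("admin"/"changeme"), so adjust for your environment:

import requests

SPLUNKD = "https://localhost:8089"
ENDPOINT = "/services/admin/inputstatus/TailingProcessor:FileStatus"

# Splunk's management port ships with a self-signed certificate, hence verify=False.
response = requests.get(SPLUNKD + ENDPOINT, auth=("admin", "changeme"), verify=False)
response.raise_for_status()

# The response is Atom XML; filtering it for the monitored path is usually enough
# to see whether a file was read, ignored, or never picked up at all.
for line in response.text.splitlines():
    if "/Volumes/A/b/c/dir" in line:
        print(line.strip())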
Yep, with that log message, crcSalt = <SOURCE> is the way to go.
Indeed, this may be a good clue. The log says:
04-23-2011 15:09:24.453 +1000 ERROR TailingProcessor - File will not be read, seekptr checksum did not match (file=/Volumes/A/b/c/dir/20110422.txt). Last time we saw this initcrc, filename was different. You may wish to use a CRC salt on this source. Consult the documentation or file a support case online at http://www.splunk.com/page/submit_issue for more info.
Each day's file differs greatly, but I'll read up on this, thanks!