
I've got a local directory configured in my inputs.conf as so:
[monitor:///Volumes/A/b/c/dir]
disabled = false
followTail = 0
host = fubar
And this seemed to be working fine, but for no reason I can see, new files added to "dir" have ceased being indexed. The inputs manager says there should be 23 files in the directory (there are actually 22), and each file is named after the date it was created, such as 20110410.txt and so on.
However, nothing new shows up for the last few days in the Splunk index, and no searches yield results for the data that is clearly in files which should be monitored in that directory.
What's the best/right way to fix this? I found other posts here with similar issues where people suggested using | delete and re-adding specific files as a one-off, but I'm not sure that's a long-term fix. I've also tried deleting and re-adding the data source, but to no avail.
When I do a CLI check for the source:
PROPERTIES OF /Volumes/A/b/c/dir/20110422.txt
Attr:ANNOTATE_PUNCT True
Attr:BREAK_ONLY_BEFORE
Attr:BREAK_ONLY_BEFORE_DATE True
Attr:CHARSET UTF-8
Attr:DATETIME_CONFIG /etc/datetime.xml
Attr:HEADER_MODE
Attr:LEARN_SOURCETYPE true
Attr:LINE_BREAKER_LOOKBEHIND 100
Attr:MAX_DAYS_AGO 2000
Attr:MAX_DAYS_HENCE 2
Attr:MAX_DIFF_SECS_AGO 3600
Attr:MAX_DIFF_SECS_HENCE 604800
Attr:MAX_EVENTS 256
Attr:MAX_TIMESTAMP_LOOKAHEAD 128
Attr:MUST_BREAK_AFTER
Attr:MUST_NOT_BREAK_AFTER
Attr:MUST_NOT_BREAK_BEFORE
Attr:PREFIX_SOURCETYPE True
Attr:SEGMENTATION indexing
Attr:SEGMENTATION-all full
Attr:SEGMENTATION-inner inner
Attr:SEGMENTATION-outer outer
Attr:SEGMENTATION-raw none
Attr:SEGMENTATION-standard standard
Attr:SHOULD_LINEMERGE False
Attr:TRANSFORMS
Attr:TRUNCATE 10000
Attr:is_valid True
Attr:maxDist 9999
Attr:sourcetype 20110422-too_small
As far as I can tell, that looks right.
Thanks!!
First off, take a look in splunkd.log and see if it tells you if/why it is ignoring your files.
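For example, on the instance doing the monitoring (and assuming a default $SPLUNK_HOME), something like the following will pull out the tailing processor's messages:
# splunkd.log lives under $SPLUNK_HOME/var/log/splunk by default
grep TailingProcessor $SPLUNK_HOME/var/log/splunk/splunkd.log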
My guess is that each day's files begin and end substantially the same.
Splunk uses a CRC to try to avoid re-indexing files when they roll.
Unfortunately, similar non-rolling files can fool it into thinking it has seen the file before, so it ignores them.
If you are creating new files each day with a new name, you could add:
crcSalt = <SOURCE>
to your inputs.conf stanza.
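Combined with the stanza from your question, that would look something like this (a sketch keeping your existing settings; restart or reload splunkd afterwards so the change takes effect):
[monitor:///Volumes/A/b/c/dir]
disabled = false
followTail = 0
host = fubar
# <SOURCE> is the literal string; Splunk adds the source path into the CRC calculation
crcSalt = <SOURCE>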
See How Splunk recognizes log rotation for more info on this topic.
To find out exactly why a given file or directory was ignored by the tailing processor, I recommend looking at the file input status endpoint of splunkd, which you can usually find at https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus. Note that you need to check this on the instance doing the file monitoring, whether that's the indexer or a forwarder.
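For example (assuming the default management port 8089 and an admin login; adjust both for your environment):
# -k skips certificate verification for splunkd's self-signed cert
curl -k -u admin:changeme https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus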
For more readable, real-time output you can use a Python script written by one of our developers: http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/
Yep, with that log message, crcSalt = <SOURCE> is the way to go.
Indeed, this may be a good clue. The log says:
04-23-2011 15:09:24.453 +1000 ERROR TailingProcessor - File will not be read, seekptr checksum did not match (file=/Volumes/A/b/c/dir/20110422.txt). Last time we saw this initcrc, filename was different. You may wish to use a CRC salt on this source. Consult the documentation or file a support case online at http://www.splunk.com/page/submit_issue for more info.
Each day's file differs greatly, but I'll read up on this, thanks!