Tailing processor : Y U ignore some files?

howyagoin
Contributor

I've got a local directory configured in my inputs.conf like so:

[monitor:///Volumes/A/b/c/dir]
disabled = false
followTail = 0
host = fubar

This seemed to be working fine, but for No Reason I Can See, new files added to "dir" stopped being indexed. The inputs manager says there should be 23 files in the directory (there are actually 22). Each file is named after the date it was created, such as 20110410.txt.

However, nothing new shows up for the last few days in the Splunk index, and no searches yield results for the data that is clearly in files which should be monitored in that directory.

What's the best/right way to fix this? I found other posts here with similar issues where people suggested using | delete and re-adding specific files as a one-off, but I'm not sure that's a long-term fix. I've also tried deleting and re-adding the data source, to no avail.

When I do a CLI check for the source:


PROPERTIES OF /Volumes/A/b/c/dir/20110422.txt
    Attr:ANNOTATE_PUNCT True
    Attr:BREAK_ONLY_BEFORE  
    Attr:BREAK_ONLY_BEFORE_DATE True
    Attr:CHARSET    UTF-8
    Attr:DATETIME_CONFIG    /etc/datetime.xml
    Attr:HEADER_MODE    
    Attr:LEARN_SOURCETYPE   true
    Attr:LINE_BREAKER_LOOKBEHIND    100
    Attr:MAX_DAYS_AGO   2000
    Attr:MAX_DAYS_HENCE 2
    Attr:MAX_DIFF_SECS_AGO  3600
    Attr:MAX_DIFF_SECS_HENCE    604800
    Attr:MAX_EVENTS 256
    Attr:MAX_TIMESTAMP_LOOKAHEAD    128
    Attr:MUST_BREAK_AFTER   
    Attr:MUST_NOT_BREAK_AFTER   
    Attr:MUST_NOT_BREAK_BEFORE  
    Attr:PREFIX_SOURCETYPE  True
    Attr:SEGMENTATION   indexing
    Attr:SEGMENTATION-all   full
    Attr:SEGMENTATION-inner inner
    Attr:SEGMENTATION-outer outer
    Attr:SEGMENTATION-raw   none
    Attr:SEGMENTATION-standard  standard
    Attr:SHOULD_LINEMERGE   False
    Attr:TRANSFORMS 
    Attr:TRUNCATE   10000
    Attr:is_valid   True
    Attr:maxDist    9999
    Attr:sourcetype 20110422-too_small

As far as I can tell, that looks right.

Thanks!!

1 Solution

mslvrstn
Communicator

First off, take a look in splunkd.log and see if it tells you if/why it is ignoring your files.
My guess is that each day's files begin and end substantially the same.
Splunk uses a CRC to try to avoid re-indexing files when they roll.
Unfortunately, similar, non-rolling files can fool it into thinking it's seen the file before and ignore it.

If you are creating new files each day with a new name, you could add:

crcSalt = <SOURCE>

to your inputs.conf stanza.

See How Splunk recognizes log rotation for more info on this topic.
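
To make the collision concrete, here is a minimal Python sketch. It is not Splunk's actual implementation: the helper `init_crc`, the use of `zlib.crc32`, and the 256-byte window are all assumptions chosen for illustration. It shows two files that start identically producing the same checksum, and how mixing the source path into the checksum (the idea behind `crcSalt = <SOURCE>`) tells them apart.

```python
import zlib

HEAD_BYTES = 256  # assumed window size, for illustration only


def init_crc(content: bytes, salt: str = "") -> int:
    """CRC32 over an optional salt plus the first HEAD_BYTES of content."""
    return zlib.crc32(salt.encode("utf-8") + content[:HEAD_BYTES])


# Two made-up daily files that share a long identical header but differ afterwards.
header = b"# daily export, format v1\n" + b"-" * 300 + b"\n"
day1 = header + b"2011-04-21 event A\n"
day2 = header + b"2011-04-22 event B\n"

# Unsalted: the checksums collide, so the second file looks already seen.
unsalted_collide = init_crc(day1) == init_crc(day2)

# Salted with the full path: each file gets a distinct checksum.
salted_collide = (
    init_crc(day1, "/Volumes/A/b/c/dir/20110421.txt")
    == init_crc(day2, "/Volumes/A/b/c/dir/20110422.txt")
)

print(unsalted_collide, salted_collide)  # prints: True False
```

The trade-off is that a salted checksum depends on the file name, so a file that is renamed would look new and be indexed again. That is why this setting suits files that keep their name for life, like the dated files here.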

hexx
Splunk Employee

To find out exactly why a given file or directory was ignored by the tailing processor, I recommend looking at the file input status endpoint of splunkd, which you can usually find at https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus. Note that you need to check this on the instance doing the file monitoring, whether that is the indexer or a forwarder.

For better, real-time readability you can use a Python script written by one of our developers: http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/
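
If you would rather query the endpoint yourself, a rough Python sketch follows. It is not the script linked above. It assumes the management port accepts HTTP basic authentication and uses a self-signed certificate; the function names and the credentials are placeholders made up for this example.

```python
import base64
import ssl
import urllib.request

# Endpoint path quoted in the post above; host and port are taken from its URL.
STATUS_PATH = "/services/admin/inputstatus/TailingProcessor:FileStatus"


def build_status_request(user, password, host="localhost", port=8089):
    """Build (but do not send) an authenticated GET for the file status endpoint."""
    request = urllib.request.Request(f"https://{host}:{port}{STATUS_PATH}")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_status(request):
    """Send the request and return the raw response body as text."""
    # Assumption: splunkd presents a self-signed certificate, so verification
    # is disabled here. Do not do this against a host you do not control.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


req = build_status_request("admin", "changeme")  # placeholder credentials
print(req.full_url)
```

Calling `fetch_status(req)` on the monitoring instance should return the status listing as text, which you can then search for the file name you care about.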

mslvrstn
Communicator

Yep, with that log message, crcSalt = <SOURCE> should get you going.

howyagoin
Contributor

Indeed, this may be a good clue. The log says:

04-23-2011 15:09:24.453 +1000 ERROR TailingProcessor - File will not be read, seekptr checksum did not match (file=/Volumes/A/b/c/dir/20110422.txt). Last time we saw this initcrc, filename was different. You may wish to use a CRC salt on this source. Consult the documentation or file a support case online at http://www.splunk.com/page/submit_issue for more info.

Each day's file differs greatly, but I'll read up on this, thanks!
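
For anyone wanting to list every file skipped this way, here is a small Python sketch that pulls the paths out of log lines shaped like the error quoted above. The function name `skipped_files` is made up for this example, the regular expression is matched only against that one message format, and the second sample line is invented to show a non-matching entry.

```python
import re

# Matches the TailingProcessor error quoted above and captures the file path.
SKIPPED = re.compile(
    r"ERROR TailingProcessor - File will not be read, "
    r"seekptr checksum did not match \(file=(?P<path>[^)]+)\)"
)


def skipped_files(lines):
    """Return the paths named in 'File will not be read' errors, in order."""
    found = []
    for line in lines:
        match = SKIPPED.search(line)
        if match:
            found.append(match.group("path"))
    return found


sample = [
    "04-23-2011 15:09:24.453 +1000 ERROR TailingProcessor - File will not be "
    "read, seekptr checksum did not match (file=/Volumes/A/b/c/dir/20110422.txt). "
    "Last time we saw this initcrc, filename was different.",
    "04-23-2011 15:09:25.000 +1000 INFO TailingProcessor - some other message",
]
print(skipped_files(sample))  # prints: ['/Volumes/A/b/c/dir/20110422.txt']
```

To run it against a real log, pass an open file handle for splunkd.log instead of `sample`.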
