Getting Data In

Why are files in a monitored directory being skipped?

demondo
Engager

Hi all,

I am using the directory monitoring feature to index files below a specific path. The stanza in inputs.conf looks like this:

[monitor://E:\Logs\UTC]
disabled = false
host_regex = \.?(?<host>[A-Za-z_]*)_[0-9]
sourcetype = tsv

Looking at the Splunk data, though, I occasionally see that files placed into that directory do not get indexed. I can index each of these files manually with the oneshot CLI command, but I would like to figure out why they were skipped in the first place. Has anyone seen this before?

Any assistance would be appreciated.

1 Solution

tom_frotscher
Builder

Hi,

this is a common question that comes up here often.

Most of the time it is caused by files with large, identical headers. Your input is TSV, so this may well apply to your problem.

Splunk uses a checksum to decide whether a file has already been indexed. To calculate it, only the first few characters of the file are read (how much is configurable) and the checksum is computed over them. If your files share a large header, the checksum can come out identical for several files, and all but the first are skipped.
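The collision mechanism can be sketched in a few lines of Python. This is only an illustration: `zlib.crc32` stands in for Splunk's internal checksum, and the 256-byte window mirrors the documented default for `initCrcLength`; the file contents are made up.

```python
import zlib

# Two hypothetical TSV exports that share the same long header block
# but differ in the data rows that follow.
header = "col_a\tcol_b\tcol_c\n" + ("# export metadata line\n" * 20)
file_1 = header + "1\tfoo\tbar\n"
file_2 = header + "2\tbaz\tqux\n"

CRC_LENGTH = 256  # bytes checksummed; illustrative stand-in for initCrcLength

crc_1 = zlib.crc32(file_1.encode()[:CRC_LENGTH])
crc_2 = zlib.crc32(file_2.encode()[:CRC_LENGTH])

# The shared header alone is longer than 256 bytes, so both checksums
# cover only identical header bytes and therefore collide: the second
# file looks "already indexed".
print(crc_1 == crc_2)  # True
```

Reading past the header (a larger window) or mixing extra data into the checksum (a salt) both break the collision, which is exactly what the two options below do.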

You have two options:

1) Increase the number of characters Splunk reads when calculating the checksum. In the inputs.conf documentation, search for "initCrcLength".

2) Add a salt to the checksummed data. In inputs.conf, this is the "crcSalt" setting.

A similar problem has been discussed on Splunk Answers; the details are in the inputs.conf docs under "initCrcLength".
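A rough sketch of what either option could look like in your stanza; the initCrcLength value here is illustrative, so check the inputs.conf spec for the allowed range before using it:

```
[monitor://E:\Logs\UTC]
disabled = false
host_regex = \.?(?<host>[A-Za-z_]*)_[0-9]
sourcetype = tsv
# Option 1: checksum more of the file so the CRC reads past the header
initCrcLength = 1024
# Option 2 (alternative): mix the file's full path into the checksum so
# files with identical beginnings are still treated as distinct
crcSalt = <SOURCE>
```

Note that changing either setting changes the checksums of files Splunk has already seen, so existing files may be indexed a second time.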

Greetings

Tom



demondo
Engager

Thanks Tom,

Your theory about the header is probably right. I ended up fixing the issue by setting

crcSalt =

in inputs.conf. It did result in my files being indexed twice after restarting the Splunk server. That was a bit of a pain to resolve, but once I did, the issue appears to be fixed. Thanks for the tip!

Best,
Rob Rolnick
