Getting Data In

Why are files in a monitored directory being skipped?

demondo
Engager

Hi all,

I am using the directory monitoring feature to index files below a specific path. The stanza in inputs.conf looks like this:

[monitor://E:\Logs\UTC]
disabled = false
host_regex = \.?(?<host>[A-Za-z_]*)_[0-9]
sourcetype = tsv

Looking at the Splunk data, though, I occasionally see that files placed into that directory do not get indexed. I can index each of these files manually with the oneshot CLI command, but I would like to figure out why they were skipped in the first place. Has anyone seen this before?

Any assistance would be appreciated.

1 Solution

tom_frotscher
Builder

Hi,

this is a common question that comes up here often.

Most of the time it is caused by files with large, identical headers. Your input is TSV, so this may well apply to your problem.

Splunk uses a checksum to decide whether a file has already been indexed. To calculate it, only the first few characters of the file are read (how much is configurable) and the checksum is computed over them. If your files share a large header, the checksum can come out identical for several files, and all but the first are skipped.
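The collision mechanism can be sketched in a few lines of Python. This is only an illustration: `zlib.crc32` stands in for Splunk's internal checksum, and the 256-byte window mirrors the documented default for `initCrcLength`; the file contents are made up.

```python
import zlib

# Two hypothetical TSV exports that share the same long header block
# but differ in the data rows that follow.
header = "col_a\tcol_b\tcol_c\n" + ("# export metadata line\n" * 20)
file_1 = header + "1\tfoo\tbar\n"
file_2 = header + "2\tbaz\tqux\n"

CRC_LENGTH = 256  # bytes checksummed; illustrative stand-in for initCrcLength

crc_1 = zlib.crc32(file_1.encode()[:CRC_LENGTH])
crc_2 = zlib.crc32(file_2.encode()[:CRC_LENGTH])

# The shared header alone is longer than 256 bytes, so both checksums
# cover only identical header bytes and therefore collide: the second
# file looks "already indexed".
print(crc_1 == crc_2)  # True
```

Reading past the header (a larger window) or mixing extra data into the checksum (a salt) both break the collision, which is exactly what the two options below do.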

You have two options:

1) Increase the number of characters Splunk reads when calculating the checksum. In the inputs.conf documentation, search for "initCrcLength".

2) Add a salt to the checksummed data. In inputs.conf, this is the "crcSalt" setting.

A similar problem has been discussed on Splunk Answers; the details are in the inputs.conf docs under "initCrcLength".
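A rough sketch of what either option could look like in your stanza; the initCrcLength value here is illustrative, so check the inputs.conf spec for the allowed range before using it:

```
[monitor://E:\Logs\UTC]
disabled = false
host_regex = \.?(?<host>[A-Za-z_]*)_[0-9]
sourcetype = tsv
# Option 1: checksum more of the file so the CRC reads past the header
initCrcLength = 1024
# Option 2 (alternative): mix the file's full path into the checksum so
# files with identical beginnings are still treated as distinct
crcSalt = <SOURCE>
```

Note that changing either setting changes the checksums of files Splunk has already seen, so existing files may be indexed a second time.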

Greetings

Tom



demondo
Engager

Thanks Tom,

Your theory about the header is probably right. I ended up fixing the issue by setting

crcSalt =

in inputs.conf. It did result in my files being indexed twice after restarting the Splunk server. That was a bit of a pain to resolve, but once I did, the issue appears to be fixed. Thanks for the tip!

Best,
Rob Rolnick
