Monitoring Splunk

At what interval are new files indexed when monitoring a folder?

kanishq
New Member

I have a folder with five files that I'm trying to monitor. We set the folder path, set the source type to Automatic, and saw that the data was indexed and ready for search. But while searching, we observed that only one file was indexed; the data from the other four files is not present. Can someone please tell us what we missed, so that the data from the other four files gets indexed too?


woodcock
Esteemed Legend

By default, Splunk uses the first 256 bytes of a file (the head) to determine whether it is a file it has seen before, and the last 256 bytes (the tail) to see whether it has changed since the last time it saw it. This information is stored in the fishbucket (where a fisherman puts the heads and tails he removes from the fish he catches). Your files likely have the same first 256 bytes, so you need to increase initCrcLength to at least 256 bytes more than the length of the constant part at the beginning of the files. If your files are identical, you can use crcSalt=<SOURCE> to include the full path of the file as a discriminating factor; with that setting, any time you rename a file (as long as the new name still matches your monitor stanza), it will be reindexed. By default this is NOT the case.
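To make the head-checksum idea concrete, here is a minimal Python sketch. It is a simplified model under stated assumptions, not Splunk's implementation: `zlib.crc32` stands in for Splunk's internal checksum, a Python set stands in for the fishbucket, and the file paths, contents, and function names are hypothetical. It shows why files that share their first 256 bytes look like one file, and how a larger initCrcLength or crcSalt=<SOURCE> tells them apart.

```python
import zlib

DEFAULT_INIT_CRC_LENGTH = 256  # documented default for initCrcLength


def head_crc(data: bytes, init_crc_length: int, salt: str = "") -> int:
    """Checksum of the file head, optionally salted (simplified model).

    Assumption: zlib.crc32 stands in for Splunk's internal checksum.
    """
    return zlib.crc32(data[:init_crc_length] + salt.encode("utf-8"))


def files_skipped_as_seen(files: dict,
                          init_crc_length: int = DEFAULT_INIT_CRC_LENGTH,
                          use_source_salt: bool = False) -> list:
    """Return the paths whose head checksum matches a file already recorded."""
    fishbucket = set()  # hypothetical stand-in for the fishbucket records
    skipped = []
    for path, data in files.items():
        salt = path if use_source_salt else ""  # mimics crcSalt = <SOURCE>
        crc = head_crc(data, init_crc_length, salt)
        if crc in fishbucket:
            skipped.append(path)  # looks like a file that was already seen
        else:
            fishbucket.add(crc)
    return skipped


# Hypothetical files: the same 300-byte header, then one differing record.
header = b"#" + b"x" * 298 + b"\n"
files = {
    "/data/app/file1.csv": header + b"1,alpha\n",
    "/data/app/file2.csv": header + b"2,alpha\n",
}

print(files_skipped_as_seen(files))                             # ['/data/app/file2.csv']
print(files_skipped_as_seen(files, init_crc_length=300 + 256))  # []
print(files_skipped_as_seen(files, use_source_salt=True))       # []
```

With the default 256 bytes, only the shared header is checksummed, so the second file is treated as already seen. Reading 300 + 256 bytes reaches the part where the files differ, and salting with the path separates them even when the heads are identical.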

See here:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf

In particular:

crcSalt = <string>
* Use this setting to force the input to consume files that have matching CRCs
  (cyclic redundancy checks).
    * By default, the input only performs CRC checks against the first 256
      bytes of a file. This behavior prevents the input from indexing the same
      file twice, even though you might have renamed it, as with rolling log
      files, for example. Because the CRC is based on only the first
      few lines of the file, it is possible for legitimately different files
      to have matching CRCs, particularly if they have identical headers.
* If set, <string> is added to the CRC.
* If set to the literal string "<SOURCE>" (including the angle brackets), the
  full directory path to the source file is added to the CRC. This ensures
  that each file being monitored has a unique CRC. When crcSalt is invoked,
  it is usually set to <SOURCE>.
* Be cautious about using this setting with rolling log files; it could lead
  to the log file being re-indexed after it has rolled.
* In many situations, initCrcLength can be used to achieve the same goals.
* Default: empty string.

initCrcLength = <integer>
* How much of a file, in bytes, that the input reads before trying to
  identify whether it is a file that has already been seen. You might want to
  adjust this if you have many files with common headers (comment headers,
  long CSV headers, etc) and recurring filenames.
* Cannot be less than 256 or more than 1048576.
* CAUTION: Improper use of this setting will cause data to be re-indexed. You
  might want to consult with Splunk Support before adjusting this value - the
  default is fine for most installations.
* Default: 256 (bytes).
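Following the rule of thumb in the answer (length of the constant header plus 256 bytes, within the documented range of 256 to 1048576), one rough way to pick a value is to measure how many leading bytes your files share. This Python sketch is a hypothetical helper, not a Splunk tool; the function names are placeholders.

```python
import os

MIN_INIT_CRC_LENGTH = 256        # lower bound from inputs.conf.spec
MAX_INIT_CRC_LENGTH = 1_048_576  # upper bound from inputs.conf.spec


def common_prefix_length(blobs: list) -> int:
    """Number of leading bytes that every blob has in common."""
    if not blobs:
        return 0
    shortest = min(len(b) for b in blobs)
    for i in range(shortest):
        if any(b[i] != blobs[0][i] for b in blobs[1:]):
            return i
    return shortest


def suggest_init_crc_length(blobs: list) -> int:
    """Shared-header length plus 256 bytes, clamped to the documented range."""
    wanted = common_prefix_length(blobs) + 256
    return max(MIN_INIT_CRC_LENGTH, min(wanted, MAX_INIT_CRC_LENGTH))


def read_heads(folder: str, limit: int = MAX_INIT_CRC_LENGTH) -> list:
    """Read at most `limit` bytes from the start of each regular file in `folder`."""
    heads = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path, "rb") as fh:
                heads.append(fh.read(limit))
    return heads
```

Usage would look like `suggest_init_crc_length(read_heads("/path/to/monitored/folder"))`, where the path is a placeholder for the folder your monitor stanza watches; it needs at least two files to be meaningful. If the shared prefix covers an entire file (the files are identical), raising initCrcLength will not help, and crcSalt=<SOURCE> is the setting to consider. Keep the CAUTION above in mind before changing either value.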

jluo_splunk
Splunk Employee

Could you share what the file paths look like, along with your inputs.conf stanza for this directory?
