
duplicate log entries with crcSalt= and symbolic links


Hi. We are seeing duplicate logfile entries in our Search results with certain logfiles. It is happening in a directory that is pointed to with a symbolic link (i.e., "/hosting/logs/myapp/" is a symbolic link for "/hosting/logs/app123/").

Our monitor stanza looks like this...

[monitor:///hosting/logs]
crcSalt = SOURCE
blacklist = blah blah blah
whitelist = blah blah blah
recursive = true

NOTE: In the actual inputs.conf, there are angle brackets surrounding SOURCE, but adding them in this text input box seems to invoke some markdown directive, so I removed them.

For certain logfiles, we see the same logfile entry twice... One entry for /hosting/logs/app123/abc.log, and one for /hosting/logs/myapp/abc.log

My questions are...

Is the crcSalt statement forcing /hosting/logs/app123/abc.log to look like a different file than /hosting/logs/myapp/abc.log, causing it to be indexed twice?

If so, what is/are suggested workaround(s)?

And finally... it doesn't happen for ALL logfiles in /hosting/logs/app123/, only some of them. Any ideas why this is?

Thx,

mfeeny1


Splunk Employee

If you use crcSalt = <SOURCE>, the source path used in the CRC calculation is the symbolic path, not the path the link resolves to.
So different symlinks to the same file, and the direct path to that file, are considered different files and are all indexed.
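
To make the effect concrete, here is a toy sketch in Python. It is not Splunk's actual implementation: zlib.crc32 is only a stand-in for whatever checksum Splunk uses, and the function name initial_crc and its parameters are made up for illustration. It just shows why mixing the source path into the checksum makes the same file content look like two different files.

```python
import zlib

def initial_crc(content: bytes, source_path: str, salt_with_source: bool,
                init_crc_length: int = 256) -> int:
    """Toy model (hypothetical, not Splunk's code): checksum the first
    init_crc_length bytes, optionally prefixed with the source path."""
    head = content[:init_crc_length]
    if salt_with_source:
        # Assumed behaviour: the path as seen by the monitor is mixed in.
        head = source_path.encode("utf-8") + head
    return zlib.crc32(head)

# Same bytes on disk, reached through two different paths.
content = b"2013-01-01 00:00:01 INFO app started\n" * 20
real_path = "/hosting/logs/app123/abc.log"
link_path = "/hosting/logs/myapp/abc.log"

same_without_salt = (initial_crc(content, real_path, False)
                     == initial_crc(content, link_path, False))
same_with_salt = (initial_crc(content, real_path, True)
                  == initial_crc(content, link_path, True))
print(same_without_salt, same_with_salt)
```

Without the salt the two paths produce the same checksum, so the file would be recognised as already seen. With the salt the checksums differ, which matches the duplicate indexing described in the question.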

Splunk Employee

Since 5.0 you can also set the initCrcLength parameter (default is 256 bytes). See the inputs.conf documentation:
http://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf
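
As a rough illustration of what the length of the initial checksum changes, here is another toy sketch in Python. Again, zlib.crc32 and the helper name initial_crc are assumptions for illustration only, not Splunk's real algorithm. Two files that start with the same long header look identical when only the first 256 bytes are checked, and different once more bytes are included.

```python
import zlib

def initial_crc(content: bytes, init_crc_length: int = 256) -> int:
    """Toy model (hypothetical): checksum only the first init_crc_length bytes."""
    return zlib.crc32(content[:init_crc_length])

# Two made-up log files sharing a 300-byte header, then differing.
header = b"#" * 300 + b"\n"
file_a = header + b"event from app A\n"
file_b = header + b"event from app B\n"

collide_at_256 = initial_crc(file_a, 256) == initial_crc(file_b, 256)
collide_at_1024 = initial_crc(file_a, 1024) == initial_crc(file_b, 1024)
print(collide_at_256, collide_at_1024)
```

In this sketch the files collide at 256 bytes because the checked region is all header, and stop colliding at 1024 bytes because the differing lines are now included.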
