duplicate log entries with crcSalt= and symbolic links

Hi. We are seeing duplicate logfile entries in our Search results with certain logfiles. It is happening in a directory that is pointed to with a symbolic link (i.e., "/hosting/logs/myapp/" is a symbolic link for "/hosting/logs/app123/").

Our monitor stanza looks like this...

[monitor:///hosting/logs]
crcSalt = SOURCE
blacklist = blah blah blah
whitelist = blah blah blah
recursive = true

NOTE: In the actual inputs.conf, there are angle brackets surrounding SOURCE, but adding them in this text input box seems to invoke some markdown directive, so I removed them.

For certain logfiles, we see the same logfile entry twice... One entry for /hosting/logs/app123/abc.log, and one for /hosting/logs/myapp/abc.log

My questions are...

Is the crcSalt statement forcing /hosting/logs/app123/abc.log to look like a different file than /hosting/logs/myapp/abc.log, causing it to be indexed twice?

If so, what is/are suggested workaround(s)?

And finally... it doesn't happen for ALL logfiles in /hosting/logs/app123/, only some of them. Any ideas why this is?

Thx,

mfeeny1

Splunk Employee

If you use crcSalt = <SOURCE>, the source path that goes into the CRC calculation is the path as it is monitored, so a symbolic-link path counts as its own source.
Different symlinks to the same file, and the direct path to that file, are therefore treated as different files and are all indexed.
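
To illustrate the effect, here is a minimal Python sketch. It is not Splunk's actual implementation: the function name salted_crc, the use of zlib.crc32, and the way the path is mixed into the checksum are all assumptions made for illustration. It only shows why identical content reached through two different paths stops looking like the same file once the path is part of the checksum.

```python
import zlib
from typing import Optional

def salted_crc(head: bytes, source_path: Optional[str] = None) -> int:
    """Toy checksum (hypothetical): CRC32 of a file's leading bytes,
    optionally prefixed with the source path as a salt."""
    data = head if source_path is None else source_path.encode() + head
    return zlib.crc32(data)

# The same leading bytes, as read through the real path and through the symlink.
head = b"2013-01-01 00:00:00 INFO app started\n"
real_path = "/hosting/logs/app123/abc.log"
link_path = "/hosting/logs/myapp/abc.log"

# Without a salt, both reads produce the same checksum: one file.
unsalted_same = salted_crc(head) == salted_crc(head)

# With the path as salt, the checksums differ: two "different" files.
salted_same = salted_crc(head, real_path) == salted_crc(head, link_path)

print(unsalted_same, salted_same)
```

In this sketch the unsalted comparison matches and the salted one does not, which is the same situation as abc.log being indexed once per path.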

Splunk Employee

Since 5.0 you can also set the initCrcLength parameter in inputs.conf (the default is 256):
http://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf
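
As a rough illustration of what a longer initial CRC length changes, here is a small Python sketch. Again, this is an assumption-laden toy rather than Splunk's code: initial_crc and the use of zlib.crc32 are hypothetical stand-ins. It shows two files that share a header longer than 256 bytes being indistinguishable at the default length and distinguishable at a larger one, without salting the checksum with the path.

```python
import zlib

def initial_crc(content: bytes, init_crc_length: int = 256) -> int:
    """Toy checksum (hypothetical): CRC32 over the first init_crc_length bytes."""
    return zlib.crc32(content[:init_crc_length])

# Two different files that begin with the same 301-byte header.
header = b"#" * 300 + b"\n"
file_a = header + b"event from app A\n"
file_b = header + b"event from app B\n"

# At the default length only the shared header is checksummed.
same_at_256 = initial_crc(file_a) == initial_crc(file_b)

# A longer length reaches past the header into the differing content.
same_at_1024 = initial_crc(file_a, 1024) == initial_crc(file_b, 1024)

print(same_at_256, same_at_1024)
```

Because the path is not part of this checksum, the same file reached through a symlink and through its real path would still produce the same value.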
