Hi. We are seeing duplicate logfile entries in our Search results for certain logfiles. It happens in a directory that is reached through a symbolic link (i.e., "/hosting/logs/myapp/" is a symbolic link to "/hosting/logs/app123/").
Our monitor stanza looks like this...
[monitor:///hosting/logs]
crcSalt = SOURCE
blacklist = blah blah blah
whitelist = blah blah blah
recursive = true
NOTE: In the actual inputs.conf, there are angle brackets surrounding SOURCE, but adding them in this text input box seems to invoke some markdown directive, so I removed them.
For certain logfiles, we see the same logfile entry twice... One entry for /hosting/logs/app123/abc.log, and one for /hosting/logs/myapp/abc.log
My questions are...
Is the crcSalt statement forcing /hosting/logs/app123/abc.log to look like a different file than /hosting/logs/myapp/abc.log, causing it to be indexed twice?
If so, what is/are suggested workaround(s)?
And finally... it doesn't happen for ALL logfiles in /hosting/logs/app123/, only some of them. Any ideas why this is?
Thx,
mfeeny1
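If you use crcSalt with SOURCE, the source path that goes into the CRC calculation is the path the file was reached through, so for a symlinked directory it is the symlink path.
So different symlinks to the same file, and the direct path to that file, are treated as different files and are all indexed.
Since 5.0 you can also set the initCrcLength parameter (default is 256), which controls how many bytes from the start of a file are used for the initial CRC.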
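To illustrate why the salted CRC differs for the two paths, here is a minimal sketch. It is not Splunk's actual algorithm: the function name initial_crc, the use of zlib.crc32, and the sample file contents are assumptions made up for this example. It only shows the general idea that mixing the source path into the checksum makes the same bytes look like two files, and that reading more initial bytes can separate files that share a long identical header.

```python
import zlib


def initial_crc(data: bytes, source_path: str = "", init_crc_length: int = 256) -> int:
    """Hypothetical stand-in for the initial CRC check.

    Hashes the first init_crc_length bytes of the file, optionally
    prefixed with the source path (roughly what salting by source does).
    """
    head = data[:init_crc_length]
    return zlib.crc32(source_path.encode("utf-8") + head)


# Same file contents, reached through the real path and through the symlink.
content = b"2013-01-01 00:00:00 INFO application started\n" * 20

salted_real = initial_crc(content, "/hosting/logs/app123/abc.log")
salted_link = initial_crc(content, "/hosting/logs/myapp/abc.log")

# Without a salt, the path plays no part, so both paths give the same CRC.
unsalted_real = initial_crc(content)
unsalted_link = initial_crc(content)

# Two different files that share an identical 300-byte header.
header = b"#" * 300
file_a = header + b"entry from file a\n"
file_b = header + b"entry from file b\n"

# With 256 bytes only the shared header is hashed; 1024 bytes covers the bodies.
same_at_256 = initial_crc(file_a) == initial_crc(file_b)
same_at_1024 = (
    initial_crc(file_a, init_crc_length=1024)
    == initial_crc(file_b, init_crc_length=1024)
)

print("salted CRCs differ:", salted_real != salted_link)
print("unsalted CRCs match:", unsalted_real == unsalted_link)
print("header-only CRCs match at 256 bytes:", same_at_256)
print("CRCs match at 1024 bytes:", same_at_1024)
```

If the salt was only added to tell apart files with identical headers, a larger initCrcLength may be an alternative to try, since it does not tie the checksum to the path. Treat that as something to test on your own data rather than a guaranteed fix.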
http://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf