Getting Data In

Log duplicates single event in log file hundreds of times

koyachi
Explorer

Hi Folks,

We have a complaint from stakeholders that they are seeing duplicate events in Splunk. They shared a few examples where the same event was indexed hundreds of times, sometimes thousands. I can confirm that there are no duplicate monitor stanzas for these log files in inputs.conf.

I checked the actual log files and could see that some events were duplicated in the source file itself. One such event appeared 12 times in the source file, but the same event was indexed 524 times in Splunk. There are also a lot of inconsistencies in how the logs are written at the source: timestamps are either missing or partial. Could that be the reason Splunk ends up in a bad state and re-reads the log files?
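To put a number on the source-side duplication before comparing it with Splunk, something like the following could be run against a copy of a log file. This is only a minimal sketch: the function name `duplicate_line_counts` is made up for illustration, and it assumes one event per line, which may not hold if the sourcetype merges lines into multi-line events.

```python
from collections import Counter


def duplicate_line_counts(path, min_count=2):
    """Return {line: count} for non-empty lines seen at least min_count times."""
    counts = Counter()
    # errors="replace" keeps the scan going if the file has odd bytes in it.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            stripped = line.rstrip("\r\n")
            if stripped:
                counts[stripped] += 1
    return {line: n for line, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python count_dupes.py <copy-of-log-file> ...
    for log_path in sys.argv[1:]:
        ranked = sorted(duplicate_line_counts(log_path).items(), key=lambda kv: -kv[1])
        for line, n in ranked:
            print(f"{n:6d}  {log_path}  {line[:120]}")
```

If the per-file counts here are far below what Splunk reports for the same raw event, the extra copies are being introduced at ingestion rather than by the application.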

This is how my inputs.conf is configured. 

[monitor://\\hostname\log]
recursive = false
disabled = false
followTail = 0
host =
host_regex = ^.*\d-(.+?)-\d+-\d{8}_?\d*\.
ignoreOlderThan = 8d
index = index
sourcetype = sourcetype
whitelist = \.log|\.out
time_before_close = 1
initCrcLength = 4096
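
One thing that can be checked offline is which file names the whitelist above actually lets through. As far as I know, Splunk treats the whitelist as an unanchored regular expression matched against the file path, so `\.log|\.out` matches anywhere in the name. The sketch below uses Python's `re` as a stand-in for Splunk's regex engine; the sample file names and the anchored variant are assumptions for illustration, not taken from the real environment.

```python
import re

# Pattern exactly as posted in the stanza.
POSTED = re.compile(r"\.log|\.out")
# Hypothetical tighter variant (an assumption, not from the thread):
# only paths that END in .log or .out.
ANCHORED = re.compile(r"\.(log|out)$")


def is_monitored(pattern, path):
    """True if the pattern matches anywhere in the path (unanchored search)."""
    return pattern.search(path) is not None


# Made-up file names for illustration only.
SAMPLE_PATHS = [
    r"\\hostname\log\abc.0.log",
    r"\\hostname\log\abc.0.log.bak",
    r"\\hostname\log\abc.output.txt",
    r"\\hostname\log\abc.out",
]


def report(paths=SAMPLE_PATHS):
    """Return (path, matches_posted, matches_anchored) for each path."""
    return [(p, is_monitored(POSTED, p), is_monitored(ANCHORED, p)) for p in paths]


if __name__ == "__main__":
    for path, posted, anchored in report():
        print(f"{path:40s} posted={posted!s:5s} anchored={anchored}")
```

If backup or temporary copies of the logs ever land in the same folder with names that still contain `.log` or `.out`, the posted pattern would pick them up as separate sources.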

PickleRick
SplunkTrust
followTail = 0

Why this setting? And are you aware of what it does?

followTail = <boolean>
* Whether or not the input should skip past current data in a monitored file
  for a given input stanza.
* This setting lets you skip over data in files, and immediately begin indexing
  current data.
* If you set to "1", monitoring starts at the end of the file (like
  *nix 'tail -f'). The input does not read any data that exists in
  the file when it is first encountered. The input only reads data that
  arrives after the first encounter time.
* If you set to "0", monitoring starts at the beginning of the file.
* This is an advanced setting. Contact Splunk Support before using it.
* Best practice for using this setting:
  * Enable this setting and start the Splunk instance.
  * Wait enough time for the input to identify the related files.
  * Disable the setting and restart the instance.
* Do not leave 'followTail' enabled in an ongoing fashion.
* Do not use 'followTail' for rolling log files (log files that get renamed as
  they age) or files whose names or paths vary.
* Default: 0

koyachi
Explorer

Well, this config has been in place for the past few years and I was not part of the team then. These logs are critical for the application team, so we generally avoid making any config changes, as that could make things worse.

There are several issues within the source log files that we could find.

1) Sometimes the log lines are duplicated in the source file itself (if a line is duplicated 10 times in the source file, Splunk should show it 10 times; however, Splunk shows it thousands of times).

2) There are a lot of inconsistencies in the log timestamps in the source file itself. Sometimes the date is partial or missing.

Can these inconsistencies cause Splunk to go into a bad state?

Do you think removing the followTail setting will actually fix the duplication issue?


isoutamo
SplunkTrust
How are those log files rotated?
- renamed + a new file created (creates a new inode)
- copied to a new file + original truncated by copying from /dev/null (keeps the same inode)
- closed + a new file created with some prefix (creates a new inode)

koyachi
Explorer

@isoutamo Well, all I know is that the application streams logs to a Windows-based filer and we use a heavy forwarder to monitor those files.

A file rotates when it reaches 10 MB and a new file is created.

Here is an example of how the files are named:

abc.0.log - once it reaches 10 MB a new file is created

abc.1.log … abc.2.log and so on.


isoutamo
SplunkTrust

Normally that naming scheme is used when logs are rotated and the old ones are "renamed" like

abc.10.log deleted

abc.9.log -> abc.10.log

abc.8.log -> abc.9.log

....

abc.0.log -> abc.1.log

abc.0.log created from scratch.

The essential question is: are those logs renamed, or copied and truncated? In the first case they are probably handled as old files; in the second case as new ones, which could mean that they are re-read. That is the general picture. I'm not sure how this works with Splunk on a Windows network share, especially with that tailreader=1 parameter.
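
To illustrate the general idea, here is a rough model of how a monitor that identifies files by a checksum of their first bytes (which is what initCrcLength controls) would see a rename versus a file whose beginning changes. This is a simplified sketch under assumptions: `zlib.crc32` stands in for whatever checksum Splunk really uses, the `seen` set stands in for its internal bookkeeping, and the real tailing logic tracks more than this (for example the read position). The function names and file contents are made up.

```python
import os
import tempfile
import zlib

INIT_CRC_LENGTH = 4096  # mirrors initCrcLength in the posted stanza


def head_checksum(path, length=INIT_CRC_LENGTH):
    """Checksum of the first `length` bytes (stand-in for the real CRC)."""
    with open(path, "rb") as handle:
        return zlib.crc32(handle.read(length))


def classify(path, seen):
    """Return 'known' if this head checksum was seen before, else 'new'."""
    crc = head_checksum(path)
    status = "known" if crc in seen else "new"
    seen.add(crc)
    return status


def demo():
    """Simulate a rename rotation followed by a rewrite of a rotated file."""
    results = {}
    seen = set()
    with tempfile.TemporaryDirectory() as folder:
        current = os.path.join(folder, "abc.0.log")
        rotated = os.path.join(folder, "abc.1.log")

        with open(current, "w") as f:
            f.write("generation one line\n" * 10)
        results["first sight"] = classify(current, seen)

        # Rename rotation: same bytes under a new name.
        os.rename(current, rotated)
        results["after rename"] = classify(rotated, seen)

        # The application starts a fresh abc.0.log with different content.
        with open(current, "w") as f:
            f.write("generation two line\n" * 10)
        results["fresh abc.0.log"] = classify(current, seen)

        # Something rewrites the rotated file so its first bytes differ.
        with open(rotated, "w") as f:
            f.write("changed header\n" + "generation one line\n" * 10)
        results["head rewritten"] = classify(rotated, seen)
    return results


if __name__ == "__main__":
    for step, status in demo().items():
        print(f"{step:16s} -> {status}")
```

In this model a plain rename is recognized as already seen, while any file whose first bytes change looks brand new and would be read again from the start, including lines that were already indexed. Whether that is what happens on the filer here is exactly what needs to be confirmed.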

Unfortunately I don't have any environment where I could test this to check how it really works.

I suggest you create a support case with Splunk to get a real answer, or try asking for help on the Splunk UG Slack.
