Getting Data In

Why am I seeing duplicate events in Splunk with my current configuration?

bharathkumarnec
Contributor

Hi All,

I am seeing duplicate events in Splunk. My configuration is below:

props.conf:

[source::.../abc.out]
TRANSFORMS-sourcetype1 = force_abc_sourcetype

transforms.conf:

[force_abc_sourcetype]
DEST_KEY = MetaData:Sourcetype
REGEX = .*
FORMAT = sourcetype::abc

inputs.conf:

# *.out includes abc.out
[monitor:///123/*.out]
index = abc
disabled = 0

Do I need to change the monitored path from *.out to abc.out to fix this, or is there anything else I should look at?

Kindly provide your inputs...

Regards,

fdi01
Motivator

Try setting CHECK_METHOD = endpoint_md5 in your props.conf stanza.

bharathkumarnec
Contributor

Hi fdi01,

Thanks for your reply.

It looks like the default is already endpoint_md5. Do we still need to set it explicitly in props.conf?

Also, my abc.out file is rotated based on size: once it reaches a certain size it is rotated and a new file is created.

Here is my error message:

WARN TcpOutputProc - Possible duplication of events with channel=source::/123/.out|host::xyz|out-2|571, streamId=0, offset=0 on host=indexer:9997

What is out-2 in the above warning? If it is the sourcetype, the configuration above sets the sourcetype to abc, so why does it show out-2?

I am using indexer acknowledgement (useACK = true).
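
If out-2 is the sourcetype field of the channel, as the poster suspects, one possible explanation is that the forwarder assigns a sourcetype at input time, before the props.conf/transforms.conf override is applied at parse time. The sketch below shows what the narrowed monitor stanza from the original question could look like with the sourcetype set directly at the input. It is an assumption for illustration only: the sourcetype line is not in the poster's configuration, and nothing in this thread confirms that it removes the duplicates.

```ini
# inputs.conf on the forwarder (illustrative sketch, not the poster's actual file)
# Monitors only abc.out instead of every *.out file under /123
[monitor:///123/abc.out]
index = abc
# Assumption: assign the sourcetype at input time instead of via a transform
sourcetype = abc
disabled = 0
```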

fdi01
Motivator

By default, CHECK_METHOD = modtime.
The workaround is to use a [source::...] stanza in props.conf with a priority higher than the default stanza (a host or sourcetype stanza will not work) and to set CHECK_METHOD explicitly so that modtime is not used.

For example:
[source::.../abc.out]
sourcetype=log_in_abc.out
CHECK_METHOD = endpoint_md5
priority = 10

bharathkumarnec
Contributor

This is what I could find in the documentation:

CHECK_METHOD = [endpoint_md5|entire_md5|modtime]
* Set CHECK_METHOD = endpoint_md5 to have Splunk checksum of the first and last 256 bytes of a file. When it finds matches, Splunk lists the file as already indexed and indexes only new data, or ignores it if there is no new data.
* Set CHECK_METHOD = entire_md5 to use the checksum of the entire file.
* Set CHECK_METHOD = modtime to check only the modification time of the file.
* Settings other than endpoint_md5 cause Splunk to index the entire file for each detected change.
* Defaults to endpoint_md5.
* Important: this option is only valid for [source::<source>] stanzas.

I also see the following warning message on my forwarders:

WARN TcpOutputProc - Read operation timed out expecting ACK from indexer in 300 seconds.
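
The 300 seconds in this warning matches what I understand to be the default of the readTimeout setting in the forwarder's outputs.conf. With useACK enabled, a forwarder that does not receive an acknowledgement in time resends the data, which is a documented source of possible duplicates. A minimal sketch of the relevant stanza follows. The output group name my_indexers is hypothetical, the server value is taken from the host= field of the earlier warning, and this thread does not confirm that changing any of these values resolves the problem.

```ini
# outputs.conf on the forwarder (illustrative sketch, not the poster's actual file)
# "my_indexers" is a hypothetical output group name
[tcpout:my_indexers]
server = indexer:9997
# Indexer acknowledgement, as described by the poster
useACK = true
# Seconds to wait on a read from the indexer; 300 is assumed to be the default
readTimeout = 300
```

If acknowledgements regularly time out, the health and load of the indexer are probably worth checking before changing any timeout.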

fdi01
Motivator

In your /opt/splunk/etc/system/default/props.conf the default is CHECK_METHOD = modtime, but in the manual it is CHECK_METHOD = endpoint_md5.
Note: erase all the data already indexed, then re-index it into the same index or a new index with the new props.conf configuration.

ehudb
Contributor

I didn't understand which inputs you are getting duplicates from.

Do you get duplicate events only from this input, or do you have other input configurations that might also pick up this "abc.out" file?
