<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why is data getting duplicated? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262850#M50453</link>
    <description>&lt;P&gt;No, there are not 2 stanzas pointing to the same source.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Dec 2016 04:16:22 GMT</pubDate>
    <dc:creator>puneethgowda</dc:creator>
    <dc:date>2016-12-06T04:16:22Z</dc:date>
    <item>
      <title>Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262845#M50448</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;We have noticed an issue in our Splunk environment:&lt;/P&gt;

&lt;P&gt;Issue:&lt;/P&gt;

&lt;P&gt;Data is being duplicated on the indexers. When I run a search on the search head, the same events come back twice. This issue started today; there was no problem with the data before.&lt;/P&gt;

&lt;P&gt;My Investigations:&lt;/P&gt;

&lt;P&gt;1) Checked whether the same entries exist twice in the application logs. Answer: No&lt;BR /&gt;
2) Checked whether the issue is limited to one sourcetype or one index. Answer: No, it affects data on all indexers.&lt;/P&gt;

&lt;P&gt;My questions:&lt;/P&gt;

&lt;P&gt;What else could be causing this, and what steps are needed to prevent it?&lt;/P&gt;

&lt;P&gt;Thanks in advance.&lt;/P&gt;

&lt;P&gt;Regards,&lt;BR /&gt;
Puneeth&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2016 13:05:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262845#M50448</guid>
      <dc:creator>puneethgowda</dc:creator>
      <dc:date>2016-12-05T13:05:51Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262846#M50449</link>
      <description>&lt;P&gt;Did you check whether your inputs.conf has 2 stanzas pointing to the same source?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2016 13:29:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262846#M50449</guid>
      <dc:creator>PPape</dc:creator>
      <dc:date>2016-12-05T13:29:05Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262847#M50450</link>
      <description>&lt;P&gt;For your security, I removed your phone number from the question.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2016 13:39:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262847#M50450</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2016-12-05T13:39:49Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262848#M50451</link>
      <description>&lt;P&gt;I'm looking for a good best practices document about duplicate data... found this so far - &lt;A href="https://answers.splunk.com/answers/389806/what-are-best-practices-for-handling-data-in-a-spl.html"&gt;What are best practices for handling data in a Splunk staging environment that needs to go to production?&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2016 14:52:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262848#M50451</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2016-12-05T14:52:25Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262849#M50452</link>
      <description>&lt;P&gt;You mentioned that all of your data is getting duplicated, which sounds like a misconfigured outputs.conf.&lt;BR /&gt;
Can you confirm how your outputs.conf is configured?&lt;/P&gt;
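
&lt;P&gt;As a rough illustration of the kind of misconfiguration that could duplicate everything (a sketch only, not taken from your environment; the group names groupA and groupB are hypothetical): when defaultGroup lists more than one target group, the forwarder clones each event to every group, so two groups that point at the same indexers would index everything twice.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# Hypothetical outputs.conf; group names are assumptions for illustration
[tcpout]
# Listing two groups here clones every event to both groups
defaultGroup = groupA, groupB

[tcpout:groupA]
server = indexer1:9997,indexer2:9997

# Same indexers as groupA, so each event would arrive twice
[tcpout:groupB]
server = indexer1:9997,indexer2:9997&lt;/LI-CODE&gt;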

&lt;P&gt;Here's an example with 2 indexers (indexer1 and indexer2) in an indexer cluster. Indexer acknowledgement is turned on, and SSL is not in use:&lt;BR /&gt;
[tcpout]&lt;BR /&gt;
defaultGroup = allIndexers&lt;BR /&gt;
disabled = false&lt;/P&gt;

&lt;P&gt;[tcpout:allIndexers]&lt;BR /&gt;
server=indexer1:9997,indexer2:9997&lt;BR /&gt;
autoLB = true&lt;BR /&gt;
useACK = true&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2016 23:55:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262849#M50452</guid>
      <dc:creator>gjanders</dc:creator>
      <dc:date>2016-12-05T23:55:54Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262850#M50453</link>
      <description>&lt;P&gt;No, there are not 2 stanzas pointing to the same source.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2016 04:16:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262850#M50453</guid>
      <dc:creator>puneethgowda</dc:creator>
      <dc:date>2016-12-06T04:16:22Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262851#M50454</link>
      <description>&lt;P&gt;Thank you very much.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2016 04:16:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262851#M50454</guid>
      <dc:creator>puneethgowda</dc:creator>
      <dc:date>2016-12-06T04:16:53Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262852#M50455</link>
      <description>&lt;P&gt;#Version 6.5.1&lt;BR /&gt;#DO NOT EDIT THIS FILE!&lt;BR /&gt;#Changes to default files will be lost on update and are difficult to&lt;BR /&gt;#manage and support.&lt;BR /&gt;#Please make any changes to system defaults by overriding them in&lt;BR /&gt;#apps or $SPLUNK_HOME/etc/system/local&lt;BR /&gt;#(See "Configuration file precedence" in the web documentation).&lt;BR /&gt;#To override a specific setting, copy the name of the stanza and&lt;BR /&gt;#setting to the file where you wish to override it.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[tcpout]
maxQueueSize = auto
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_internal|_introspection|_telemetry)
forwardedindex.filter.disable = false
indexAndForward = false
autoLBFrequency = 30
blockOnCloning = true
compressed = false
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
heartbeatFrequency = 30
maxFailuresPerInterval = 2
secsInFailureInterval = 1
maxConnectionsPerIndexer = 2
forceTimebasedAutoLB = false
sendCookedData = true
connectionTimeout = 20
readTimeout = 300
writeTimeout = 300
tcpSendBufSz = 0
ackTimeoutOnShutdown = 30
useACK = false
blockWarnThreshold = 100
sslQuietShutdown = false

[syslog]
type = udp
priority = &amp;lt;13&amp;gt;
dropEventsOnQueueFull = -1
maxEventSize = 1024&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 10 Feb 2021 16:08:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262852#M50455</guid>
      <dc:creator>puneethgowda</dc:creator>
      <dc:date>2021-02-10T16:08:06Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262853#M50456</link>
      <description>&lt;P&gt;That is the outputs.conf from the default directory.&lt;BR /&gt;
Perhaps try:&lt;BR /&gt;
splunk btool outputs list --debug&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2016 04:22:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262853#M50456</guid>
      <dc:creator>gjanders</dc:creator>
      <dc:date>2016-12-06T04:22:48Z</dc:date>
    </item>
    <item>
      <title>Re: Why is data getting duplicated?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262854#M50457</link>
      <description>&lt;P&gt;&lt;STRONG&gt;When events are duplicated, check the following:&lt;/STRONG&gt;&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Whether the source file contains duplicate events&lt;/LI&gt;
&lt;LI&gt;Whether two inputs.conf stanzas, or two forwarders, have mistakenly been configured to read the same data&lt;/LI&gt;
&lt;LI&gt;Whether the original application intentionally sends the same data to two different channels (e.g., two files)&lt;/LI&gt;
&lt;LI&gt;Behavior that causes the forwarder to read a file more than once, such as an explicit fishbucket reset or incorrect use of crcSalt&lt;/LI&gt;
&lt;LI&gt;Monitoring a directory that contains symlink loops&lt;/LI&gt;
&lt;LI&gt;Use of the forwarding ACK system, where network failures are expected, by design, to result in small amounts of duplicated data&lt;/LI&gt;
&lt;LI&gt;Use of summary indexing to intentionally duplicate events in Splunk&lt;/LI&gt;
&lt;LI&gt;A bug in the original application that produces duplicate log entries&lt;/LI&gt;
&lt;/OL&gt;
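
&lt;P&gt;To illustrate item 4, here is a minimal sketch of an inputs.conf stanza where crcSalt can cause re-reading. The path is hypothetical. With crcSalt set to &amp;lt;SOURCE&amp;gt;, the file's full path is mixed into the CRC, so a rotated file that gets renamed (for example app.log to app.log.1) can look like a new file and be indexed again.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# Hypothetical monitor stanza; the path is an assumption for illustration
[monitor:///var/log/myapp/app.log*]
# Adds the source path to the CRC check, so renamed (rotated) files may be re-read
crcSalt = &amp;lt;SOURCE&amp;gt;&lt;/LI-CODE&gt;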

&lt;P&gt;&lt;STRONG&gt;The following endpoint lists all files known to the tailing processor along with their status (read, ignored, blacklisted, etc.):&lt;/STRONG&gt;&lt;BR /&gt;
Link: https://[splunkd_hostname]:[splunkd_port]/services/admin/inputstatus/TailingProcessor:FileStatus &lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;If you cannot resolve the issue with the checks above, you can enable DEBUG-level logging for the following components:&lt;/STRONG&gt; &lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;TailingProcessor&lt;/LI&gt;
&lt;LI&gt;BatchReader&lt;/LI&gt;
&lt;LI&gt;WatchedFile&lt;/LI&gt;
&lt;LI&gt;FileTracker&lt;/LI&gt;
&lt;/OL&gt;
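
&lt;P&gt;One way to do this, sketched under the assumption that you can edit files on the forwarder and restart it, is to set the categories in $SPLUNK_HOME/etc/log-local.cfg (create the file if it does not exist). The category names are the component names from the list above. Set them back to INFO afterwards, since DEBUG logging is verbose.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# $SPLUNK_HOME/etc/log-local.cfg (overrides log.cfg; takes effect after a restart)
[splunkd]
category.TailingProcessor=DEBUG
category.BatchReader=DEBUG
category.WatchedFile=DEBUG
category.FileTracker=DEBUG&lt;/LI-CODE&gt;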

&lt;P&gt;&lt;STRONG&gt;To check whether events are duplicated, you can use the following SPL:&lt;/STRONG&gt;&lt;BR /&gt;
 | eval md=md5(_raw) | stats count by md | where count &amp;gt; 1&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;For more information, see the community wiki page Troubleshooting Monitor Inputs:&lt;/STRONG&gt; &lt;BR /&gt;
Link: &lt;A href="https://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs" target="_blank"&gt;https://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 23:33:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-data-getting-duplicated/m-p/262854#M50457</guid>
      <dc:creator>dkolekar_splunk</dc:creator>
      <dc:date>2020-09-29T23:33:02Z</dc:date>
    </item>
  </channel>
</rss>

