<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How are identical files from multiple (clustered) systems handled? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396588#M70682</link>
    <description>&lt;P&gt;The tracking of duplicate input files is done by the individual forwarders.  Since each forwarder does not know what other forwarders have processed, you will get duplicates.&lt;/P&gt;</description>
    <pubDate>Mon, 15 Jul 2019 12:39:02 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2019-07-15T12:39:02Z</dc:date>
    <item>
      <title>How are identical files from multiple (clustered) systems handled?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396587#M70681</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;
I have an application that logs to a shared clustered file system.&lt;BR /&gt;
What happens when I install the forwarder (via deployment server, with an identical configuration) on each of the nodes to monitor the logs on this file system?&lt;BR /&gt;
Do I get duplicates from each of the hosts, or can Splunk identify that they are dupes even though they come from different hosts?&lt;BR /&gt;
Would crcSalt help here?&lt;BR /&gt;
thx&lt;BR /&gt;
afx&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jul 2019 09:49:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396587#M70681</guid>
      <dc:creator>afx</dc:creator>
      <dc:date>2019-07-15T09:49:47Z</dc:date>
    </item>
    <item>
      <title>Re: How are identical files from multiple (clustered) systems handled?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396588#M70682</link>
      <description>&lt;P&gt;The tracking of duplicate input files is done by the individual forwarders.  Since each forwarder does not know what other forwarders have processed, you will get duplicates.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jul 2019 12:39:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396588#M70682</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2019-07-15T12:39:02Z</dc:date>
    </item>
    <item>
      <title>Re: How are identical files from multiple (clustered) systems handled?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396589#M70683</link>
      <description>&lt;P&gt;Drat...&lt;BR /&gt;
Two ideas:&lt;BR /&gt;
1: Forcing an identical hostname: would that help the indexer identify incoming dupes?&lt;BR /&gt;
2: Using a heavy forwarder in between to filter out dupes.&lt;BR /&gt;
I really want to avoid #2; it would mean either adding extra load to an existing box or needing a new box.&lt;BR /&gt;
thx&lt;BR /&gt;
afx&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jul 2019 12:49:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396589#M70683</guid>
      <dc:creator>afx</dc:creator>
      <dc:date>2019-07-15T12:49:01Z</dc:date>
    </item>
    <item>
      <title>Re: How are identical files from multiple (clustered) systems handled?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396590#M70684</link>
      <description>&lt;OL&gt;
&lt;LI&gt;Indexers do not identify dupes, even when the host name is identical. You can remove them at search time, however.&lt;/LI&gt;
&lt;LI&gt;An intermediate HF could probably do the job, but it would be a bottleneck and would impair performance. Splunk advises against intermediate forwarders unless absolutely necessary.&lt;/LI&gt;
&lt;/OL&gt;
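&lt;P&gt;For point 1, a minimal search-time sketch, assuming the duplicated events are byte-identical and differ only in metadata such as host (the index and sourcetype names below are hypothetical placeholders):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;index=app_logs sourcetype=app_log
| dedup _raw&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;dedup keeps the first result it sees for each distinct _raw value (by default the most recent event, since searches return events newest first) and drops the other copies from the results. Every copy has still been indexed, so this does not reduce license usage or storage.&lt;/P&gt;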

&lt;P&gt;What you really should do is avoid having more than one forwarder read a given file.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jul 2019 15:24:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396590#M70684</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2019-07-15T15:24:07Z</dc:date>
    </item>
    <item>
      <title>Re: How are identical files from multiple (clustered) systems handled?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396591#M70685</link>
      <description>&lt;P&gt;Yup, avoiding that would be best. I am currently trying to figure out whether the forwarder can be started and stopped together with the application; there might be some minimal overlap, but overall only one of them would be active.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jul 2019 15:37:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-are-identical-files-from-multiple-clustered-systems-handled/m-p/396591#M70685</guid>
      <dc:creator>afx</dc:creator>
      <dc:date>2019-07-15T15:37:17Z</dc:date>
    </item>
  </channel>
</rss>

