<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Detection of duplicate files in batch mode in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623029#M107221</link>
    <description>&lt;P&gt;Hehe &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;Alright, thanks for your input!&lt;/P&gt;</description>
    <pubDate>Fri, 02 Dec 2022 13:40:32 GMT</pubDate>
    <dc:creator>zapping575</dc:creator>
    <dc:date>2022-12-02T13:40:32Z</dc:date>
    <item>
      <title>Detection of duplicate files in batch mode</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623013#M107216</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;My use case is that my Splunk universal forwarder does not continuously monitor my logs.&lt;/P&gt;&lt;P&gt;Because of this, I am using batch mode so that the files are deleted after ingestion.&lt;/P&gt;&lt;P&gt;Now, I occasionally receive log files which I have already received at an earlier point in time.&lt;/P&gt;&lt;P&gt;The problem is: features such as crcSalt, initCrcLength etc. are only available in monitor mode. This means that I am not able to benefit from Splunk's features to prevent duplicate ingestion of the same data.&lt;/P&gt;&lt;P&gt;Any help on a solution for this is greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 11:41:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623013#M107216</guid>
      <dc:creator>zapping575</dc:creator>
      <dc:date>2022-12-02T11:41:42Z</dc:date>
    </item>
    <item>
      <title>Re: Detection of duplicate files in batch mode</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623024#M107218</link>
      <description>&lt;P&gt;I'd try writing some external "helper" script that keeps track of files.&lt;/P&gt;&lt;P&gt;But the question is: why don't you use a monitor input? Unless you absolutely need the sinkholing functionality and can't get around it another way (e.g. with logrotate or similar).&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 12:55:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623024#M107218</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2022-12-02T12:55:04Z</dc:date>
    </item>
    <item>
      <title>Re: Detection of duplicate files in batch mode</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623025#M107219</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/231884"&gt;@PickleRick&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Thanks for your reply.&lt;/P&gt;&lt;P&gt;I also think that keeping track of files is something I will have to implement myself.&lt;/P&gt;&lt;P&gt;I was just hoping to be able to use what Splunk already provides.&lt;/P&gt;&lt;P&gt;On why I am not using monitor inputs:&lt;/P&gt;&lt;P&gt;I have an HTTP endpoint that receives log files from another system and extracts them to disk, where the forwarder then picks them up. I could use monitor mode, but because there is no log rotation or similar, it would ultimately fill up the disk.&lt;BR /&gt;What is appealing about batch mode is that it ensures the disk space is freed again after a new file has been completely ingested.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 13:16:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623025#M107219</guid>
      <dc:creator>zapping575</dc:creator>
      <dc:date>2022-12-02T13:16:03Z</dc:date>
    </item>
    <item>
      <title>Re: Detection of duplicate files in batch mode</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623028#M107220</link>
      <description>&lt;P&gt;Yeah, but you end up with duplicates &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;If you can ensure that there is a maximum possible period during which duplicates can appear, you could get away with a monitor input plus an external script that cleans the directory of files older than a given age. That could be an alternative approach (and probably easier to implement).&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 13:37:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623028#M107220</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2022-12-02T13:37:15Z</dc:date>
    </item>
    <item>
      <title>Re: Detection of duplicate files in batch mode</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623029#M107221</link>
      <description>&lt;P&gt;Hehe &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;Alright, thanks for your input!&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 13:40:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Detection-of-duplicate-files-in-batch-mode/m-p/623029#M107221</guid>
      <dc:creator>zapping575</dc:creator>
      <dc:date>2022-12-02T13:40:32Z</dc:date>
    </item>
  </channel>
</rss>