<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What causes these duplicate buckets (not marked duplicates)? in All Apps and Add-ons</title>
    <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244285#M73702</link>
    <description>&lt;P&gt;Unfortunately, it isn't clear what you are trying to do and how. Can you please elaborate? &lt;/P&gt;</description>
    <pubDate>Wed, 06 Jul 2016 00:42:36 GMT</pubDate>
    <dc:creator>ddrillic</dc:creator>
    <dc:date>2016-07-06T00:42:36Z</dc:date>
    <item>
      <title>What causes these duplicate buckets (not marked duplicates)?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244284#M73701</link>
      <description>&lt;P&gt;I'm trying to "freeze" these buckets, as described in &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Automatearchiving"&gt;Archive indexed data&lt;/A&gt;, and I'm seeing some confusing behavior from Splunk.&lt;/P&gt;

&lt;P&gt;By "freezing" buckets, I mean:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;the buckets are taken out of Splunk and are no longer indexed&lt;/LI&gt;
&lt;LI&gt;the bucket data is stored on a remote file system&lt;/LI&gt;
&lt;LI&gt;the events in the buckets are not lost!&lt;/LI&gt;
&lt;LI&gt;the events in the buckets are not stored more than once&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;I've set my frozen directory to &lt;CODE&gt;/misc/cloud2/splunk/freezer/accumulating/&lt;/CODE&gt;, and buckets are being moved there.  That is great!&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;However, several buckets with an identical timestamp range are being frozen there.  Why?&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;The problem is that the buckets shown below, all in a single index, represent the same time span.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;It seems very likely the data is being duplicated.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;But the newer buckets are always bigger.  &lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Questions&lt;/STRONG&gt;&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;How can the buckets be newer if they cover the same time span?&lt;/LI&gt;
&lt;LI&gt;Why is the same data being frozen more than once?&lt;/LI&gt;
&lt;LI&gt;Where is the new data coming from if this is the frozen directory?&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;&lt;STRONG&gt;More details&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;Note that increasing sequence numbers (the last field of each bucket directory name) are associated with increasing &lt;CODE&gt;journal.gz&lt;/CODE&gt; sizes.&lt;/P&gt;

&lt;P&gt;Here's a sample set of 3 buckets that share the same timestamp range:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;splunk@nursery-splunkindex-1001:~$ ls -d /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399*
/misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_1238
/misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_1982
/misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_803
&lt;/CODE&gt;&lt;/PRE&gt;
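
&lt;P&gt;For anyone who wants to check a whole frozen directory for this pattern, here is a minimal Python sketch. It assumes the directory names follow the db_NEWEST_OLDEST_ID pattern seen in the listing above, with both times in epoch seconds. The helper names &lt;CODE&gt;group_by_time_range&lt;/CODE&gt; and &lt;CODE&gt;shared_ranges&lt;/CODE&gt; are made up for illustration and are not part of Splunk.&lt;/P&gt;

```python
import re
from collections import defaultdict

# Assumed naming convention for a bucket directory:
#   db_NEWEST_OLDEST_ID, where NEWEST and OLDEST are epoch seconds.
BUCKET_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def group_by_time_range(names):
    """Map (oldest, newest) to the sorted bucket IDs that share that range."""
    groups = defaultdict(list)
    for name in names:
        match = BUCKET_NAME.match(name)
        if match is None:
            continue  # skip anything that is not a bucket directory
        newest, oldest, bucket_id = (int(part) for part in match.groups())
        groups[(oldest, newest)].append(bucket_id)
    return {span: sorted(ids) for span, ids in groups.items()}


def shared_ranges(names):
    """Keep only the time ranges claimed by two or more buckets."""
    grouped = group_by_time_range(names)
    return {span: ids for span, ids in grouped.items() if len(ids) != 1}


# Directory names copied from the listing above.
listing = [
    "db_1456934399_1456930800_1238",
    "db_1456934399_1456930800_1982",
    "db_1456934399_1456930800_803",
]

for (oldest, newest), ids in shared_ranges(listing).items():
    print(oldest, newest, ids)
```

&lt;P&gt;To scan a real directory, the hard-coded list could be replaced with the output of &lt;CODE&gt;os.listdir&lt;/CODE&gt; on the index's frozen path. This only reports buckets whose time ranges match exactly; it does not inspect the events inside them.&lt;/P&gt;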

&lt;P&gt;Here are the journals, which get progressively larger as the sequence number increases:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;splunk@nursery-splunkindex-1001:~$ ls -l /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399*/rawdata/journal.gz
-rw------- 1 splunk splunk 1287 Apr  3 18:53 /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_1238/rawdata/journal.gz
-rw------- 1 splunk splunk 1566 Apr  3 18:59 /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_1982/rawdata/journal.gz
-rw------- 1 splunk splunk  651 Apr  3 18:51 /misc/cloud2/splunk/freezer/accumulating/service_public/db_1456934399_1456930800_803/rawdata/journal.gz
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 06 Jul 2016 00:17:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244284#M73701</guid>
      <dc:creator>brianpreston</dc:creator>
      <dc:date>2016-07-06T00:17:18Z</dc:date>
    </item>
    <item>
      <title>Re: What causes these duplicate buckets (not marked duplicates)?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244285#M73702</link>
      <description>&lt;P&gt;Unfortunately, it isn't clear what you are trying to do and how. Can you please elaborate? &lt;/P&gt;</description>
      <pubDate>Wed, 06 Jul 2016 00:42:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244285#M73702</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2016-07-06T00:42:36Z</dc:date>
    </item>
    <item>
      <title>Re: What causes these duplicate buckets (not marked duplicates)?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244286#M73703</link>
      <description>&lt;P&gt;Sure.  I'm trying to "freeze" these buckets, as described in &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Automatearchiving"&gt;Archive indexed data&lt;/A&gt;.&lt;/P&gt;

&lt;P&gt;In short, this means:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;the buckets are taken out of Splunk and are no longer indexed&lt;/LI&gt;
&lt;LI&gt;the bucket data is stored on a remote file system&lt;/LI&gt;
&lt;LI&gt;the events in the buckets are not lost!&lt;/LI&gt;
&lt;LI&gt;the events in the buckets are not stored more than once&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;The problem is that the buckets shown, all in a single index, represent the same time span.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;It seems very likely the data is being duplicated.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;But the newer buckets are always bigger.  &lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;How can the buckets be newer if they cover the same time span?&lt;/LI&gt;
&lt;LI&gt;Why is the same data being frozen more than once?&lt;/LI&gt;
&lt;LI&gt;Where is the new data coming from if this is the frozen directory?&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 06 Jul 2016 06:54:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244286#M73703</guid>
      <dc:creator>brianpreston</dc:creator>
      <dc:date>2016-07-06T06:54:51Z</dc:date>
    </item>
    <item>
      <title>Re: What causes these duplicate buckets (not marked duplicates)?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244287#M73704</link>
      <description>&lt;P&gt;It was the Splunk App for AWS -- &lt;A href="http://docs.splunk.com/Documentation/AddOns/latest/AWS/Description"&gt;http://docs.splunk.com/Documentation/AddOns/latest/AWS/Description&lt;/A&gt;&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;The app generates log events as the output of its analysis.&lt;/LI&gt;
&lt;LI&gt;Each log event is given a timestamp corresponding to the timestamp of the event in the AWS system.&lt;/LI&gt;
&lt;LI&gt;A log event may be re-ingested! It then ends up in a newer bucket whose time range collides with that of an existing bucket.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;&lt;STRONG&gt;How to fix?&lt;/STRONG&gt;&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;Fundamentally, do not put AWS or any other generated log events in the same index as your application data events.&lt;/LI&gt;
&lt;LI&gt;Secondly, do not try to freeze or archive events generated by an app or plugin.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 06 Jul 2016 22:32:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/What-causes-these-duplicate-buckets-not-marked-duplicates/m-p/244287#M73704</guid>
      <dc:creator>brianpreston</dc:creator>
      <dc:date>2016-07-06T22:32:06Z</dc:date>
    </item>
  </channel>
</rss>

