<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: difference between index archiving (cold to Frozen) and archive to Hadoop via Hadoop Data Roll? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362562#M66083</link>
    <description>&lt;P&gt;A)&lt;BR /&gt;
Here is the link to configure your Provider with Kerberos: &lt;A href="https://docs.splunk.com/Documentation/Splunk/7.0.0/HadoopAnalytics/ConfigureKerberosauthentication" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/7.0.0/HadoopAnalytics/ConfigureKerberosauthentication&lt;/A&gt;&lt;BR /&gt;
Also, make sure the Kerberos keytab, Hadoop home, and Java home are in exactly the same location on both the Search Head and all the Indexers; otherwise you might see this error: &lt;A href="https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Troubleshoot" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Troubleshoot&lt;/A&gt;&lt;BR /&gt;
Hadoop version 2.6 should work without any issues. &lt;/P&gt;

&lt;P&gt;B) &lt;BR /&gt;
The script $SPLUNK_HOME/etc/apps/splunk_archiver/bin/coldToFrozen.sh is not required. Consider coldToFrozen.sh a fallback, not your primary archiving hook: it buys you more time to archive buckets when your system is receiving data faster than normal, or when the archiving storage layer is down. To reduce the need for it, set vix.output.buckets.older.than (a value in seconds) as low as possible for each archived index, so that buckets are archived as quickly as possible.&lt;BR /&gt;
So if, for example, you lower your setting from the equivalent of 60 days to the equivalent of 50 days, you should not need that script at all. &lt;/P&gt;
    <pubDate>Tue, 29 Sep 2020 16:49:30 GMT</pubDate>
    <dc:creator>rdagan_splunk</dc:creator>
    <dc:date>2020-09-29T16:49:30Z</dc:date>
    <item>
      <title>difference between index archiving (cold to Frozen) and archive to Hadoop via Hadoop Data Roll?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362559#M66080</link>
      <description>&lt;P&gt;I read  splunk docs and understood the below:&lt;BR /&gt;
Splunk Index archiving  from cold to frozen to a particular location can be done either via&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Automatically, by the Splunk indexer&lt;/LI&gt;
&lt;LI&gt;By using a coldToFrozen script (e.g. coldToFrozenExample.py)&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;This will archive data to the directory we specify in indexes.conf.&lt;BR /&gt;
However, it runs into problems in a clustered architecture, because multiple copies of the same bucket get archived, as per the doc below.&lt;/P&gt;
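
&lt;P&gt;For reference, here is a minimal indexes.conf sketch of that cold-to-frozen setup (the index name and paths are placeholders, not from my environment):&lt;/P&gt;

&lt;PRE&gt;
# Option 1: let the indexer archive frozen buckets to a directory automatically
[my_index]
coldToFrozenDir = /mnt/archive/my_index

# Option 2: run a custom archiving script instead
# (mutually exclusive with coldToFrozenDir)
# coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/coldToFrozenExample.py"
&lt;/PRE&gt;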

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Automatearchiving" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Automatearchiving&lt;/A&gt;&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;I'm planning to use Hadoop Data Roll to send the Splunk index data to Hadoop for longer retention, where it can be further processed/analysed using Hadoop technologies (Hive, Pig, etc.).&lt;/P&gt;

&lt;P&gt;A) My question is: do I configure Splunk index archiving to Hadoop by following the steps below, i.e. creating a Hadoop provider and updating indexes.conf with the settings below, as per the linked doc?&lt;/P&gt;

&lt;P&gt;[splunk_index_archive]&lt;BR /&gt;
vix.output.buckets.from.indexes&lt;BR /&gt;
vix.output.buckets.older.than&lt;BR /&gt;
vix.output.buckets.path&lt;BR /&gt;
vix.provider &lt;/P&gt;

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/ConfigureSplunkindexarchivingtoHadoop" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/ConfigureSplunkindexarchivingtoHadoop&lt;/A&gt; &lt;/P&gt;
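
&lt;P&gt;For example, a minimal sketch of such an archive stanza (the values are placeholders; note that vix.output.buckets.older.than is specified in seconds, e.g. 5184000 seconds = 60 days):&lt;/P&gt;

&lt;PRE&gt;
[splunk_index_archive]
vix.output.buckets.from.indexes = my_index
vix.output.buckets.older.than = 5184000
vix.output.buckets.path = /user/splunk/archive
vix.provider = my_hadoop_provider
&lt;/PRE&gt;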

&lt;P&gt;B) Does Hadoop Data Roll require the coldToFrozenExample.py script to send data to Hadoop?&lt;BR /&gt;
C) Does Hadoop Data Roll tackle the multiple copies issue?&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;D) Could someone kindly explain what this doc refers to?&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/SetanarchivescripttoHadoop" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/SetanarchivescripttoHadoop&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;What is the difference between the steps in my question A) above&lt;BR /&gt;
and using this script to transfer data to Hadoop?&lt;BR /&gt;
Or&lt;BR /&gt;
is it mandatory to use this script to transfer an index to Hadoop?&lt;/P&gt;

&lt;P&gt;I'm really confused; kindly help.&lt;/P&gt;
      <pubDate>Tue, 29 Sep 2020 16:43:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362559#M66080</guid>
      <dc:creator>Harishma</dc:creator>
      <dc:date>2020-09-29T16:43:55Z</dc:date>
    </item>
    <item>
      <title>Re: difference between index archiving (cold to Frozen) and archive to Hadoop via Hadoop Data Roll?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362560#M66081</link>
      <description>&lt;P&gt;A) Exactly. You setup a Provider and VIX to archive the data.  Also, you will need to install Hadoop and Java on all of your Search Heads and Indexers.  The actual copy for the buckets is done from the Indexers.&lt;BR /&gt;
B) Hadoop Data Roll does not need the script coldToFrozenExample.py. &lt;BR /&gt;
C) Yes Hadoop Data Roll will only copy 1 bucket. the other copies will not be moved to HDFS. &lt;BR /&gt;
D) Hadoop Data Roll does not need a script, it just use the flag vix.output.buckets.older.than = seconds to determine if the bucket has to be copied or not.&lt;BR /&gt;&lt;BR /&gt;
"$SPLUNK_HOME/etc/apps/splunk_archiver/bin/coldToFrozen.sh" is used just to prevent buckets from being deleted by Splunk before you copy these buckets to HDFS.&lt;BR /&gt;
Do not confuse this script with this non-HDFS script: $SPLUNK_HOME/bin/coldToFrozenExample.py&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 16:44:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362560#M66081</guid>
      <dc:creator>rdagan_splunk</dc:creator>
      <dc:date>2020-09-29T16:44:30Z</dc:date>
    </item>
    <item>
      <title>Re: difference between index archiving (cold to Frozen) and archive to Hadoop via Hadoop Data Roll?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362561#M66082</link>
      <description>&lt;P&gt;Hi  @rdagan ,&lt;/P&gt;

&lt;P&gt;Thank you so much for your response, but I have a few queries to help me understand better.&lt;/P&gt;

&lt;P&gt;A) &lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;My Hadoop cluster is Kerberos-authenticated, so I need to follow these steps to install the Kerberos client utilities on the Splunk servers as well, right? I ask because this is not mentioned in the Hadoop Data Roll system requirements doc:
&lt;A href="https://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/Kerberosclientutilities"&gt;https://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/Kerberosclientutilities&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Also, our CDH version is Hadoop 2.6.0-cdh5.9.1. Is this supported? I see only up to CDH 5.6 mentioned in the system requirements.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;D) Our retention policy is 62 days, so if I set&lt;BR /&gt;
vix.output.buckets.older.than to the equivalent of 60 days, the data will be copied to HDFS once it is 60 days old. If that parameter handles this, why is the coldToFrozen.sh script necessary?&lt;BR /&gt;
Does it mean the copying can take time?&lt;BR /&gt;
Is it better/advisable to have this script in place as well?&lt;/P&gt;
      <pubDate>Thu, 16 Nov 2017 12:43:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362561#M66082</guid>
      <dc:creator>Harishma</dc:creator>
      <dc:date>2017-11-16T12:43:24Z</dc:date>
    </item>
    <item>
      <title>Re: difference between index archiving (cold to Frozen) and archive to Hadoop via Hadoop Data Roll?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362562#M66083</link>
      <description>&lt;P&gt;A)&lt;BR /&gt;
Here is the link to configure your Provider with Kerberos: &lt;A href="https://docs.splunk.com/Documentation/Splunk/7.0.0/HadoopAnalytics/ConfigureKerberosauthentication" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/7.0.0/HadoopAnalytics/ConfigureKerberosauthentication&lt;/A&gt;&lt;BR /&gt;
Also, make sure the Kerberos keytab, Hadoop home, and Java home are in exactly the same location on both the Search Head and all the Indexers; otherwise you might see this error: &lt;A href="https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Troubleshoot" target="_blank"&gt;https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Troubleshoot&lt;/A&gt;&lt;BR /&gt;
Hadoop version 2.6 should work without any issues. &lt;/P&gt;

&lt;P&gt;B) &lt;BR /&gt;
The script $SPLUNK_HOME/etc/apps/splunk_archiver/bin/coldToFrozen.sh is not required. Consider coldToFrozen.sh a fallback, not your primary archiving hook: it buys you more time to archive buckets when your system is receiving data faster than normal, or when the archiving storage layer is down. To reduce the need for it, set vix.output.buckets.older.than (a value in seconds) as low as possible for each archived index, so that buckets are archived as quickly as possible.&lt;BR /&gt;
So if, for example, you lower your setting from the equivalent of 60 days to the equivalent of 50 days, you should not need that script at all. &lt;/P&gt;
      <pubDate>Tue, 29 Sep 2020 16:49:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362562#M66083</guid>
      <dc:creator>rdagan_splunk</dc:creator>
      <dc:date>2020-09-29T16:49:30Z</dc:date>
    </item>
    <item>
      <title>Re: difference between index archiving (cold to Frozen) and archive to Hadoop via Hadoop Data Roll?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362563#M66084</link>
      <description>&lt;P&gt;Hi @rdagan ,&lt;/P&gt;

&lt;P&gt;Thank you so much for your assistance &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
      <pubDate>Fri, 17 Nov 2017 08:51:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/difference-between-index-archiving-cold-to-Frozen-and-archive-to/m-p/362563#M66084</guid>
      <dc:creator>Harishma</dc:creator>
      <dc:date>2017-11-17T08:51:11Z</dc:date>
    </item>
  </channel>
</rss>

