<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: If I delete data from HDFS will it impact my Splunk instance? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289556#M55249</link>
    <description>&lt;P&gt;Hey Scottgr!&lt;/P&gt;

&lt;P&gt;As far as I know, Hunk/Data Roll will not enforce any retention policy on the HDFS side, so removing data with a script or policy on the Hadoop side shouldn't bother Hunk. The virtual index simply tells Hunk where the data lives and how to send a MapReduce (MR) job to the Hadoop side, and Hunk returns whatever it finds there. &lt;/P&gt;

&lt;P&gt;You should be fine to manage the lifecycle of the data in HDFS in whatever manner works for you; in fact, it is something you will be required to do. &lt;/P&gt;</description>
    <pubDate>Wed, 16 Aug 2017 13:45:05 GMT</pubDate>
    <dc:creator>mattymo</dc:creator>
    <dc:date>2017-08-16T13:45:05Z</dc:date>
    <item>
      <title>If I delete data from HDFS will it impact my Splunk instance?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289553#M55246</link>
      <description>&lt;P&gt;I'm storing log data in HDFS that is being indexed by Splunk.  Due to space constraints I'd like to delete data over a certain age.  I know that I can do this by editing indexes.conf, but I wanted to see if there were any gotchas that I needed to be aware of.&lt;/P&gt;

&lt;P&gt;I'm specifically interested in knowing:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;Will Splunk correctly delete the log data from HDFS if I tell it to delete data over a certain age?  In other words, is there anything specific I need to know about deleting Splunk data from HDFS?&lt;/LI&gt;
&lt;LI&gt;If, instead of deleting the data through Splunk, I used a script to automatically delete the files from HDFS, would it cause problems with Splunk (for example, the index expecting to see data that is now missing)?  There might be some advantages to deleting the data from HDFS directly rather than depending on Splunk to do it.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;I'm quite new to working with Splunk as a developer, so I'd be grateful for any advice on the above.  Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2017 13:34:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289553#M55246</guid>
      <dc:creator>scottgr</dc:creator>
      <dc:date>2017-08-15T13:34:37Z</dc:date>
    </item>
    <item>
      <title>Re: If I delete data from HDFS will it impact my Splunk instance?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289554#M55247</link>
      <description>&lt;P&gt;Hi scottgr!&lt;/P&gt;

&lt;P&gt;Can you tell us more about the Splunk configuration you are using to interact with HDFS? Hadoop Data Roll? Hunk? &lt;/P&gt;

&lt;P&gt;When I last played with Data Roll, Splunk didn't maintain the retention logic on the HDFS side. Once data was there, it stayed there, and Splunk only aged out data in your local indexes. I recall it was relatively easy to use the hdfs command to clean up if needed. &lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 02:40:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289554#M55247</guid>
      <dc:creator>mattymo</dc:creator>
      <dc:date>2017-08-16T02:40:49Z</dc:date>
    </item>
    <item>
      <title>Re: If I delete data from HDFS will it impact my Splunk instance?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289555#M55248</link>
      <description>&lt;P&gt;Thanks for getting back to me - it's much appreciated.  We're using Hunk.  So we have a remote Hadoop cluster that we're accessing in Splunk via a virtual index.&lt;/P&gt;

&lt;P&gt;What I'm unsure about is whether it's acceptable to just delete old data within Hadoop - or whether that will cause problems with the Splunk indexing.&lt;/P&gt;

&lt;P&gt;Thanks again for the help.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 13:23:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289555#M55248</guid>
      <dc:creator>scottgr</dc:creator>
      <dc:date>2017-08-16T13:23:57Z</dc:date>
    </item>
    <item>
      <title>Re: If I delete data from HDFS will it impact my Splunk instance?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289556#M55249</link>
      <description>&lt;P&gt;Hey Scottgr!&lt;/P&gt;

&lt;P&gt;As far as I know, Hunk/Data Roll will not enforce any retention policy on the HDFS side, so removing data with a script or policy on the Hadoop side shouldn't bother Hunk. The virtual index simply tells Hunk where the data lives and how to send a MapReduce (MR) job to the Hadoop side, and Hunk returns whatever it finds there. &lt;/P&gt;
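
&lt;P&gt;For illustration only, here is a minimal Python sketch of the kind of script-based cleanup described above. It assumes the &lt;CODE&gt;hdfs&lt;/CODE&gt; CLI is on the PATH, that each entry directly under a root directory you choose can be aged by the modification time printed by &lt;CODE&gt;hdfs dfs -ls&lt;/CODE&gt;, and that removing whole entries with &lt;CODE&gt;hdfs dfs -rm -r&lt;/CODE&gt; is acceptable for your data. The function names are hypothetical, and the script defaults to a dry run.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
import subprocess
from datetime import datetime, timedelta

# Sketch only: function names are hypothetical, and the layout of the
# HDFS directory you pass in is an assumption about your setup.


def parse_ls_line(line):
    """Parse one line of `hdfs dfs -ls` output into (path, modification time).

    Returns None for lines that are not entries, such as "Found N items".
    """
    # Columns: permissions, replication, owner, group, size, date, time, path.
    # maxsplit=7 keeps paths that contain spaces in one piece.
    parts = line.split(None, 7)
    if len(parts) != 8:
        return None
    try:
        mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    return parts[7], mtime


def paths_older_than(ls_output, max_age_days, now):
    """Return the paths whose modification time is before now - max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    old = []
    for line in ls_output.splitlines():
        entry = parse_ls_line(line)
        if entry is not None and cutoff > entry[1]:
            old.append(entry[0])
    return old


def delete_old_hdfs_data(root, max_age_days, dry_run=True):
    """List the direct children of root and remove those older than max_age_days."""
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", root],
        capture_output=True, text=True, check=True,
    ).stdout
    old = paths_older_than(listing, max_age_days, datetime.now())
    for path in old:
        if dry_run:
            print("would delete:", path)
        else:
            # Without -skipTrash, HDFS moves the path to trash if trash is enabled.
            subprocess.run(["hdfs", "dfs", "-rm", "-r", path], check=True)
    return old
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;A call such as &lt;CODE&gt;delete_old_hdfs_data("/data/logs", 90)&lt;/CODE&gt; (hypothetical path) would only print what it would remove; pass &lt;CODE&gt;dry_run=False&lt;/CODE&gt; to actually delete. Leaving out &lt;CODE&gt;-skipTrash&lt;/CODE&gt; means removed paths go to the HDFS trash when trash is enabled on the cluster, which gives a short safety net. Because the virtual index only points at the location, Hunk searches would simply stop returning events from the deleted files.&lt;/P&gt;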

&lt;P&gt;You should be fine to manage the lifecycle of the data in HDFS in whatever manner works for you; in fact, it is something you will be required to do. &lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 13:45:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289556#M55249</guid>
      <dc:creator>mattymo</dc:creator>
      <dc:date>2017-08-16T13:45:05Z</dc:date>
    </item>
    <item>
      <title>Re: If I delete data from HDFS will it impact my Splunk instance?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289557#M55250</link>
      <description>&lt;P&gt;Right, that's the key thing - it's &lt;STRONG&gt;virtual&lt;/STRONG&gt; - "just" a pointer to the location. You can administer the data on HDFS as you please...&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 13:53:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289557#M55250</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2017-08-16T13:53:42Z</dc:date>
    </item>
    <item>
      <title>Re: If I delete data from HDFS will it impact my Splunk instance?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289558#M55251</link>
      <description>&lt;P&gt;That's great - really appreciate the advice.  Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2017 09:49:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/If-I-delete-data-from-HDFS-will-it-impact-my-Splunk-instance/m-p/289558#M55251</guid>
      <dc:creator>scottgr</dc:creator>
      <dc:date>2017-08-17T09:49:56Z</dc:date>
    </item>
  </channel>
</rss>

