Solved: What are the advantages of using Hadoop Data Roll?

pradeepkumarg · ‎02-09-2017

The documentation says:

Archive indexer data to meet your data retention policies without using valuable indexer space.

How exactly does this help Splunk indexers? Does Hadoop have a better compression rate?
Why would I want to store data in Hadoop rather than on the Splunk indexer itself?

hsesterhenn_spl · ‎02-09-2017

Hi,

With Hadoop Data Roll, you are much more flexible with your retention policies.

When your Splunk Enterprise data on the indexers moves from cold to frozen, you can archive it, but the index data (TSIDX files) is removed. The archived data is searchable again only after you restore it and re-index it. This takes time.

With HDR (Hadoop Data Roll), you can copy/archive the data to Hadoop and the data will still be searchable.

By doing this, you can shorten your Splunk Enterprise retention period, which reduces the amount of storage needed on the Splunk Enterprise indexers.

Example: Your indexers can only store data for 30 days because you are out of disk space.
By archiving data to Hadoop, you might be able to store data for more than a year, because only the compressed raw data is archived, not the index data.
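
To make the arithmetic behind this example concrete, here is a rough Python sketch. The function name, the disk size, the daily ingest volume, and both ratios are made-up placeholders chosen for illustration, not measured Splunk or Hadoop figures; plug in the numbers from your own environment.

```python
def retention_days(disk_gb, daily_ingest_gb, raw_ratio, index_ratio):
    """Estimate how many days of data fit on a volume.

    raw_ratio:   on-disk size of compressed raw data / ingested size
    index_ratio: on-disk size of index (TSIDX) data / ingested size
    """
    per_day_gb = daily_ingest_gb * (raw_ratio + index_ratio)
    return disk_gb / per_day_gb


# Hypothetical numbers, for illustration only.
DISK_GB = 1500
DAILY_INGEST_GB = 100

# Indexer volume: compressed raw data plus index data.
indexer_days = retention_days(DISK_GB, DAILY_INGEST_GB,
                              raw_ratio=0.15, index_ratio=0.35)

# Archive volume of the same size: compressed raw data only.
archive_days = retention_days(DISK_GB, DAILY_INGEST_GB,
                              raw_ratio=0.15, index_ratio=0.0)

print(f"indexer: ~{indexer_days:.0f} days, archive: ~{archive_days:.0f} days")
```

With these placeholder ratios, the same amount of disk holds roughly three times as many days once the index data is left out. In practice Hadoop storage is often cheaper per GB as well, which is where the longer retention comes from.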

If you have use cases for long-term analytics, this might help you reduce the TCO of your Splunk environment.

And you don't need an additional license anymore. It works with the Splunk Enterprise Core volume license.

Caution: If your use case is rare-term searches (like finding the one event out of 100 million with the id 123456787556), searching in Hadoop might not be fast enough.

Because the TSIDX data is removed, searches on the archived data can't leverage the index anymore.
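
A toy Python sketch of why this matters: with an index, a rare term is a direct lookup; without one, every event has to be read. This is only an illustration of the general idea of an inverted index. It is not how TSIDX files are actually structured, and all names and the synthetic events are assumptions made for the example.

```python
from collections import defaultdict

# Synthetic events, for illustration only.
events = [f"id={i} status=ok" for i in range(1000)]


def build_index(events):
    """Map each token to the positions of the events containing it."""
    index = defaultdict(list)
    for pos, event in enumerate(events):
        for token in event.split():
            index[token].append(pos)
    return index


def search_with_index(index, events, token):
    """Jump straight to the matching events via the index."""
    return [events[pos] for pos in index.get(token, [])]


def search_by_scan(events, token):
    """No index available: every event must be read and checked."""
    return [event for event in events if token in event.split()]


index = build_index(events)
hits_indexed = search_with_index(index, events, "id=742")
hits_scanned = search_by_scan(events, "id=742")
```

Both searches return the same single event, but the scan touches all 1000 events to find it. On 100 million archived events, that difference is what makes a rare-term search slow.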

HTH,

Holger


rdagan_splunk · ‎02-09-2017

Here is a link to a good blog on the subject: http://blogs.splunk.com/2015/09/23/hunk-size-matters/

pradeepkumarg · ‎02-09-2017

Thank you.

hsesterhenn_spl · ‎02-09-2017