What are the advantages of using Hadoop Data Roll?

pradeepkumarg
Influencer

Documentation says

Archive indexer data to meet your data retention policies without using valuable indexer space.

How exactly does this help Splunk indexers? Does Hadoop have a better compression ratio?
Why would I want to store data in Hadoop rather than on the Splunk indexers themselves?

hsesterhenn_spl
Splunk Employee

Hi,

With Hadoop Data Roll, you have much more flexibility in your retention policies.

When Splunk Enterprise data on the indexers rolls from cold to frozen, you can archive it, but the index data (TSIDX files) is removed. The archived data becomes searchable again only after you restore it and re-index it, which takes time.

With Hadoop Data Roll (HDR), you can copy/archive the data to Hadoop, and it remains searchable there.

This lets you shorten the retention period on the Splunk Enterprise indexers, which reduces the amount of storage they need.

Example: Your indexers can only store 30 days of data because you are out of disk space.
By archiving to Hadoop, you might keep data for more than a year, because only the compressed raw data is archived, not the index data.
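
To make the trade-off in the example concrete, here is a rough sizing sketch in Python. It is not a Splunk tool or API. The daily volume and both size ratios are hypothetical placeholders, not figures from this thread, so measure your own buckets before relying on the output. The `copies` parameter is likewise an assumption, added to account for any replication in the Hadoop cluster.

```python
def indexer_storage_gb(daily_raw_gb, retention_days, rawdata_ratio, tsidx_ratio):
    """Disk needed on the indexers: compressed raw data plus TSIDX files."""
    return daily_raw_gb * retention_days * (rawdata_ratio + tsidx_ratio)


def archive_storage_gb(daily_raw_gb, retention_days, rawdata_ratio, copies=1):
    """Disk needed in the archive: compressed raw data only, times the copies kept."""
    return daily_raw_gb * retention_days * rawdata_ratio * copies


# Hypothetical inputs; replace them with values measured in your environment.
daily_raw_gb = 100.0   # raw data ingested per day
rawdata_ratio = 0.15   # compressed raw data as a fraction of the raw volume
tsidx_ratio = 0.35     # TSIDX files as a fraction of the raw volume

indexers_30d = indexer_storage_gb(daily_raw_gb, 30, rawdata_ratio, tsidx_ratio)
indexers_365d = indexer_storage_gb(daily_raw_gb, 365, rawdata_ratio, tsidx_ratio)
archive_365d = archive_storage_gb(daily_raw_gb, 365, rawdata_ratio, copies=3)

print(f"30 days on indexers:   {indexers_30d:.0f} GB")
print(f"365 days on indexers:  {indexers_365d:.0f} GB")
print(f"365 days in archive:   {archive_365d:.0f} GB (3 copies)")
```

Whether the archive comes out smaller depends entirely on the ratios and the number of copies you plug in. The point of the sketch is only that the TSIDX share disappears from the archived data.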

If you have use cases for long-term analytics, this might help you reduce the TCO of your Splunk environment.

You also no longer need an additional license: it works with the Splunk Enterprise core volume license.

Caution: If your use case is rare-term searches (like finding the one event out of 100 million with the id 123456787556), searching in Hadoop might not be fast enough.

Because the TSIDX data is removed, such searches can't leverage the index anymore.
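
As a toy illustration of why losing the index hurts rare-term searches, the Python sketch below compares a lookup in a simple inverted index with a scan over every event. It does not reflect the real TSIDX format or how Splunk searches Hadoop. The event strings and the function names are made up for this example.

```python
from collections import defaultdict


def build_index(events):
    """Map each whitespace-separated term to the positions of events containing it."""
    index = defaultdict(set)
    for pos, event in enumerate(events):
        for term in event.split():
            index[term].add(pos)
    return index


def search_with_index(index, events, term):
    """Jump straight to the matching events via the term lookup."""
    return [events[pos] for pos in sorted(index.get(term, ()))]


def search_by_scan(events, term):
    """Without an index, every event has to be read and checked."""
    return [event for event in events if term in event.split()]


# Hypothetical data: many ordinary events and one rare one.
events = [f"id={n} status=ok" for n in range(100_000)]
events.append("id=123456787556 status=error")

index = build_index(events)
needle = "id=123456787556"

# Both approaches return the same event; only the amount of work differs.
assert search_with_index(index, events, needle) == search_by_scan(events, needle)
```

The indexed lookup touches one entry, while the scan has to read all events to find the same single match. For searches that read most of the data anyway, such as long-term analytics, the difference matters much less.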

HTH,

Holger

rdagan_splunk
Splunk Employee

Here is a link to a good blog on the subject: http://blogs.splunk.com/2015/09/23/hunk-size-matters/

pradeepkumarg
Influencer

Thank you.
