What are the advantages of using Hadoop Data Roll?

pradeepkumarg
Influencer

The documentation says:

Archive indexer data to meet your data retention policies without using valuable indexer space.

How exactly does this help Splunk indexers? Does Hadoop have a better compression rate?
Why would I want to store data in Hadoop rather than on the Splunk indexer itself?

hsesterhenn_spl
Splunk Employee

Hi,

with Hadoop Data Roll you have much more flexibility in your retention policies.

If your Splunk Enterprise data on the indexers rolls from cold to frozen, you can archive it, but the index data (TSIDX files) is removed. The archived data becomes searchable again only after you restore it and rebuild its index, which takes time.

With HDR (Hadoop Data Roll), you can copy (archive) the data to Hadoop, and it remains searchable there.

By doing this, you can shorten your Splunk Enterprise retention period and reduce the amount of storage needed on your Splunk Enterprise indexers.

Example: Your indexers can only store 30 days of data because you are running out of disk space.
By archiving data to Hadoop, you might be able to keep it for more than a year, because only the compressed raw data is stored there, not the index data.
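
As a rough illustration of the 30-day example, the following Python sketch estimates how many days of data fit on a fixed amount of disk with and without index files. The ratios (compressed raw data at about 15% of ingested volume, TSIDX files at about 35%) are an assumption based on a commonly cited sizing rule of thumb; real ratios vary a lot with the data, and the disk and ingest figures are hypothetical.

```python
# Rough retention estimate: how many days of ingest fit on a given disk,
# depending on whether index (TSIDX) files are kept next to the raw data.
# ASSUMPTION: these ratios are a commonly cited rule of thumb, not measured values.
RAW_RATIO = 0.15    # compressed raw data, as a fraction of ingested volume
TSIDX_RATIO = 0.35  # index files, as a fraction of ingested volume


def retention_days(disk_gb: float, daily_ingest_gb: float, keep_tsidx: bool) -> float:
    """Return how many days of ingest fit on disk_gb of storage (hypothetical helper)."""
    ratio = RAW_RATIO + (TSIDX_RATIO if keep_tsidx else 0.0)
    return disk_gb / (daily_ingest_gb * ratio)


if __name__ == "__main__":
    # Hypothetical sizing: 100 GB/day ingest and 1,500 GB of usable disk.
    disk, ingest = 1500.0, 100.0
    print(f"raw + TSIDX: {retention_days(disk, ingest, True):.0f} days")   # prints 30 days
    print(f"raw only:    {retention_days(disk, ingest, False):.0f} days")  # prints 100 days
```

With these assumed ratios, the same disk holds roughly three times as many days when only compressed raw data is kept. Only the relative difference matters here; Hadoop-side costs such as HDFS replication are not modeled.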

If you have use cases for long-term analytics, this might help you reduce the TCO of your Splunk environment.

You also no longer need an additional license: it works with the Splunk Enterprise core volume license.

Caution: If your use case involves rare-term searches (such as finding the one event out of 100 million with the ID 123456787556), searching in Hadoop might not be fast enough.

Because the TSIDX data has been removed, you can no longer take advantage of the index.
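
The caution about rare-term searches comes down to what an index buys you. The conceptual Python sketch below uses a plain dictionary as a stand-in for an inverted index (a hypothetical simplification, not Splunk's actual TSIDX format): with the index, finding a rare term is a single lookup; without it, every event has to be read and tokenized. All function names are illustrative.

```python
# Conceptual sketch: why losing index files hurts rare-term searches.
# ASSUMPTION: a dict mapping term -> event positions stands in for an inverted
# index; this is NOT Splunk's TSIDX format, only the general idea behind it.
from collections import defaultdict


def build_inverted_index(events: list[str]) -> dict[str, list[int]]:
    """Map each whitespace-separated term to the positions of events containing it."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos, event in enumerate(events):
        for term in set(event.split()):
            index[term].append(pos)
    return dict(index)


def search_with_index(index: dict[str, list[int]], term: str) -> list[int]:
    # One dictionary lookup, no matter how many events exist.
    return index.get(term, [])


def search_by_scan(events: list[str], term: str) -> tuple[list[int], int]:
    # Without an index, every event must be examined; also report how many were read.
    hits: list[int] = []
    examined = 0
    for pos, event in enumerate(events):
        examined += 1
        if term in event.split():
            hits.append(pos)
    return hits, examined


if __name__ == "__main__":
    events = [f"user=alice action=login id={i}" for i in range(100_000)]
    events[73_512] = "user=bob action=login id=123456787556"  # the one rare event
    index = build_inverted_index(events)
    print(search_with_index(index, "id=123456787556"))  # prints [73512]
    print(search_by_scan(events, "id=123456787556"))    # prints ([73512], 100000)
```

Both searches return the same event, but the scan had to read all 100,000 events to find it. Frequent terms are less affected, because a scan finds many matches early, which is why the caution applies mainly to needle-in-a-haystack searches.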

HTH,

Holger

rdagan_splunk
Splunk Employee

Here is a link to a good blog post on the subject: http://blogs.splunk.com/2015/09/23/hunk-size-matters/

pradeepkumarg
Influencer

Thank you.
