What are the advantages of using Hadoop Data Roll?

pradeepkumarg
Influencer

The documentation says:

Archive indexer data to meet your data retention policies without using valuable indexer space.

How exactly does this help Splunk indexers? Does Hadoop have a better compression rate?
Why would I want to store data in Hadoop rather than on the Splunk indexers themselves?

hsesterhenn_spl
Splunk Employee

Hi,

With Hadoop Data Roll, you have much more flexibility in your retention policies.

When Splunk Enterprise data on the indexers rolls from cold to frozen, you can archive it, but the index files (TSIDX files) are removed. The archived data is searchable only after you restore it and rebuild the index, which takes time.

With HDR (Hadoop Data Roll), you can copy/archive the data to Hadoop and the data will still be searchable.

This lets you shorten the retention period in Splunk Enterprise itself, which reduces the amount of storage your indexers need.

Example: Your indexers can only keep 30 days of data because you are out of disk space.
By archiving data to Hadoop, you might keep it for more than a year. Only the compressed raw data is archived, not the index data.
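
The arithmetic behind this example can be sketched roughly as follows. The ratios are assumptions for illustration only (a commonly quoted rule of thumb is compressed raw data at about 15% of the ingested volume and index files at about 35%), and the function names and the 100 GB/day figure are hypothetical. Measure your own data before sizing anything.

```python
# Rough storage estimate; all ratios are illustrative assumptions, not measured values.
RAW_RATIO = 0.15    # assumed: compressed raw data as a fraction of ingested volume
TSIDX_RATIO = 0.35  # assumed: index (TSIDX) files as a fraction of ingested volume


def indexer_storage_gb(daily_ingest_gb: float, retention_days: int) -> float:
    """Disk needed on the indexers: compressed raw data plus TSIDX files."""
    return daily_ingest_gb * retention_days * (RAW_RATIO + TSIDX_RATIO)


def archive_storage_gb(daily_ingest_gb: float, retention_days: int) -> float:
    """Disk needed in the Hadoop archive: compressed raw data only."""
    return daily_ingest_gb * retention_days * RAW_RATIO


if __name__ == "__main__":
    daily = 100.0  # hypothetical ingest volume in GB/day
    print(f"30 days on indexers: {indexer_storage_gb(daily, 30):.0f} GB")
    print(f"365 days in Hadoop:  {archive_storage_gb(daily, 365):.0f} GB")
```

Note that this ignores HDFS replication, which multiplies the archive footprint by the replication factor configured on the Hadoop cluster.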

If you have use cases for long-term analytics, this might help you reduce the TCO of your Splunk environment.

And you no longer need an additional license. It works with the Splunk Enterprise core volume license.

Caution: If your use case is rare-term searches (like finding the one event out of 100 million with the ID 123456787556), searching in Hadoop might not be fast enough.

Because the TSIDX data is removed, the search can't leverage the index anymore.
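
To illustrate why a missing index hurts this kind of search, here is a toy sketch. It is not Splunk's TSIDX format or search implementation, just a generic inverted index compared with a full scan; the events and function names are made up.

```python
from collections import defaultdict

# Toy events; a real deployment would hold many millions of these.
events = [
    "id=1001 status=ok",
    "id=123456787556 status=error",
    "id=1002 status=ok",
]

# With an index: map each term to the positions of the events that contain it.
term_index = defaultdict(list)
for pos, event in enumerate(events):
    for term in event.split():
        term_index[term].append(pos)


def search_with_index(term):
    """Jump straight to matching events; cost grows with the number of hits."""
    return [events[pos] for pos in term_index.get(term, [])]


def search_without_index(term):
    """Read and test every event; cost grows with the total number of events."""
    return [event for event in events if term in event.split()]
```

Both functions return the same events. The difference is that the second one has to read everything, which is roughly the situation for a rare-term search over archived raw data.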

HTH,

Holger

rdagan_splunk
Splunk Employee

Here is a link to a good blog on the subject: http://blogs.splunk.com/2015/09/23/hunk-size-matters/

pradeepkumarg
Influencer

Thank you.
