What are the advantages of using Hadoop Data Roll?

pradeepkumarg
Influencer

Documentation says

Archive indexer data to meet your data retention policies without using valuable indexer space.

How exactly does this help Splunk indexers? Does Hadoop have a better compression rate?
Why would I want to store data in Hadoop rather than on the Splunk indexers themselves?

hsesterhenn_spl
Splunk Employee

Hi,

With Hadoop Data Roll you have much more flexibility in your retention policies.

When Splunk Enterprise data on the indexers rolls from cold to frozen, you can archive it, but the index data (TSIDX files) is removed. The archived data is only searchable again after you restore (thaw) it and rebuild its index files. This takes time.

With HDR (Hadoop Data Roll), you can copy/archive the data to Hadoop and the data will still be searchable.

This lets you shorten your Splunk Enterprise retention period, reducing the amount of storage the indexers need.

Example: Your indexers can only hold 30 days of data because you are running out of disk space.
By archiving data to Hadoop, you might keep it for more than a year. Only the compressed raw data is archived, not the index data.
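
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. The ratios are assumptions, not figures from this thread: it uses the common rule of thumb that compressed raw data takes about 15% of the ingested volume and index files about 35%. The function names and the 100 GB/day ingest rate are made up for illustration, so measure your own buckets before sizing anything.

```python
# Rough storage estimate: indexer buckets keep raw data plus index files,
# while an archive in Hadoop keeps only the compressed raw data.
# Both ratios are ASSUMED rule-of-thumb values, not measurements.

def indexer_storage_gb(daily_ingest_gb, retention_days,
                       rawdata_ratio=0.15, tsidx_ratio=0.35):
    """Estimated disk needed on the indexers (raw data + index files)."""
    return daily_ingest_gb * retention_days * (rawdata_ratio + tsidx_ratio)


def archive_storage_gb(daily_ingest_gb, retention_days, rawdata_ratio=0.15):
    """Estimated disk needed in the archive (compressed raw data only)."""
    return daily_ingest_gb * retention_days * rawdata_ratio


daily_gb = 100  # hypothetical ingest volume per day

print(f"30 days on indexers:  {indexer_storage_gb(daily_gb, 30):.0f} GB")
print(f"365 days on indexers: {indexer_storage_gb(daily_gb, 365):.0f} GB")
print(f"365 days in archive:  {archive_storage_gb(daily_gb, 365):.0f} GB")
```

Under these assumed ratios, a year of archived raw data needs well under a third of the space that a year of fully indexed data would need on the indexers.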

If you have use cases for long-term analytics, this might help you reduce the TCO of your Splunk environment.

And you don't need an additional license anymore. It works with the Splunk Enterprise core volume license.

Caution: If your use case is rare-term searches (like finding the one event out of 100 million with the id 123456787556), searching in Hadoop might not be fast enough.

Because the TSIDX data is removed, the search can't leverage the index anymore.
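
Why that matters can be shown with a toy model. This is only a sketch of the general idea of a term index versus a full scan. It is not how TSIDX files or Hadoop searches are actually implemented, and all names in it are invented for illustration.

```python
from collections import defaultdict


def build_term_index(events):
    """Map each whitespace-separated term to the positions of events containing it."""
    index = defaultdict(set)
    for pos, event in enumerate(events):
        for term in event.split():
            index[term].add(pos)
    return index


def search_with_index(index, events, term):
    """Jump straight to the matching events using the term index."""
    return [events[pos] for pos in sorted(index.get(term, ()))]


def search_by_scan(events, term):
    """Without an index, every event has to be read and checked."""
    return [event for event in events if term in event.split()]


# Toy data set: many routine events and a single rare one.
events = [f"id={i} status=ok" for i in range(1000)]
events[777] = "id=123456787556 status=error"

index = build_term_index(events)
needle = "id=123456787556"

# Both approaches return the same event, but the scan touches all 1000 events.
print(search_with_index(index, events, needle))
print(search_by_scan(events, needle))
```

With the index, finding the rare term is a single lookup. Without it, the cost grows with the total amount of data scanned, which is why needle-in-a-haystack searches suffer most on archived data.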

HTH,

Holger

rdagan_splunk
Splunk Employee

Here is a link to a good blog on the subject: http://blogs.splunk.com/2015/09/23/hunk-size-matters/

pradeepkumarg
Influencer

Thank you.
