Backup Indexes to S3 without Hadoop


Here's the situation - we have a non-developer, new to Splunk, with no access to Hadoop (or any basic understanding of it), trying to back up indexed data to AWS S3. The documentation goes into a lot of detail on how indexed data is stored, but it doesn't give any definitive guidance on how to back that data up. There are a number of references to Hadoop via Data Roll or Hunk, but we're not using Hadoop at all.

What would be the simplest way to 1) do daily incremental backups of the warm buckets to S3 and 2) archive frozen buckets to Glacier so no data is lost?
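For the incremental half, one non-Hadoop approach is to run `aws s3 sync` from cron: sync only uploads new or changed files, which gives daily incremental behavior for warm buckets. This is just a sketch - the paths, bucket name, and the `AWS` override are assumptions, not Splunk-provided tooling:

```shell
#!/bin/sh
# Sketch: daily incremental copy of warm buckets to S3.
# SPLUNK_DB default and the destination bucket are assumptions.
backup_warm() {
    splunk_db=${SPLUNK_DB:-/opt/splunk/var/lib/splunk}
    dest=${DEST:-s3://my-splunk-backup}          # hypothetical bucket
    for db in "$splunk_db"/*/db; do
        [ -d "$db" ] || continue
        idx=$(basename "$(dirname "$db")")       # index name, e.g. defaultdb
        # Warm bucket directories are named db_<newest>_<oldest>_<id>;
        # hot_* buckets are still being written, so exclude them.
        ${AWS:-aws} s3 sync "$db/" "$dest/$idx/" --exclude "hot_*" --exclude "hot_*/*"
    done
}

backup_warm
```

The `AWS` override exists only so the command can be exercised without real credentials; in production you would drop it and run plain `aws`. A separate S3 lifecycle rule on the archive prefix could then handle the Glacier transition.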



After re-reading the documentation several times over AND piecing together other info from Answers, I think this may be slightly less complex than what I originally thought. From what I gather, you don't actually need a Hadoop cluster in place to implement Hadoop Data Roll. You need to install the Hadoop client (version 2.6 or better) and Java (version 1.4 or better) on the Splunk indexer/search head. Once those are installed, there are configuration options in the Splunk Web UI to set up index archiving with prewritten scripts that can back up to either a Hadoop cluster's HDFS or an Amazon S3 bucket.
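For the frozen-to-Glacier half, there is also a route that skips Hadoop Data Roll entirely: indexes.conf supports a `coldToFrozenScript`, which Splunk invokes with the bucket directory as its first argument just before deleting it. A minimal sketch - the script path, bucket name, and `AWS` override are assumptions, and it presumes the aws CLI is installed with credentials:

```shell
#!/bin/sh
# Hypothetical coldToFrozenScript. Reference it from indexes.conf, e.g.:
#
#   [myindex]
#   coldToFrozenScript = "/opt/splunk/bin/frozenToS3.sh"
#
# Splunk passes the bucket directory as $1; a non-zero exit tells Splunk
# the archive failed, so it retries rather than deleting the bucket.
archive_frozen() {
    bucket=$1
    dest=${DEST:-s3://my-splunk-archive/frozen}   # hypothetical bucket/prefix
    ${AWS:-aws} s3 cp --recursive "$bucket" "$dest/$(basename "$bucket")/" || return 1
    return 0
}

if [ -n "$1" ]; then
    archive_frozen "$1"
fi
```

An S3 lifecycle rule on the `frozen/` prefix can then transition the uploaded objects to Glacier after a set number of days.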
