Deployment Architecture

Backup Indexes to S3 without Hadoop

jbruce506
Explorer

Here's the situation: we have a non-developer, new to Splunk, with no access to Hadoop (and no basic understanding of it), trying to back up indexed data to AWS S3. The documentation goes into a lot of detail on how indexed data is stored, but it doesn't give any definitive guidance on how to back that data up. There are a number of references to Hadoop via Hadoop Data Roll or Hunk, but we're not using Hadoop at all.

What would be the simplest way to 1) do daily incremental backups of the warm buckets to S3 and 2) archive frozen buckets to Glacier, so that no data is lost?


jbruce506
Explorer

After re-reading the documentation several times over AND piecing together other info from Answers, I think this may be slightly less complex than I originally thought. From what I gather, you don't actually need a Hadoop cluster in place to implement Hadoop Data Roll. You need to install the Hadoop client version 2.6 or later and Java version 1.4 or later on the Splunk indexer/search head. Once those are installed, there are configuration options in the Splunk Web UI to set up index archiving with prewritten scripts that can back up to either HDFS on a Hadoop cluster or an Amazon S3 bucket, as described here: https://docs.splunk.com/Documentation/Hunk/6.4.8/Hunk/ArchiveSplunkindexes
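For the daily incremental warm-bucket part of the question, here is a rough sketch of a different approach that does not go through Data Roll at all. It assumes the AWS CLI is installed on the indexer, and it relies on hot bucket directories being named `hot_*` while warm/cold ones are named `db_*`. The function names, the example index path, and the `s3://example-bucket` prefix are all made up for illustration. The script only prints the `aws s3 sync` commands rather than running them, so treat it as a starting point, not a tested backup solution.

```python
import sys
from pathlib import Path


def warm_bucket_dirs(db_path):
    """Return directories under db_path whose names start with 'db_'.

    Hot buckets (hot_*) are skipped because they are still being written to.
    """
    return sorted(
        p for p in Path(db_path).iterdir()
        if p.is_dir() and p.name.startswith("db_")
    )


def s3_sync_command(bucket_dir, s3_prefix):
    """Build an 'aws s3 sync' command for a single bucket directory.

    'aws s3 sync' only uploads new or changed files, so re-running it
    daily behaves like an incremental backup.
    """
    dest = f"{s3_prefix.rstrip('/')}/{bucket_dir.name}"
    return ["aws", "s3", "sync", str(bucket_dir), dest]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python warm_to_s3.py /opt/splunk/var/lib/splunk/defaultdb/db s3://example-bucket/main
    db_path, s3_prefix = sys.argv[1], sys.argv[2]
    for bucket_dir in warm_bucket_dirs(db_path):
        # Dry run: print each command instead of executing it.
        print(" ".join(s3_sync_command(bucket_dir, s3_prefix)))
```

Run from a daily cron job, the printed commands could be reviewed first and then piped to a shell once they look right.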
