Deployment Architecture

Backup Indexes to S3 without Hadoop

jbruce506
Explorer

Here's the situation - we have a non-developer, new to Splunk, without access to Hadoop (or any basic understanding of it) trying to backup indexed data to AWS S3. The documentation provides a lot of detail on how indexed data is stored but it doesn't give any definitive details on how to backup the data. There's a number of references to Hadoop using the Data Roll or Hunk, but we're not using Hadoop at all.

What would be the simplest way to 1) do daily incremental backups of the warm buckets to S3 and 2) archive frozen buckets to Glacier so no data is lost?

Tags (2)
0 Karma

jbruce506
Explorer

After re-reading the documentation several times over AND piecing together other info from Answers, I think this may be a slightly less complex than what I originally thought. From what I gather, you don't actually need a Hadoop cluster in place to implement Hadoop Data Roll. You need to install the Hadoop client version 2.6 or better and Java version 1.4 or better on the Splunk indexer/search head. Once installed, there are configuration options in the Splunk Web UI to setup index archiving with prewritten scripts that can backup to either a Hadoop cluster HDFS or Amazon S3 bucket, as seen here, https://docs.splunk.com/Documentation/Hunk/6.4.8/Hunk/ArchiveSplunkindexes.

0 Karma
Get Updates on the Splunk Community!

Happy CX Day to our Community Superheroes!

Happy 10th Birthday CX Day!What is CX Day? It’s a global celebration recognizing innovation and success in the ...

Check out This Month’s Brand new Splunk Lantern Articles

Splunk Lantern is a customer success center providing advice from Splunk experts on valuable data insights, ...

Routing Data to Different Splunk Indexes in the OpenTelemetry Collector

This blog post is part of an ongoing series on OpenTelemetry. The OpenTelemetry project is the second largest ...