Deployment Architecture

Backup Indexes to S3 without Hadoop

jbruce506
Explorer

Here's the situation: we have a non-developer who is new to Splunk, has no access to Hadoop (or any basic understanding of it), and is trying to back up indexed data to AWS S3. The documentation explains in detail how indexed data is stored, but it gives no definitive guidance on how to back that data up. There are a number of references to Hadoop through Hadoop Data Roll or Hunk, but we're not using Hadoop at all.

What would be the simplest way to 1) do daily incremental backups of the warm buckets to S3 and 2) archive frozen buckets to Glacier so no data is lost?
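For item 1, one low-tech approach that needs no Hadoop is a small script, run daily (for example from cron), that uploads every warm bucket it hasn't uploaded before. A warm bucket no longer receives new events once it rolls from hot, so uploading each one once is a reasonable approximation of an incremental backup. This Python sketch assumes the default on-disk layout, where warm buckets are `db_<newest>_<oldest>_<id>` directories under the index's `homePath` (hot `hot_v1_*` buckets and clustered `rb_*` replicas are skipped). The manifest file and the `upload` callable are hypothetical placeholders; in practice `upload` might wrap `aws s3 sync` or boto3.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_manifest(manifest_path: Path) -> Set[str]:
    # Hypothetical manifest: a JSON list of bucket names already uploaded.
    if manifest_path.exists():
        return set(json.loads(manifest_path.read_text()))
    return set()


def save_manifest(manifest_path: Path, names: Iterable[str]) -> None:
    manifest_path.write_text(json.dumps(sorted(names)))


def pending_warm_buckets(home_path: Path, done: Set[str]) -> List[Path]:
    # Assumes warm buckets are directories named db_<newest>_<oldest>_<id>;
    # hot buckets (hot_v1_*) are skipped because Splunk is still writing them.
    return [
        p for p in sorted(home_path.iterdir())
        if p.is_dir() and p.name.startswith("db_") and p.name not in done
    ]


def run_incremental_backup(home_path: Path, manifest_path: Path,
                           upload: Callable[[Path], None]) -> List[str]:
    """Upload warm buckets not yet in the manifest; return their names."""
    done = load_manifest(manifest_path)
    uploaded = []
    for bucket in pending_warm_buckets(home_path, done):
        upload(bucket)  # hypothetical: e.g. wraps `aws s3 sync` or boto3
        done.add(bucket.name)
        uploaded.append(bucket.name)
        # Record progress after each bucket so an interrupted run resumes cleanly.
        save_manifest(manifest_path, done)
    return uploaded
```

Saving the manifest after every successful upload means a run that fails partway simply picks up the remaining buckets the next day instead of re-sending everything.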


jbruce506
Explorer

After re-reading the documentation several times over AND piecing together other info from Answers, I think this may be less complex than I originally thought. From what I gather, you don't actually need a Hadoop cluster in place to implement Hadoop Data Roll. You need to install the Hadoop client (version 2.6 or later) and Java (version 1.4 or later) on the Splunk indexer/search head. Once those are installed, there are configuration options in Splunk Web to set up index archiving with prewritten scripts that can back up to either a Hadoop cluster's HDFS or an Amazon S3 bucket, as described here: https://docs.splunk.com/Documentation/Hunk/6.4.8/Hunk/ArchiveSplunkindexes.
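For item 2, if Data Roll turns out to be more than is needed, `indexes.conf` also has a `coldToFrozenScript` setting (and a simpler `coldToFrozenDir`). As I understand the docs, Splunk runs the script with the path of the bucket being frozen as its argument and only deletes the bucket after the script exits with status 0. This Python sketch copies just the bucket's `rawdata` journal (what you need to thaw and rebuild the bucket later) into a staging directory. `ARCHIVE_ROOT` is a hypothetical path, and shipping its contents to an S3 bucket whose lifecycle rule transitions objects to Glacier is assumed to happen in a separate job.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical staging location; a separate job is assumed to ship its contents
# to an S3 bucket with a lifecycle rule that transitions objects to Glacier.
ARCHIVE_ROOT = Path("/opt/splunk_frozen_staging")


def archive_bucket(bucket_dir: Path, archive_root: Path) -> Path:
    """Copy the bucket's rawdata journal into archive_root/<index>/<bucket>."""
    rawdata = bucket_dir / "rawdata"
    if not rawdata.is_dir():
        raise FileNotFoundError(f"no rawdata directory in {bucket_dir}")
    # Assumes the default layout <index>/colddb/<bucket>, so the index
    # directory is two levels above the bucket.
    index_name = bucket_dir.parent.parent.name
    dest = archive_root / index_name / bucket_dir.name
    if dest.exists():
        # A previous attempt may have copied only part of the bucket; start clean.
        shutil.rmtree(dest)
    shutil.copytree(rawdata, dest / "rawdata")  # creates parent dirs as needed
    return dest


def main(argv) -> int:
    if len(argv) != 2:
        print("usage: cold_to_frozen.py <bucket_dir>", file=sys.stderr)
        return 1
    try:
        archive_bucket(Path(argv[1]), ARCHIVE_ROOT)
    except OSError as exc:
        print(f"archive failed: {exc}", file=sys.stderr)
        return 1  # non-zero exit: Splunk should keep the bucket and retry
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

It would presumably be wired up per index with something like `coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/cold_to_frozen.py"`; it's worth testing a thaw from the archived copy before relying on it.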
