Hadoop Data Roll and archiving replicated buckets

delappml_2
Explorer

I have an indexer cluster with a replication factor of 3. If I were to implement Hadoop Data Roll, would only one copy of each event be archived to Hadoop at freeze time, or would all three bucket copies be archived? I'm trying to find out if I can save in terms of raw archive storage costs by implementing HDR versus archiving frozen buckets to a set of NFS mounts.

1 Solution

rdagan_splunk
Splunk Employee

HDR copies only one journal.gz (the compressed raw data) per bucket, regardless of how many replicated copies exist in the cluster. A replication factor of 3 therefore does not multiply the storage used on the Hadoop side.
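To put rough numbers on the answer above, here is a minimal back-of-the-envelope sketch. It assumes that archiving frozen buckets to NFS keeps one copy of the raw data per replicated bucket copy (each peer freezing its own copy), while HDR keeps a single copy. The 1000 GB figure and the `archive_size_gb` helper are hypothetical, purely for illustration.

```python
def archive_size_gb(raw_journal_gb: float, copies: int) -> float:
    """Archive footprint when `copies` copies of the compressed raw data are kept."""
    return raw_journal_gb * copies


replication_factor = 3   # from the question
journal_gb = 1000.0      # hypothetical total journal.gz volume reaching freeze

# Assumption: every replicated bucket copy lands in the NFS archive.
nfs_gb = archive_size_gb(journal_gb, replication_factor)

# Per the answer: HDR archives a single journal.gz per bucket.
hdr_gb = archive_size_gb(journal_gb, 1)

savings_gb = nfs_gb - hdr_gb
print(f"NFS archive: {nfs_gb:.0f} GB, HDR archive: {hdr_gb:.0f} GB, saved: {savings_gb:.0f} GB")
```

Under those assumptions the archive footprint shrinks by a factor equal to the replication factor. Actual savings depend on how your frozen archiving script handles replicated copies and on any replication configured on the Hadoop side itself.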

mattymo
Splunk Employee

HDR is by far the best archiving solution, unless you really want to write your own dedup logic (spoiler: you don't lol).

- MattyMo