Deployment Architecture

How do I go about removing excess bucket copies from my index cluster?

freaklin
Path Finder

Hi,

Currently I keep 18 months of logs and I'm spending a lot on storage (AWS EBS), so until we find a solution to archive frozen data to AWS S3 and retrieve it from there, I decided to change the replication factor from 3 to 2, since an indexer failure is really rare.

My issue is that I applied this config and restarted my Cluster Master, as instructed, but the available disk space on my indexer instances didn't change.

How can I purge the unnecessary replicated data?

0 Karma
1 Solution

richgalloway
SplunkTrust
SplunkTrust

As you've discovered, the excess buckets are not removed automatically. It's a manual process using the CLI.
See more in this related question https://answers.splunk.com/answers/369399/how-to-reduce-the-replication-factor-in-a-multisit.html
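
For reference, if I remember the syntax right, the command is run on the cluster master, and the index name is optional (omit it to remove excess copies for all indexes):

splunk remove excess-buckets [index-name]

Check the docs for your version to confirm the exact syntax before running it.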

---
If this reply helps you, Karma would be appreciated.


ddrillic
Ultra Champion

It's interesting to look at /opt/splunk/etc/system/default -

site_replication_factor = origin:2, total:3
site_search_factor = origin:1, total:2

The docs topic "Configure the site replication factor" says the following:

site_replication_factor = origin:<n>, [site1:<n>,] [site2:<n>,] ..., total:<n>

• n is a positive integer indicating the number of copies of a bucket.

• origin: specifies the minimum number of copies of a bucket that will be held on the site originating the data in that bucket (that is, the site where the data first entered the cluster). When a site is originating the data, it is known as the "origin" site.

• site1:, site2:, ..., indicates the minimum number of copies that will be held at each specified site. The identifiers "site1", "site2", and so on, are the same as the site attribute values specified on the peer nodes.

• total: specifies the total number of copies of each bucket, across all sites in the cluster.

So, by default the replication factor is 3 -- if I read it correctly ;-) Many of us, along the way, decide to go lower...
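
To make the syntax concrete, a hypothetical three-site master config (the site names and counts here are just an illustration, not anything from this thread) might look roughly like:

[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2,site3
site_replication_factor = origin:2, site3:1, total:4
site_search_factor = origin:1, total:2

That would keep at least 2 copies of every bucket on the site where the data was first indexed, at least 1 copy on site3 no matter where it originated, and 4 copies in total across the cluster.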

0 Karma

freaklin
Path Finder

Am I wrong, or does $SPLUNK_HOME/etc/system/local/server.conf take precedence over $SPLUNK_HOME/etc/system/default/server.conf?

In my local server.conf the replication factor is set the way I want:

[clustering]
cluster_label = master1
mode = master
pass4SymmKey = $1$RmUoN98$
replication_factor = 2
rebalance_threshold = 0.9
search_factor = 2
max_peer_build_load = 2
max_peer_rep_load = 20
max_peer_sum_rep_load = 20
maintenance_mode = false
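
To double-check which file actually wins for each setting, I believe btool shows the merged result and the file each value comes from:

splunk btool server list clustering --debug

With --debug it prints the source file next to every line, so replication_factor = 2 should show up as coming from etc/system/local/server.conf.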

When I posted this question yesterday, my total indexed data was about 97 TB; now it's 82 TB, so I think Splunk might purge the excess copies automatically, but it's in no hurry.

Thanks anyway.

0 Karma

gjanders
SplunkTrust
SplunkTrust

In addition to richgalloway's answer, refer to "Remove excess bucket copies from the indexer cluster" in the docs; in modern Splunk versions you can remove the buckets from the GUI.

0 Karma

freaklin
Path Finder

BTW, my Splunk Web version is 7.1.2.

0 Karma

freaklin
Path Finder

Thank you both. This "Remove excess buckets" option is what I need, but I must say that I index around 300 GB/day, my retention (TTL) is 18 months, and no logs have expired since I started using this TTL. So if my total indexed data dropped by more than 10% from yesterday to today, I can only believe that Splunk has some background worker that trims the bucket copies to fit the config.
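
If it helps anyone else, I think the remaining excess copies can be watched with this on the cluster master; the count should drop as the cluster fixes the buckets up to match replication_factor = 2:

splunk list excess-buckets [index-name]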

0 Karma