Deployment Architecture

How do I go about removing excess bucket copies from my indexer cluster?

Path Finder

Hi,

Currently I keep 18 months of logs, and I'm spending a lot of resources on storage (AWS EBS), so until we find a solution for sending frozen data to AWS S3 and retrieving it from there, I decided to change the replication factor from 3 to 2, since an indexer failure is something really rare.

My issue is that I set this config and restarted my Cluster Master, as required, but the available disk space on my indexer instances didn't change.
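
For reference, the change itself is just the replication_factor attribute in the [clustering] stanza of server.conf on the Cluster Master (a minimal sketch; my full stanza is further down in this thread), followed by a restart of the master:

[clustering]
mode = master
replication_factor = 2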

How can I purge the unnecessary replicated data?

0 Karma
1 Solution

SplunkTrust

As you've discovered, the excess buckets are not removed automatically. It's a manual process using the CLI.
See more in this related question https://answers.splunk.com/answers/369399/how-to-reduce-the-replication-factor-in-a-multisit.html
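
If it saves a click: the removal is done from the cluster master's CLI. A hedged sketch of the commands covered on that page (the index name is optional; omit it to act on all indexes):

splunk list excess-buckets [index-name]
splunk remove excess-buckets [index-name]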

---
If this reply helps you, an upvote would be appreciated.


Ultra Champion

It's interesting to look at /opt/splunk/etc/system/default/server.conf:

site_replication_factor = origin:2, total:3
site_search_factor = origin:1, total:2

The following is from the docs topic "Configure the site replication factor":

site_replication_factor = origin:<n>, [site1:<n>,] [site2:<n>,] ..., total:<n>

• n is a positive integer indicating the number of copies of a bucket.

• origin: specifies the minimum number of copies of a bucket that will be held on the site originating the data in that bucket (that is, the site where the data first entered the cluster). When a site is originating the data, it is known as the "origin" site.

• site1:, site2:, ..., indicates the minimum number of copies that will be held at each specified site. The identifiers "site1", "site2", and so on, are the same as the site attribute values specified on the peer nodes.

• total: specifies the total number of copies of each bucket, across all sites in the cluster.

So, by default the replication factor is 3 -- if I read it correctly ;-) Many of us, along the way, decide to go lower...
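
To make that default concrete (a hypothetical two-site cluster, not necessarily the poster's setup):

site_replication_factor = origin:2, total:3
If a bucket's data first enters the cluster at site1, then site1 holds at least 2 copies and the remaining copy goes to site2, for 3 copies in total.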

0 Karma

Path Finder

Am I wrong, or does $SPLUNK_HOME/etc/system/local/server.conf take precedence over $SPLUNK_HOME/etc/system/default/server.conf?

In my local server.conf the replication factor is set as I want:

[clustering]
cluster_label = master1
mode = master
pass4SymmKey = $1$RmUoN98$
replication_factor = 2
rebalance_threshold = 0.9
search_factor = 2
max_peer_build_load = 2
max_peer_rep_load = 20
max_peer_sum_rep_load = 20
maintenance_mode = false

When I posted this question yesterday, my total indexed data was about 97 TB; now it is 82 TB, so I think Splunk might have an auto purge, but it's in no hurry.
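
As a rough back-of-envelope (assuming the 97 TB counts every bucket copy across the cluster, and assuming the drop really is excess copies being removed):

97 TB x 2/3 ≈ 65 TB

so 82 TB would mean it is only partway done.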

Thanks anyway.

0 Karma

SplunkTrust
SplunkTrust

Refer to "Summary of directory precedence" on the Configuration file precedence page, or use btool. But yes, system/local overrides system/default.
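
For example, this on the master lists every effective [clustering] setting together with the file each value came from:

splunk btool server list clustering --debug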

0 Karma


SplunkTrust

In addition to richgalloway's answer, refer to "Remove excess bucket copies from the indexer cluster". In modern Splunk versions you can remove the buckets from the GUI.

0 Karma

Path Finder

BTW, my Splunk Web version is 7.1.2.

0 Karma

Path Finder

Thank you both. This "Remove excess buckets" is what I need, but I should say that I index around 300 GB/day, my TTL is 18 months, and no logs have expired since I started using this TTL. So if my total indexed data decreased by more than 10% from yesterday to now, I can only believe that Splunk has some worker that makes the buckets fit the config.

0 Karma