Deployment Architecture

How do I go about removing excess bucket copies from my index cluster?

freaklin
Path Finder

Hi,

Currently I keep 18 months of logs and I'm spending a lot on storage (AWS EBS), so until we find a solution to archive frozen data to AWS S3 and retrieve it from there, I decided to change the replication factor from 3 to 2, since an indexer failure is really rare.

My issue is that I applied this config and restarted my Cluster Master, as instructed, but the available disk space on my indexer instances didn't change.

How can I purge the unnecessary replicated data?

0 Karma
1 Solution

richgalloway
SplunkTrust
SplunkTrust

As you've discovered, the excess buckets are not removed automatically. It's a manual process using the CLI.
See more in this related question https://answers.splunk.com/answers/369399/how-to-reduce-the-replication-factor-in-a-multisit.html
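
For reference, if I remember the syntax right, the command is run on the cluster master, and the index name is optional (omit it to remove excess copies for all indexes):

splunk remove excess-buckets [index-name]

Check the docs for your version to confirm the exact syntax before running it.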

---
If this reply helps you, Karma would be appreciated.


ddrillic
Ultra Champion

It's interesting to look at /opt/splunk/etc/system/default -

site_replication_factor = origin:2, total:3
site_search_factor = origin:1, total:2

The docs topic "Configure the site replication factor" says the following:

site_replication_factor = origin:<n>, [site1:<n>,] [site2:<n>,] ..., total:<n>

• n is a positive integer indicating the number of copies of a bucket.

• origin: specifies the minimum number of copies of a bucket that will be held on the site originating the data in that bucket (that is, the site where the data first entered the cluster). When a site is originating the data, it is known as the "origin" site.

• site1:, site2:, ..., indicates the minimum number of copies that will be held at each specified site. The identifiers "site1", "site2", and so on, are the same as the site attribute values specified on the peer nodes.

• total: specifies the total number of copies of each bucket, across all sites in the cluster.

So, by default the replication factor is 3 -- if I read it correctly ;-) Many of us, along the way, decide to go lower...
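
To make the syntax concrete, a hypothetical three-site master config (the site names and counts here are just an illustration, not anything from this thread) might look roughly like:

[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2,site3
site_replication_factor = origin:2, site3:1, total:4
site_search_factor = origin:1, total:2

That would keep at least 2 copies of every bucket on the site where the data was first indexed, at least 1 copy on site3 no matter where it originated, and 4 copies in total across the cluster.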

0 Karma

freaklin
Path Finder

Am I wrong, or does $SPLUNK_HOME/etc/system/local/server.conf take precedence over $SPLUNK_HOME/etc/system/default/server.conf?

In my local server.conf the replication factor is set the way I want:

[clustering]
cluster_label = master1
mode = master
pass4SymmKey = $1$RmUoN98$
replication_factor = 2
rebalance_threshold = 0.9
search_factor = 2
max_peer_build_load = 2
max_peer_rep_load = 20
max_peer_sum_rep_load = 20
maintenance_mode = false
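
To double-check which file actually wins for each setting, I believe btool shows the merged result and the file each value comes from:

splunk btool server list clustering --debug

With --debug it prints the source file next to every line, so replication_factor = 2 should show up as coming from etc/system/local/server.conf.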

When I posted this question yesterday, my total indexed data was about 97 TB; now it's 82 TB, so I think Splunk might purge the excess copies automatically, but it's in no hurry.

Thanks anyway.

0 Karma

gjanders
SplunkTrust
SplunkTrust

In addition to richgalloway's answer, refer to "Remove excess bucket copies from the indexer cluster" in the docs; in modern Splunk versions you can remove the buckets from the GUI.

0 Karma

freaklin
Path Finder

BTW, my Splunk Web version is 7.1.2.

0 Karma

freaklin
Path Finder

Thank you both. This "Remove excess buckets" option is what I need, but I must say that I index around 300 GB/day, my retention (TTL) is 18 months, and no logs have expired since I started using this TTL. So if my total indexed data dropped by more than 10% from yesterday to today, I can only believe that Splunk has some background worker that trims the bucket copies to fit the config.
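
If it helps anyone else, I think the remaining excess copies can be watched with this on the cluster master; the count should drop as the cluster fixes the buckets up to match replication_factor = 2:

splunk list excess-buckets [index-name]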

0 Karma