Splunk Enterprise

Can a frozen bucket be an excess bucket?

Shashwat
Explorer

Hi there,

Can a frozen bucket be an excess bucket?

Additional context: multisite cluster, Splunk Enterprise 8.1.5

Regards,
Shashwat

0 Karma

PickleRick
SplunkTrust

I don't think so. When a bucket is frozen, it is rolled out of the index: it is either deleted, moved to frozen storage, or processed by a configured coldToFrozenScript. Splunk then stops tracking it completely and "forgets" about its existence until you thaw the bucket manually.

So the bucket is no longer accounted for and, IMO, cannot be counted as an excess bucket.

0 Karma

Shashwat
Explorer

Hi PickleRick,

Thank you for the insight.

I have a scenario here. Let's say the bucket got frozen on one peer (peer1) because the index reached its maximum size.
On the other peers in the cluster, that might not be the case (the index has yet to reach its max size). Now peer1 goes down.
Once it rejoins the cluster, the frozen bucket may become an excess bucket.
Please let me know your thoughts.

Regards,
Shashwat

0 Karma

PickleRick
SplunkTrust

No. Once the bucket got frozen, it was "expelled" from the index. It no longer exists there, so it will not be an excess bucket.

Buckets are frozen independently on each indexer, and for frozen buckets there are no fix-ups other than reassigning primaries. So the copies will freeze one by one until no copy is left in the cluster.

Quoting the docs:

In the case of an indexer cluster, when a peer freezes a copy of a bucket,
it notifies the manager. The manager then stops doing fix-ups on that bucket.
It operates under the assumption that the other peers will eventually freeze
their copies of that bucket as well.
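
To make the quoted behaviour concrete, here is a toy Python model. It is not Splunk code: the class `ToyManager` and its methods are hypothetical names made up for illustration, and the only rule it encodes is the one from the docs quote, namely that once any peer reports a freeze, the manager stops scheduling fix-ups for that bucket.

```python
from dataclasses import dataclass, field


@dataclass
class ToyManager:
    """Hypothetical stand-in for a cluster manager; not a Splunk API."""
    replication_factor: int
    # bucket id -> set of peers currently holding a copy
    copies: dict = field(default_factory=dict)
    # buckets for which at least one peer has reported a freeze
    frozen: set = field(default_factory=set)

    def add_copy(self, bucket, peer):
        self.copies.setdefault(bucket, set()).add(peer)

    def notify_frozen(self, bucket, peer):
        # The peer has dropped its copy and tells the manager about it.
        self.copies.get(bucket, set()).discard(peer)
        self.frozen.add(bucket)

    def needs_fixup(self, bucket):
        # Assumption taken from the docs quote: no fix-ups once frozen anywhere.
        if bucket in self.frozen:
            return False
        return len(self.copies.get(bucket, set())) < self.replication_factor


manager = ToyManager(replication_factor=3)
for peer in ("idx1", "idx2", "idx3"):
    manager.add_copy("bucket_a", peer)

manager.notify_frozen("bucket_a", "idx1")
print(manager.needs_fixup("bucket_a"))  # False: two copies left, but no fix-up
```

In this sketch the bucket drops below the replication factor after the first freeze, yet nothing tries to re-replicate it, which is why the remaining copies simply age out one by one.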

I'm not entirely sure, however, what would happen in a different scenario:

You have a 5-indexer cluster. A bucket was initially replicated to indexers 1, 2 and 3. It got rolled to frozen on indexers 1 and 2 due to index or volume size, so there is only a single copy remaining, on indexer 3.

Normally it would wait there to get frozen, but let's assume that we lose the cluster manager for a while. When it rejoins the cluster, it "inventories" the indexers (the CM doesn't store the cluster state in a persistent way). So it sees just one copy of a bucket which should have more copies.

I suppose it could get replicated again but I haven't checked it.
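
Purely as an illustration of that hypothetical, here is what such an inventory would look like in a small Python sketch. Everything in it is an assumption: `rebuild_inventory`, the peer names and the bucket name are invented, and it says nothing about what a real cluster manager actually does after a restart. It only shows why a manager with no memory of earlier freezes would see a shortfall.

```python
REPLICATION_FACTOR = 3  # assumed value for the example


def rebuild_inventory(peer_reports):
    """Count copies per bucket from what each peer says it still holds."""
    counts = {}
    for buckets in peer_reports.values():
        for bucket in buckets:
            counts[bucket] = counts.get(bucket, 0) + 1
    return counts


# Indexers 1 and 2 already froze bucket_a; only indexer 3 still has it.
peer_reports = {
    "idx1": [],
    "idx2": [],
    "idx3": ["bucket_a"],
    "idx4": [],
    "idx5": [],
}

inventory = rebuild_inventory(peer_reports)
# Buckets that appear to have fewer copies than the replication factor.
shortfall = {
    bucket: REPLICATION_FACTOR - count
    for bucket, count in inventory.items()
    if count < REPLICATION_FACTOR
}
print(shortfall)  # {'bucket_a': 2}
```

Whether the real manager would act on that apparent shortfall is exactly the part I haven't verified.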

Anyway, it's a completely different scenario from yours 🙂

0 Karma

isoutamo
SplunkTrust

That workflow is quite obvious, but what will happen if those buckets have excess copies before the first instance freezes them? I still suppose that those excess copies will stay there until you manually remove them. Unfortunately I don't currently have the option to test this in my lab to be sure what will happen.

0 Karma

PickleRick
SplunkTrust

If there are excess buckets, there are excess buckets 🙂

I would assume that each bucket copy will get frozen in its own time. So if you have 6 copies of a bucket when you have SF=RF=3, you have 3 excess buckets, right? And when the first one gets frozen, you're left with 2 excess buckets. Then 1. Then you have no excess buckets.

At least that's how I'd expect it to behave.
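
The arithmetic behind that expectation can be sketched in a few lines of Python. This is only the counting argument, under the assumption that "excess" simply means copies beyond the replication factor; `excess_copies` is a made-up helper, not anything Splunk exposes.

```python
def excess_copies(copy_count: int, replication_factor: int) -> int:
    """Copies beyond the replication factor; never negative."""
    return max(0, copy_count - replication_factor)


replication_factor = 3
history = []
# Start with 6 copies and let the peers freeze them one at a time.
for copy_count in range(6, -1, -1):
    history.append((copy_count, excess_copies(copy_count, replication_factor)))

for copy_count, excess in history:
    print(f"{copy_count} copies -> {excess} excess")
```

The excess count goes 3, 2, 1, 0 and then stays at 0 while the last three copies freeze.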

0 Karma

isoutamo
SplunkTrust

If your SF=RF=3 then you have only 3 buckets.

Based on https://community.splunk.com/t5/Deployment-Architecture/How-do-I-manually-identify-excess-buckets-in... my reading is that excess buckets are something additional, over and above what you have defined in SF/RF.

"Excess buckets are the result of corrective action taken by the cluster master upon peer node failure to ensure that your configured replication factor is being met in the cluster."

Unfortunately I don't have a suitable lab environment in which to check what this actually means. But for now my reading is that these are additional bucket copies which are not needed and which should be removed with the "splunk remove excess-buckets" command.

On https://docs.splunk.com/Documentation/Splunk/9.1.0/Indexer/Removeextrabucketcopies the docs say "In effect, a returning peer can cause the cluster to store more copies of some buckets than are needed to fulfill the replication factor and, possibly, the search factor as well. It can sometimes be useful to keep the extra copies around, as that topic explains, but you can save disk space by instead removing them." That also states quite explicitly that there can be more bucket copies than are needed to fulfil SF/RF. I haven't found any information on what happens to them when the "real" buckets get frozen. I still expect that they are "just sitting there" until they are manually cleaned up.

@gjanders maybe you have first-hand experience with this issue?

0 Karma

gjanders
SplunkTrust

I regularly have excess buckets, but I haven't tested whether they freeze as normal.

I would assume they do, as they are still tracked by the cluster manager. They are just extra buckets for the generation the manager has set; they are merely considered excess to the current replication and search factors.

Otherwise it's just another bucket.


I even have https://ideas.splunk.com/ideas/E-I-75 open as I believe they should automatically remove themselves over time.


isoutamo
SplunkTrust

Hi

I haven't looked into it deeply enough to be sure. But a bucket can be marked as an excess bucket when it is on all nodes (fulfilling SRF & SSF) and the primary has changed to another node. If that bucket has then been frozen (at least on some nodes), in theory there could be excess buckets which have already been frozen on some or all of the other nodes.

r. Ismo

0 Karma

Shashwat
Explorer

Thank you.
I hope you will find my follow-up post on this; please let me know your opinion.

0 Karma