Installation

Index cluster - what would cause the greatest reduction of disk space: RF or SF?

MLGSPLUNK
Path Finder

Hi all.

I recently got into a discussion with my colleagues about disk usage.

Assuming we ingest 100 GB/day on a cluster with RF 15 and SF 11, the total disk used per day would be:

Raw data: RF 15 × 15% of 100 GB (15 GB) = 225 GB

Tsidx: SF 11 × 35% of 100 GB (35 GB) = 385 GB

TOTAL: 610 GB
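For anyone who wants to replay the arithmetic with other values, here is a minimal Python sketch of the estimate above. The function name `daily_disk_gb` and its parameters are made up for illustration, and the 15% / 35% ratios are only the rule-of-thumb figures used in this post; real ratios depend on the data.

```python
def daily_disk_gb(ingest_gb, rf, sf, raw_pct=15, tsidx_pct=35):
    """Rough daily disk estimate for the whole cluster, in GB.

    raw_pct and tsidx_pct are the assumed sizes of compressed raw data
    and tsidx files as a percentage of ingested volume.
    """
    raw_gb = rf * ingest_gb * raw_pct / 100      # one raw copy per RF
    tsidx_gb = sf * ingest_gb * tsidx_pct / 100  # one tsidx set per SF
    return raw_gb, tsidx_gb, raw_gb + tsidx_gb


raw_gb, tsidx_gb, total_gb = daily_disk_gb(100, rf=15, sf=11)
print(f"raw={raw_gb:.0f} GB, tsidx={tsidx_gb:.0f} GB, total={total_gb:.0f} GB")
# prints: raw=225 GB, tsidx=385 GB, total=610 GB
```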

I want to reduce disk consumption as much as possible. Which would you reduce, RF or SF? Please explain your reasoning: my initial answer is to reduce SF, but my colleagues argue for reducing RF.

 

Thanks in advance. 


scelikok
SplunkTrust

Hi @MLGSPLUNK,

Your sample calculation shows that decreasing SF does more to reduce disk consumption. Replication Factor only adds copies of the compressed raw data. Search Factor adds more overhead, because each searchable copy also carries index (tsidx) files, which can be much larger than 35% of the raw data depending on the cardinality.

Also keep in mind that even if you lose an indexer that holds the tsidx files, another indexer can rebuild them from the raw data if the cluster needs them; it only takes some time.

If the cluster is not multisite with several sites, there may be no need for such a high SF.
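To put a number on the comparison, the sketch below computes how much disk each single-step reduction saves under the question's assumptions (15% raw, 35% tsidx). The helper name `savings_per_step` is hypothetical, and the ratios are the poster's estimates, not measured values.

```python
def savings_per_step(ingest_gb, raw_pct=15, tsidx_pct=35):
    """GB/day saved by lowering RF by 1 versus lowering SF by 1.

    Lowering RF removes one copy of the compressed raw data;
    lowering SF removes one set of tsidx files.
    """
    rf_step_gb = ingest_gb * raw_pct / 100
    sf_step_gb = ingest_gb * tsidx_pct / 100
    return rf_step_gb, sf_step_gb


rf_step_gb, sf_step_gb = savings_per_step(100)
print(f"RF-1 saves {rf_step_gb:.0f} GB/day, SF-1 saves {sf_step_gb:.0f} GB/day")
# prints: RF-1 saves 15 GB/day, SF-1 saves 35 GB/day
```

With these ratios each SF step saves more than twice what an RF step does. Note that SF cannot be higher than RF, so RF can only be lowered as far as the current SF allows.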

If this reply helps you, an upvote and "Accept as Solution" is appreciated.

MLGSPLUNK
Path Finder

Hi @scelikok 

 

That was my guess too. According to the docs and training, the formula shows that decreasing SF lowers disk consumption the most.

 

The RF and SF values were set that high only to show how the numbers drop when SF is decreased. So according to my formula, the docs and reasoning, decreasing SF gives the largest reduction in disk usage.

 

Thanks.


isoutamo
SplunkTrust
Hi
You probably have some reason for using such high values? Usually they are something like SF 2 and RF 3. That should be enough when your servers, storage and network are reliable.
R. Ismo

MLGSPLUNK
Path Finder

@isoutamo not a real config, I just wanted to show the big difference between reducing one factor and the other.

Thanks for the heads-up!
