Installation

Index cluster - what would cause the greatest reduction of disk space: RF or SF?

MLGSPLUNK
Path Finder

Hi all.

I recently got into a discussion with my colleagues about disk usage.

Assuming we ingest 100 GB/day on a cluster with RF 15 and SF 11, the total amount of disk used per day would be:

Raw data: RF 15 * 15% of 100 GB (15 GB) = 225 GB

Tsidx: SF 11 * 35% of 100 GB (35 GB) = 385 GB

TOTAL: 610 GB
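
For anyone who wants to try other values, the arithmetic above can be written as a small Python sketch. The function name `daily_disk_gb` and its parameters are made up for illustration, and the 15% and 35% ratios are only the rough estimates used in this post, not measured values; real ratios depend on the data.

```python
def daily_disk_gb(ingest_gb, rf, sf, raw_pct=15, tsidx_pct=35):
    """Estimate total cluster disk used per day of ingest, in GB.

    raw_pct and tsidx_pct are the rough ratios assumed in the post:
    compressed raw data ~15% and tsidx files ~35% of the ingested volume.
    """
    # Every replicated copy (RF) stores the compressed raw data.
    raw_gb = rf * ingest_gb * raw_pct / 100
    # Only searchable copies (SF) also store the tsidx files.
    tsidx_gb = sf * ingest_gb * tsidx_pct / 100
    return raw_gb, tsidx_gb, raw_gb + tsidx_gb


raw_gb, tsidx_gb, total_gb = daily_disk_gb(100, rf=15, sf=11)
print(f"raw={raw_gb} GB, tsidx={tsidx_gb} GB, total={total_gb} GB")
```

With the numbers from the post this gives 225 GB of raw data, 385 GB of tsidx files and 610 GB in total.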

I want to reduce disk consumption as much as possible. Which would you reduce, RF or SF? Please provide some explanation: my initial answer is to reduce SF, but my colleagues argue for reducing RF.

 

Thanks in advance. 

1 Solution

scelikok
SplunkTrust

Hi @MLGSPLUNK,

Your sample calculation shows that decreasing SF reduces disk consumption more. The replication factor only adds copies of the compressed raw data. The search factor adds more overhead, because each searchable copy also carries index (tsidx) files, and these can be much larger than 35% of the raw data, depending on the cardinality of the data.

Also keep in mind that even if you lose an indexer that holds the tsidx files, another indexer can rebuild them from the raw data if the cluster needs them; it only takes some time.

If the cluster is not multisite with several sites, there may be no need for such a high SF.
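
The comparison can also be seen per copy. Below is a minimal Python sketch, assuming the same rough ratios as in the question (15% for compressed raw data, 35% for tsidx files); the helper name `saving_per_copy_gb` is hypothetical and only illustrates the reasoning.

```python
def saving_per_copy_gb(ingest_gb, raw_pct=15, tsidx_pct=35):
    """Daily disk saved by lowering RF or SF by one, in GB.

    The percentages are the assumed ratios from the question, not measurements.
    Note that SF can never be higher than RF in an indexer cluster.
    """
    # Lowering RF by one (SF unchanged) drops one raw-data-only copy.
    rf_saving = ingest_gb * raw_pct / 100
    # Lowering SF by one (RF unchanged) drops one set of tsidx files.
    sf_saving = ingest_gb * tsidx_pct / 100
    return rf_saving, sf_saving


rf_saving, sf_saving = saving_per_copy_gb(100)
print(f"RF-1 saves {rf_saving} GB/day, SF-1 saves {sf_saving} GB/day")
```

Under these assumptions each step down in SF saves 35 GB/day while each step down in RF saves 15 GB/day, so SF wins whenever the tsidx ratio is larger than the raw data ratio.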

If this reply helps you, an upvote and "Accept as Solution" are appreciated.


MLGSPLUNK
Path Finder

Hi @scelikok 

 

That was my guess too. According to the docs and training, the formula shows that decreasing SF lowers disk consumption the most.

 

The RF and SF values were set that high only to show how the numbers change when SF is decreased. So according to my formula, the docs, and my reasoning, decreasing SF gives the largest reduction in disk usage.

 

Thanks.


isoutamo
SplunkTrust
Hi
Do you have some reason to use such high values? Usually they are something like SF 2 and RF 3. That should be enough when your servers, storage, and network are reliable.
R. Ismo

MLGSPLUNK
Path Finder

@isoutamo not a real config; I just wanted to show the big difference between reducing one factor and reducing the other.

Thanks for the heads-up!
