Indexer cluster - what would cause the greatest reduction of disk space: RF or SF?

MLGSPLUNK
Path Finder

Hi all.

I recently got into a discussion with my colleagues about disk usage.

Assuming we ingest 100 GB/day on a cluster with RF 15 and SF 11, the total disk usage per day would be:

Raw data: RF 15 × 15% of 100 GB (15 GB) = 225 GB

Tsidx: SF 11 × 35% of 100 GB (35 GB) = 385 GB

TOTAL: 610 GB
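The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, assuming the rule-of-thumb ratios used in the question (compressed raw data ≈ 15% of ingest, tsidx files ≈ 35% of ingest); real ratios vary with the data. The function name `daily_cluster_disk_gb` is hypothetical, and the SF ≤ RF check mirrors Splunk's rule that the search factor cannot exceed the replication factor.

```python
def daily_cluster_disk_gb(ingest_gb, rf, sf, rawdata_ratio=0.15, tsidx_ratio=0.35):
    """Estimate total disk written per day across an indexer cluster.

    Assumption (rule of thumb from the question): each of the RF copies
    stores the compressed raw data, and only the SF searchable copies
    also store tsidx files.
    """
    # Splunk requires 1 <= search factor <= replication factor.
    if not 1 <= sf <= rf:
        raise ValueError("search factor must be between 1 and the replication factor")
    rawdata_gb = rf * rawdata_ratio * ingest_gb  # RF copies of raw data
    tsidx_gb = sf * tsidx_ratio * ingest_gb      # SF copies of tsidx files
    return rawdata_gb, tsidx_gb, rawdata_gb + tsidx_gb


raw, tsidx, total = daily_cluster_disk_gb(100, rf=15, sf=11)
print(f"raw data: {raw:.0f} GB, tsidx: {tsidx:.0f} GB, total: {total:.0f} GB")
# prints "raw data: 225 GB, tsidx: 385 GB, total: 610 GB"
```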

If I want to reduce disk consumption as much as possible, which would you reduce, RF or SF? Please explain your reasoning: my initial answer is to reduce SF, but my colleagues argue for reducing RF.

Thanks in advance. 


scelikok
Champion

Hi @MLGSPLUNK,

Your sample calculation shows that decreasing SF will help more in reducing disk consumption. The replication factor only determines how many copies of the compressed raw data are kept. The search factor determines how many of those copies also get index (tsidx) files, which adds more overhead: depending on cardinality, they can take much more than 35% of the raw data size.
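To make that comparison concrete, the sketch below lowers each factor by one step from RF 15 / SF 11 and reports the daily saving, under the same assumed 15% / 35% ratios from the question. It keeps 1 ≤ SF ≤ RF, so lowering RF only forces SF down once the two are equal. The helper names are hypothetical.

```python
RAWDATA_RATIO = 0.15  # assumed: compressed raw data ≈ 15% of ingest
TSIDX_RATIO = 0.35    # assumed: tsidx files ≈ 35% of ingest (varies with cardinality)


def daily_disk_gb(ingest_gb, rf, sf):
    """Total daily disk: RF copies of raw data plus SF copies of tsidx files."""
    return ingest_gb * (rf * RAWDATA_RATIO + sf * TSIDX_RATIO)


def saving_from_reduction(ingest_gb, rf, sf, factor):
    """Daily GB saved by lowering RF or SF by one, keeping 1 <= SF <= RF."""
    if factor == "RF":
        new_rf, new_sf = rf - 1, min(sf, rf - 1)  # SF cannot exceed RF
    elif factor == "SF":
        new_rf, new_sf = rf, sf - 1
    else:
        raise ValueError("factor must be 'RF' or 'SF'")
    if new_sf < 1:
        raise ValueError("search factor cannot drop below 1")
    return daily_disk_gb(ingest_gb, rf, sf) - daily_disk_gb(ingest_gb, new_rf, new_sf)


for factor in ("RF", "SF"):
    print(factor, round(saving_from_reduction(100, rf=15, sf=11, factor=factor), 1))
# prints "RF 15.0" then "SF 35.0"
```

Under these assumptions each RF step saves 15 GB/day and each SF step saves 35 GB/day. The exception is when RF equals SF: lowering RF then also forces SF down, so that single step saves 50 GB/day.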

Also keep in mind that even if you lose an indexer that holds tsidx files, another indexer can rebuild them from the raw data if the cluster needs them; it just takes some time.

If the cluster is not multisite (spread across several sites), there may be no need for a high SF.

If this reply helps you, an upvote is appreciated.

MLGSPLUNK
Path Finder

Hi @scelikok 

That was my guess too, and according to the docs and training, the formula shows that decreasing SF reduces disk consumption the most.

The very high RF and SF values were only used to illustrate how much the numbers drop when SF is decreased. So according to my formula, the docs and my reasoning, decreasing SF gives the largest reduction in disk usage.

Thanks.

soutamo
SplunkTrust
Hi,
Is there a particular reason you are using such high values? Typical values are around SF 2 and RF 3, which should be enough when your servers, storage and network are reliable.
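For comparison, here is a minimal sketch that applies the same assumed ratios from the question (15% raw data, 35% tsidx) and 100 GB/day of ingest to both the hypothetical RF 15 / SF 11 config and the typical RF 3 / SF 2 values. The helper name is hypothetical.

```python
def daily_disk_gb(ingest_gb, rf, sf, rawdata_ratio=0.15, tsidx_ratio=0.35):
    """Daily disk across the cluster: RF raw-data copies plus SF tsidx copies.

    The 15% / 35% ratios are the rule-of-thumb assumptions from the question.
    """
    return ingest_gb * (rf * rawdata_ratio + sf * tsidx_ratio)


for rf, sf in ((15, 11), (3, 2)):
    print(f"RF {rf}, SF {sf}: {daily_disk_gb(100, rf, sf):.0f} GB/day")
# prints "RF 15, SF 11: 610 GB/day" then "RF 3, SF 2: 115 GB/day"
```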
R. Ismo

MLGSPLUNK
Path Finder

@soutamo It's not a real config; I just wanted to show how big the difference is when you reduce one factor or the other.

Thanks for the heads-up!
