Installation

Index cluster - what would cause the greatest reduction of disk space: RF or SF?

MLGSPLUNK
Path Finder

Hi all.

I recently got into a discussion with my colleagues about disk usage.

Assuming we ingest 100 GB/day on a cluster with RF 15 and SF 11, the total disk consumed per day would be:

Raw data: RF 15 * 15% of 100 GB (15 GB) = 225 GB

Tsidx: SF 11 * 35% of 100 GB (35 GB) = 385 GB

TOTAL: 610 GB
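The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the poster's estimate: the function name `daily_disk_gb` and its parameters are made up for this example, and the 15% and 35% ratios are the rule-of-thumb figures from the question, not measured values.

```python
def daily_disk_gb(ingest_gb, rf, sf, raw_ratio=0.15, tsidx_ratio=0.35):
    """Estimate cluster-wide disk consumed per day, in GB.

    raw_ratio and tsidx_ratio are the assumed on-disk sizes of
    compressed raw data and tsidx files relative to ingested volume.
    """
    raw = rf * raw_ratio * ingest_gb      # every replicated copy stores raw data
    tsidx = sf * tsidx_ratio * ingest_gb  # only searchable copies store tsidx files
    return raw, tsidx, raw + tsidx


raw, tsidx, total = daily_disk_gb(100, rf=15, sf=11)
print(f"raw={raw:.0f} GB, tsidx={tsidx:.0f} GB, total={total:.0f} GB")
# prints "raw=225 GB, tsidx=385 GB, total=610 GB"
```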

I want to reduce disk consumption as much as possible. Which would you reduce, RF or SF? Please provide some explanation: my initial answer is to reduce SF, but my colleagues argue for reducing RF.

 

Thanks in advance. 

0 Karma
1 Solution

scelikok
SplunkTrust

Hi @MLGSPLUNK,

Your sample calculation shows that decreasing SF will do more to reduce disk consumption. The replication factor only replicates compressed raw data. The search factor adds more overhead by creating index (tsidx) files, which can be much larger than 35% of the raw data depending on the cardinality.
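To make the comparison concrete, here is a rough Python sketch of how much disk one step of RF versus one step of SF saves per day, using the 15% and 35% ratios from the question. The helper `total_gb` and the constants are hypothetical names for this example; the check that SF cannot exceed RF reflects the cluster's own constraint.

```python
RAW_RATIO = 0.15    # assumed: compressed raw data as a share of ingested volume
TSIDX_RATIO = 0.35  # assumed: tsidx files as a share of ingested volume


def total_gb(ingest_gb, rf, sf):
    """Estimated daily disk usage in GB for a given RF and SF."""
    if sf > rf:
        raise ValueError("search factor cannot exceed replication factor")
    return ingest_gb * (rf * RAW_RATIO + sf * TSIDX_RATIO)


base = total_gb(100, rf=15, sf=11)
save_rf = base - total_gb(100, rf=14, sf=11)  # drop one raw-only copy
save_sf = base - total_gb(100, rf=15, sf=10)  # drop one set of tsidx files

print(f"RF-1 saves {save_rf:.0f} GB/day, SF-1 saves {save_sf:.0f} GB/day")
# prints "RF-1 saves 15 GB/day, SF-1 saves 35 GB/day"
```

With these ratios each step of SF saves more than twice what a step of RF does, and the gap widens if high-cardinality data pushes the tsidx share above 35%.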

Also keep in mind that even if you lose an indexer that holds tsidx files, another indexer can rebuild them from the raw data if the cluster needs them; it only takes some time.

If the cluster is not multisite with several sites, there may be no need for such a high SF.

If this reply helps you, an upvote is appreciated.

MLGSPLUNK
Path Finder

Hi @scelikok 

 

That was my guess as well. According to the docs and training, the formula shows that decreasing SF lowers disk consumption the most.

 

The RF and SF values are that high only to demonstrate how the numbers drop when we decrease SF. So according to my formula, the docs, and my reasoning, decreasing SF gives the largest reduction in disk usage.

 

Thanks.

0 Karma

isoutamo
SplunkTrust
Hi
You probably have some reason for using such high values? Usually they are something like SF 2 and RF 3. That should be enough when your servers, storage and network are reliable.
R. Ismo

MLGSPLUNK
Path Finder

@isoutamo not a real config, just wanted to show the big difference if we reduce one factor or the other. 

Thanks for the heads-up!

0 Karma