Hi all.
I recently got into a discussion with my colleagues about disk usage.
Assuming we ingest 100 GB/day on a cluster with RF 15 and SF 11, the total disk used per day would be:
Raw data: RF 15 * 15% of 100 GB (15 GB) = 225 GB
Tsidx: SF 11 * 35% of 100 GB (35 GB) = 385 GB
TOTAL: 610 GB
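The arithmetic above can be written as a small Python sketch. It assumes the same rule of thumb used in this post (compressed raw data at about 15% of ingest, tsidx files at about 35%); the function name `daily_disk_gb` and its parameters are made up for illustration and are not part of any Splunk tooling.

```python
def daily_disk_gb(ingest_gb, rf, sf, raw_pct=15, tsidx_pct=35):
    """Estimate daily cluster disk usage in GB.

    raw_pct and tsidx_pct are the assumed rule-of-thumb percentages
    from the post; real ratios depend on the data.
    """
    raw = rf * ingest_gb * raw_pct / 100      # every replicated copy keeps compressed raw data
    tsidx = sf * ingest_gb * tsidx_pct / 100  # only searchable copies keep tsidx files
    return raw, tsidx, raw + tsidx


raw, tsidx, total = daily_disk_gb(100, rf=15, sf=11)
print(f"raw={raw} GB, tsidx={tsidx} GB, total={total} GB")
```

With the values from the post this reproduces 225 GB of raw data, 385 GB of tsidx and 610 GB in total.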
If I want to reduce disk consumption as much as possible, which would you reduce, RF or SF? Please provide some explanation, as my initial answer is to reduce SF, but my colleagues argue for reducing RF.
Thanks in advance.
Hi @MLGSPLUNK,
Your sample calculation shows that decreasing SF helps more with reducing disk consumption. Replication Factor only replicates the compressed raw data. Search Factor adds more overhead by creating index (tsidx) files, which can be much larger than 35% of the raw data depending on the cardinality.
Also keep in mind that even if you lose an indexer that holds tsidx files, another indexer can rebuild them from the raw data if the cluster needs them; it only takes some time.
If the cluster is not multisite with several sites, there may be no need for such a high SF.
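To make the comparison concrete, here is a minimal Python sketch of how much disk one fewer copy of each kind saves per day. It assumes the question's rule of thumb (raw data about 15% of ingest, tsidx about 35%); the helper name `savings_per_copy_gb` is hypothetical, and actual ratios vary with the data.

```python
def savings_per_copy_gb(ingest_gb, raw_pct=15, tsidx_pct=35):
    """Return (saving from RF - 1, saving from SF - 1) in GB per day.

    Assumes a non-searchable copy stores only compressed raw data and a
    searchable copy additionally stores tsidx files.
    """
    rf_saving = ingest_gb * raw_pct / 100    # one fewer raw-data copy
    sf_saving = ingest_gb * tsidx_pct / 100  # one fewer set of tsidx files
    return rf_saving, sf_saving


rf_saving, sf_saving = savings_per_copy_gb(100)
print(f"RF - 1 saves {rf_saving} GB/day, SF - 1 saves {sf_saving} GB/day")
```

Under these assumptions each step down in SF frees more than twice as much disk as a step down in RF. Note that SF cannot exceed RF, so lowering RF below the current SF forces SF down as well.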
Hi @scelikok
That is my guess as well, and according to the docs and training, the formula shows that decreasing SF lowers disk consumption the most.
The RF and SF values were set that high only to show how the numbers drop when we decrease SF. So according to my formula, the docs and my reasoning, decreasing SF gives the largest reduction in disk usage.
Thanks.
@isoutamo not a real config, just wanted to show the big difference if we reduce one factor or the other.
Thanks for the heads-up!