Monitoring Splunk

Trying to understand compression - given % compression of X volume data, how much on disk is required?

the_wolverine
Champion

I'm trying to understand the compression numbers provided by Splunk. Given a compression of, say, 40% on a volume of 100 GB, what does that translate to on disk for storage purposes?

Is it 60 GB (100 GB x (100% - 40%)), or
is it 40 GB (40% x 100 GB)?
Or, something else?

0 Karma
1 Solution

_d_
Splunk Employee

It's not clear which numbers you're referring to, but compression ratio should always be calculated as:

Compression Ratio = (Uncompressed Size)/(Compressed Size)

Also, compression ratio expressed in percent does not make much sense. Storage savings on the other hand are a different story.

Example:

Uncompressed = 100GB, Compressed = 40GB

compression ratio = 100/40 = 2.5 OR alternatively noted as 2.5:1

savings % = 100 * (1 - 40GB/100GB) = 60%
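
The two formulas above can be checked with a short Python sketch. The function names here are illustrative only and are not part of any Splunk tooling; sizes are assumed to be in the same unit (GB in this example).

```python
def compression_ratio(uncompressed: float, compressed: float) -> float:
    """Ratio of original size to compressed size (e.g. 2.5 means 2.5:1)."""
    return uncompressed / compressed


def savings_percent(uncompressed: float, compressed: float) -> float:
    """Percentage of the original size that compression saved."""
    return 100 * (1 - compressed / uncompressed)


# Values from the example: 100 GB uncompressed, 40 GB compressed.
ratio = compression_ratio(100, 40)
savings = savings_percent(100, 40)
print(f"compression ratio = {ratio}:1")
print(f"storage savings   = {savings:.0f}%")
```

Note that the ratio and the savings percentage describe the same 100 GB to 40 GB reduction in two different ways, which is where the 40% versus 60% confusion comes from.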


hakeniz
Loves-to-Learn Lots

The Splunk documentation link has changed. Use the link below for reference:

https://docs.splunk.com/Documentation/Splunk/latest/Capacity/Estimateyourstoragerequirements

0 Karma

the_wolverine
Champion

Also, is this a reliable number to use for storage calculation? What I mean is, does this diskTotalinMB include all associated files that require space for that index? Is this the "du" for the entire index (hot and cold) and all files within?

0 Karma

the_wolverine
Champion

That's why this is confusing -- the wording is wrong.

0 Karma

_d_
Splunk Employee

...char limit...A more exact statement would be your former one: "100GB of raw data indexed takes up 40% of its original volume"

0 Karma

_d_
Splunk Employee

No. You're confusing the compressed size of something, expressed as a percentage of the original, with the compression ratio of a certain mechanism. Mathematically and technically speaking, the compression ratio is never expressed or noted in percent. Instead, notations such as x/y, x:y or the like are used.
"100GB indexed at 40% compression rate = 40GB on disk." This is a wrongly worded statement about percentages and rates. That's like saying: a $100 pair of jeans sold at 25% discount will cost $25.

sowings
Splunk Employee

Compression expressed as percent absolutely makes sense! "The data takes up X% of its original volume." 100GB indexed at 40% compression rate = 40GB on disk.

0 Karma

the_wolverine
Champion

diskTotalinMB = rawTotalinMB * (compression / 100)
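
If the reported compression figure is read as "the data occupies this percentage of its raw size" (the reading under which 100 GB at 40% ends up as 40 GB on disk), the relationship can be sketched in Python as follows. The function and parameter names are hypothetical, chosen to mirror the field names mentioned in this thread.

```python
def disk_total_mb(raw_total_mb: float, compression_pct: float) -> float:
    """Estimated on-disk size, assuming compression_pct is the percentage
    of the raw size that remains after indexing (an assumption, see above)."""
    return raw_total_mb * (compression_pct / 100)


raw_mb = 100 * 1024                  # 100 GB of raw data, expressed in MB
disk_mb = disk_total_mb(raw_mb, 40)  # 40% of the original volume
print(f"{disk_mb / 1024:.1f} GB on disk")
```

Under the other reading, where 40% is the amount saved, the same input would instead leave 60 GB on disk, so it is worth confirming which meaning a given report uses before sizing storage.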

0 Karma

the_wolverine
Champion

I checked the stats and the numbers work out to your answer.

0 Karma