Actual disk size

sheltomt

A few months back I was building a dashboard and looking at various disk usage charts, one of them being Overall Disk Usage.

While researching, I came across several posts that mentioned a rule of thumb: divide the raw data size by two to estimate disk usage. Using this rule, we were able to get the correct numbers.

We are now discussing this on a conference call, and the divide-by-two rule came up. For the life of me, I cannot find the right phrase to google to learn where this rule came from and why it works.

Does anyone have any insight?

skoelpin
SplunkTrust

When Splunk indexes your data, how well it compresses depends on how many unique key-value pairs you have: the more unique key-value pairs, the larger the tsidx files. A general rule is that the tsidx files come to around 35% of the original raw data size, while the journal.gz (your compressed raw data) takes roughly 15%. Add these up (35% + 15%) and you get your 50%.
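A minimal Python sketch of the arithmetic in this answer, assuming the rule-of-thumb ratios quoted above (about 15% of the raw size for the compressed journal.gz and about 35% for the tsidx files). The function `estimate_index_size_gb` and its parameters are hypothetical, not part of any Splunk API, and real ratios vary with your data.

```python
# Hypothetical helper: estimate on-disk size of indexed data from raw input size.
# Default ratios are the rules of thumb quoted in this thread, not fixed values.

def estimate_index_size_gb(raw_gb: float,
                           rawdata_ratio: float = 0.15,
                           tsidx_ratio: float = 0.35) -> dict[str, float]:
    """Split the estimated disk footprint into rawdata and tsidx parts."""
    if raw_gb < 0:
        raise ValueError("raw_gb must be non-negative")
    journal_gb = raw_gb * rawdata_ratio  # compressed raw events (journal.gz)
    tsidx_gb = raw_gb * tsidx_ratio      # index files; grow with more unique terms
    return {
        "journal_gb": journal_gb,
        "tsidx_gb": tsidx_gb,
        "total_gb": journal_gb + tsidx_gb,
    }


if __name__ == "__main__":
    # 100 GB of raw input with the 15% + 35% split from this answer.
    print(estimate_index_size_gb(100))
    # Same input with the 10% + 40% split quoted in one of the linked answers.
    print(estimate_index_size_gb(100, rawdata_ratio=0.10, tsidx_ratio=0.40))
```

Both splits land at roughly half the raw size, which is where the divide-by-two rule comes from; for anything more precise, plug in ratios measured on your own indexes.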

DalJeanis
SplunkTrust

Reducing size by 50% is a good first approximation, but the actual ratio is going to be completely data dependent. Here are some useful references:

1) This answer says "I typically see about 40% to 50% compression"... https://answers.splunk.com/answers/52075/compression-rate-for-indexes-hot-warm-cold-frozen.html

2) That's probably the underlying assumption behind the 1/2 in this answer...
https://answers.splunk.com/answers/173541/is-there-a-way-to-determine-how-much-disk-space-my.html

3) One of the answers in this one has some useful breakdown information. "It´s usually about half of the original size, so for your question 100GB would need about 50gb, from those around 10gb would be the original logs zipped, and 40gb the indexes."
https://answers.splunk.com/answers/106904/trying-to-understand-compression-given-compression-of-x-vo...

4) However, other answers indicate that in specific situations, storing the index plus the data takes more disk space than the original data did (for example, when fully indexing CSV input).
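Because the ratio depends so heavily on the data, it can help to compute your own ratio from sizes you have actually measured. This is a minimal Python sketch; the functions `compression_ratio` and `describe` are hypothetical, and the byte counts in the usage lines are placeholder values, not real measurements.

```python
# Hypothetical helpers: compare measured on-disk size with the raw input size.

def compression_ratio(raw_bytes: int, on_disk_bytes: int) -> float:
    """Return on-disk size as a fraction of the raw input size."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return on_disk_bytes / raw_bytes


def describe(ratio: float) -> str:
    """Summarize a ratio; values above 1.0 mean index+data outgrew the input."""
    if ratio > 1.0:
        return f"{ratio:.0%} of raw: index+data is larger than the original input"
    return f"{ratio:.0%} of raw: saves {1 - ratio:.0%} of the original size"


if __name__ == "__main__":
    # Placeholder sizes for illustration only.
    print(describe(compression_ratio(100_000_000, 48_000_000)))
    print(describe(compression_ratio(100_000_000, 130_000_000)))
```

A ratio near 0.5 matches the divide-by-two rule; a ratio above 1.0 is the situation described in point 4.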
