Deployment Architecture

Splunk Storage Sizing Guidelines and calculations

Ajinkya1992
Path Finder

Hi Team,
I have a doubt about the Splunk Storage Sizing app:
https://splunk-sizing.appspot.com/#ar=0&c=1&cf=0.15&cr=180&hwr=7&i=5&rf=1&sf=1&st=v&v=100

I am keeping it very simple. Let's suppose we need to ingest 100 GB/day.
Data retention: 6 months (180 days)
Number of indexers in the cluster: 5
Search Factor: 1
Replication Factor: 1

As per the Splunk Storage Sizing app:
Raw Compression Factor - Typically the compressed raw data file is 15% of the incoming pre-indexed data. The number of unique terms affects this value.
Metadata Size Factor - Typically metadata is 35% of raw data. The type of data and the index files affect this value.

So as per the above, my calculation is: 15% of 100 GB = 15 GB,
and 35% of 15 GB = 5.25 GB,
which is 20.25 GB/day for 5 servers, i.e. 4.05 GB/day per server.

So, considering a retention period of 180 days, that is 4.05 × 180 = 729 GB/server for six months, and 3645 GB (~3.6 TB) for 5 servers.
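
Here is my working as a quick Python sketch, just to make my numbers explicit (the 15%/35% factors are the typical values from the app, and "metadata" is simply my label for the second factor):

daily_ingest_gb = 100
raw_compressed = 0.15 * daily_ingest_gb   # 15 GB compressed rawdata
metadata = 0.35 * raw_compressed          # 5.25 GB, i.e. 35% of the compressed rawdata
daily_total = raw_compressed + metadata   # 20.25 GB/day across the cluster
per_server_day = daily_total / 5          # 4.05 GB/day per indexer
print(per_server_day * 180)               # 729 GB per indexer for 6 months
print(daily_total * 180)                  # 3645 GB (~3.6 TB) for 5 indexers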

But as per the Splunk Storage Sizing app,
you need to have 1.8 TB/server and 9.1 TB for 5 servers.

My calculation and the Splunk Storage Sizing calculation don't match at all.
The Splunk Storage Sizing calculation takes a flat 50% of the pre-indexed data, whereas per their guidelines metadata is 35% of the raw data, not of the actual incoming data.

Please let me know what I am missing.


richgalloway
SplunkTrust

The 15% and 35% calculations should both be made on the same raw daily ingestion value. An easier method is to take 50% of the daily ingestion value as the daily storage requirement.
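
For example, with your numbers (RF=SF=1, and treating the 15%/35% factors as typical values rather than exact ones), a rough sketch in Python:

daily_ingest_gb = 100
indexers = 5
retention_days = 180
rawdata = 0.15 * daily_ingest_gb      # compressed rawdata: ~15% of the daily ingest
index_files = 0.35 * daily_ingest_gb  # index/metadata files: ~35% of the SAME daily ingest
daily_total = rawdata + index_files   # 50 GB/day, i.e. the 50% rule of thumb
cluster_total_gb = daily_total * retention_days  # 9000 GB (~9 TB) for the cluster
per_server_gb = cluster_total_gb / indexers      # 1800 GB (1.8 TB) per indexer
print(cluster_total_gb, per_server_gb)

That lines up with the sizing app's 1.8 TB per server and is in the same ballpark as its ~9.1 TB cluster total.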

---
If this reply helps you, Karma would be appreciated.

Ajinkya1992
Path Finder

Thank you so much, Rich, for your reply.
But then again, I just came across this documentation, which says: "Typically, the compressed rawdata file is 10% the size of the incoming, pre-indexed raw data. The associated index files range in size from approximately 10% to 110% of the rawdata file. The number of unique terms in the data affect this value."

https://docs.splunk.com/Documentation/Splunk/7.2.6/Capacity/Estimateyourstoragerequirements

Can you please explain what exactly the documentation means here?
Because as per the documentation, I guess the calculation would be:
Daily storage = compressed rawdata + index files
= 10 GB (10% of the 100 GB incoming data) + 1 GB to 11 GB (10% to 110% of the 10 GB rawdata)
= 11 GB to 21 GB/day
≈ 25 GB/day for 5 servers (taking the higher value and rounding up)
= 5 GB/day per server
5 × 180 = 900 GB per server, and 25 × 180 = 4500 GB (4.5 TB) for 5 servers
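
Written out as a quick Python sketch (again, just my reading of the docs, taking 10% for the rawdata and 10% to 110% for the index files as typical ranges):

daily_ingest_gb = 100
rawdata = 0.10 * daily_ingest_gb      # ~10 GB compressed rawdata
index_low = 0.10 * rawdata            # 1 GB of index files at the low end
index_high = 1.10 * rawdata           # 11 GB of index files at the high end
daily_low = rawdata + index_low       # 11 GB/day
daily_high = rawdata + index_high     # 21 GB/day, which I round up to ~25 GB/day
print((25 / 5) * 180)                 # 900 GB per server over 180 days
print(25 * 180)                       # 4500 GB (4.5 TB) for 5 servers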

I might be wrong, but I just could not match this with the documentation.

Also, as per the Splunk Storage Sizing app, the size of the index files (which I believe contain only pointers to the indexed data) is larger than the size of the actual indexed data (rawdata).
Doesn't that sound unusual? I would expect the indexed data to be bigger than the index files.
