
Splunk Storage Sizing Guidelines and calculations

Path Finder

Hi Team,
I have a doubt about the Splunk Storage Sizing app:
https://splunk-sizing.appspot.com/#ar=0&c=1&cf=0.15&cr=180&hwr=7&i=5&rf=1&sf=1&st=v&v=100

I am keeping it very simple: let's suppose we need to ingest 100 GB/day.
Data retention: 6 months (180 days)
Number of indexers in cluster: 5
Search Factor: 1
Replication Factor: 1

As per the Splunk Storage Sizing app:
Raw Compression Factor - Typically the compressed raw data file is 15% of the incoming, pre-indexed data. The number of unique terms affects this value.
Metadata Size Factor - Typically metadata is 35% of raw data. The type of data and the index files will affect this value.

So as per the above, 15% of 100 GB = 15 GB,
and 35% of 15 GB = 5.25 GB,
which is 20.25 GB/day for 5 servers, i.e. 4.05 GB/day per server.

So, considering a retention period of 180 days: 4.05 × 180 = 729 GB/server for six months, and 3645 GB (~3.6 TB) for all 5 servers.
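To make the arithmetic reproducible, here is a minimal Python sketch of my calculation above (the 15% and 35% factors come from the app's descriptions; applying the 35% to the compressed file rather than to the raw ingestion is my own interpretation):

```python
# My reading of the sizing app's two factors (interpretation, not official).
daily_ingest_gb = 100
raw_factor = 0.15        # compressed rawdata = 15% of pre-indexed data
meta_factor = 0.35       # metadata = 35% of "raw data" -- I am assuming
                         # this means the compressed rawdata file

raw_gb = daily_ingest_gb * raw_factor          # 15 GB/day
meta_gb = raw_gb * meta_factor                 # 5.25 GB/day
daily_total_gb = raw_gb + meta_gb              # 20.25 GB/day for the cluster

indexers, retention_days = 5, 180
per_server_gb = daily_total_gb / indexers      # 4.05 GB/day per indexer
print(per_server_gb * retention_days)          # 729 GB/server over 6 months
print(daily_total_gb * retention_days / 1000)  # ~3.6 TB for all 5 servers
```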

But as per the Splunk Storage Sizing app, you need 1.8 TB/server and 9.1 TB for 5 servers.

My calculation and the Splunk Storage Sizing calculation don't match at all.
The Splunk Storage Sizing calculation works out to 50% of the pre-indexed data, whereas per their own guidelines metadata is 35% of raw data, not of the actual incoming data.

Please let me know what I am missing.


Re: Splunk Storage Sizing Guidelines and calculations

SplunkTrust

The 15% and 35% calculations should both be made against the same raw daily ingestion value. An easier method is to take 50% of the daily ingestion value as the daily storage requirement.
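As a sketch, with the numbers from the original question (the 50% shortcut is just 15% + 35% applied to the same daily ingestion value):

```python
# Both factors apply to the same raw daily ingestion value.
daily_ingest_gb = 100
raw_gb = daily_ingest_gb * 0.15    # 15 GB/day compressed rawdata
meta_gb = daily_ingest_gb * 0.35   # 35 GB/day metadata/index files
daily_total_gb = raw_gb + meta_gb  # 50 GB/day, i.e. 50% of ingestion

indexers, retention_days = 5, 180
total_tb = daily_total_gb * retention_days / 1000  # 9 TB for the cluster
print(total_tb / indexers)                         # 1.8 TB per indexer
```

That reproduces the app's roughly 1.8 TB/server figure.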

---
If this reply helps you, an upvote would be appreciated.

Re: Splunk Storage Sizing Guidelines and calculations

Path Finder

Thank you so much, Rich, for your reply.
But then I just came across this document, which says: "Typically, the compressed rawdata file is 10% the size of the incoming, pre-indexed raw data. The associated index files range in size from approximately 10% to 110% of the rawdata file. The number of unique terms in the data affect this value."

https://docs.splunk.com/Documentation/Splunk/7.2.6/Capacity/Estimateyourstoragerequirements

Can you please explain what exactly the documentation means?
Because as per the documentation, I guess the calculation would be:
Stored data/day = compressed rawdata + index files
= 10 GB (10% of the 100 GB incoming data) + 1 GB to 11 GB (10% to 110% of the rawdata)
= 11 GB to 21 GB/day
≈ 25 GB/day for 5 servers (taking the higher value, rounded up)
= 5 GB/day per server
5 × 180 = 900 GB per server, and 25 × 180 = 4500 GB (4.5 TB) for 5 servers
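Here is the same estimate as a small sketch (the round-up from 21 GB to 25 GB/day is my own):

```python
# Documentation-based estimate: rawdata ~10% of incoming data,
# index files 10% to 110% of the rawdata file.
daily_ingest_gb = 100
raw_gb = daily_ingest_gb * 0.10                      # 10 GB/day
idx_low, idx_high = raw_gb * 0.10, raw_gb * 1.10     # 1 GB to 11 GB

daily_low, daily_high = raw_gb + idx_low, raw_gb + idx_high  # 11 to 21 GB/day
daily_rounded_gb = 25                                # my round-up of the high end

indexers, retention_days = 5, 180
print(daily_rounded_gb / indexers * retention_days)  # 900 GB per server
print(daily_rounded_gb * retention_days / 1000)      # 4.5 TB for 5 servers
```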

I might be wrong, but I just could not reconcile this with the documentation.

Also, as per the Splunk Storage Sizing app, the size of the index files (which I believe contain only pointers to your indexed data) is larger than the size of your actual indexed data (the rawdata).
Doesn't that sound unusual? I would expect the indexed data to be bigger than the index files.
