Hi fellow Splunkers!
I'm currently trying to figure out how much data storage my Splunk deployment would need if I indexed up to 10 GB of data per day.
What Splunk thinks about it:
Typically, the compressed rawdata file is 10% the size of the incoming, pre-indexed raw data. The associated index files range in size from approximately 10% to 110% of the rawdata file. The number of unique terms in the data affect this value.
http://docs.splunk.com/Documentation/Splunk/6.3.3/Capacity/Estimateyourstoragerequirements
What I think about it:
I have a log volume of 10 GB per day.
This would be an estimated rawdata size of... 10 GB.
This would give an estimated compressed (10%) rawdata size of 1 GB.
In the best case, the index files (10%) would add another 1 GB.
In the worst case, the index files (110%) would add 11 GB.
As a result, I would need between 2 GB and 12 GB of storage for every day of data I want to retain.
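Or, as a quick Python sketch of the same arithmetic (just restating the percentages from the docs):

```python
# Storage range per indexed day, using the percentages from the docs.
raw_gb = 10                       # incoming raw data per day
rawdata_gb = raw_gb * 0.10        # compressed rawdata: ~10% of raw
index_min_gb = raw_gb * 0.10      # index files, best case: 10%
index_max_gb = raw_gb * 1.10      # index files, worst case: 110%

print(rawdata_gb + index_min_gb)  # -> 2.0 GB/day minimum
print(rawdata_gb + index_max_gb)  # -> 12.0 GB/day maximum
```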
Am I right?
What you think about it:
...
The general rule of thumb calculation is:
raw_daily_bandwidth * days_to_retain_data * index_replication_factor / 2
(the division by 2 accounts for the net effect of rawdata compression and index-file bloat, ASSUMING you are NOT using `indexed_extractions`)
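To make that concrete, here is a quick sketch in Python; the 90-day retention and the replication factor of 2 below are just example values, not recommendations:

```python
# Rule-of-thumb Splunk storage estimate (assumes no indexed_extractions;
# dividing by 2 folds together rawdata compression and index overhead).
def estimate_storage_gb(raw_daily_gb, retention_days, replication_factor=1):
    """Estimated total disk needed, in GB."""
    return raw_daily_gb * retention_days * replication_factor / 2

print(estimate_storage_gb(10, 90))     # 450.0 -> 10 GB/day kept for 90 days
print(estimate_storage_gb(10, 90, 2))  # 900.0 -> same, replication factor 2
```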
One key question here is: how long do you want to keep the data?
There are other questions worth considering as well; the splunk-sizing web app will help you get most of the way there and lets you specify a storage contingency.
Cheers, Greg.
Thanks to you, too! 🙂
What would change if I plan to use data models?
Would I need to reserve even more storage?
Only if you accelerate the data models.
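If it helps, here is one way to budget for acceleration on top of the earlier rule of thumb. Note that the summary overhead fraction below is a made-up placeholder, not a Splunk-documented figure; real summary sizes depend on your data and your data models:

```python
# Extend the rule-of-thumb estimate with accelerated data model summaries.
# ACCEL_OVERHEAD is a hypothetical placeholder, not a documented Splunk
# number: measure your actual summary sizes once acceleration is enabled.
ACCEL_OVERHEAD = 0.2  # assume summaries add ~20% on top of the base estimate

def estimate_with_acceleration(raw_daily_gb, retention_days,
                               replication_factor=1,
                               accel_overhead=ACCEL_OVERHEAD):
    base = raw_daily_gb * retention_days * replication_factor / 2
    return base * (1 + accel_overhead)

print(estimate_with_acceleration(10, 90))  # 540.0 GB with the assumed 20%
```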
Wow, this is an amazing tool. Thank you! 🙂