Hi,
I have been trying to work out how to calculate the size range of the index files in Splunk. I did the math based on the following description. Could anyone check it?
Estimate your storage requirements
The compressed rawdata file is approximately 10% the size of the incoming, pre-indexed raw data.
The associated index files range in size from approximately 10% to 110% of the rawdata file.
Raw data size: 9TB
"rawdata file size": 9TB x 10%
Minimum index size: (9TB x 10%) + ((9TB x 10%) x 10%)
Maximum index size: (9TB x 10%) + ((9TB x 10%) x 110%)
Thank you all in advance.
For ongoing knowledge of your indexers, I highly recommend Sanford Owing's Fire Brigade App on Splunkbase. It will give you a ton of information on the size of indexes and individual buckets for existing indexes, which lets you plan for future growth.
(Edited to link to Fire Brigade Version 2 for Splunk 5 & 6)
You've got one "x 10%" too many. The index size is based on the raw data, not the compressed raw data.
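To make the difference concrete, here is a rough sketch that evaluates both readings for the 9TB figure from the question. The function name `total_on_disk` and the `index_base` switch are made up for illustration; the ratios are the 10% and 10% to 110% figures quoted in the question, and nothing here is measured from a real indexer.

```python
# Ratios quoted in the question; treat them as rough rules of thumb.
RAW_TB = 9.0            # incoming, pre-indexed raw data
RAWDATA_RATIO = 0.10    # compressed rawdata as a fraction of raw data
INDEX_MIN_RATIO = 0.10  # index files, lower bound
INDEX_MAX_RATIO = 1.10  # index files, upper bound


def total_on_disk(raw_tb, index_ratio, index_base="raw"):
    """Return compressed rawdata plus index files, in TB.

    index_base="raw" sizes the index files against the incoming raw data
    (the reading in this answer); index_base="compressed" sizes them
    against the compressed rawdata (the reading in the question).
    """
    rawdata_tb = raw_tb * RAWDATA_RATIO
    base_tb = raw_tb if index_base == "raw" else rawdata_tb
    return rawdata_tb + base_tb * index_ratio


for base in ("raw", "compressed"):
    low = total_on_disk(RAW_TB, INDEX_MIN_RATIO, base)
    high = total_on_disk(RAW_TB, INDEX_MAX_RATIO, base)
    print(f"index sized against {base} data: {low:.2f} TB to {high:.2f} TB")
```

With the index sized against the raw data this gives 1.80 TB to 10.80 TB; with the extra "x 10%" from the question it gives 0.99 TB to 1.89 TB.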
To get a more accurate reading on your data, you could take a 10GB sample, store it in a temporary index, and use its size on disk as a baseline.
As a real-life example, JSON data from Twitter is compressible to about 15% and yields indexes about 60% of the raw data size - in total you'd need about 75% of the raw size on disk.
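If you do get a sample indexed, scaling it up is simple arithmetic. The sketch below assumes you have measured the sample's rawdata and index sizes on disk yourself; the helper names and the sample measurements (1.5GB and 6.0GB for a 10GB sample) are hypothetical values chosen only to reproduce the Twitter ratios above, and applying them to 9TB is purely an illustration, since your data may compress very differently.

```python
def ratios_from_sample(sample_raw_gb, rawdata_gb, index_gb):
    """Derive (rawdata_ratio, index_ratio) relative to the raw sample size."""
    return rawdata_gb / sample_raw_gb, index_gb / sample_raw_gb


def estimate_disk_tb(raw_tb, rawdata_ratio, index_ratio):
    """Estimate total disk usage (rawdata + index files) in TB."""
    return raw_tb * (rawdata_ratio + index_ratio)


# Hypothetical measurements for a 10GB sample in a temporary index.
rawdata_ratio, index_ratio = ratios_from_sample(10.0, 1.5, 6.0)

# Scale the sample ratios up to the full raw volume from the question.
estimate = estimate_disk_tb(9.0, rawdata_ratio, index_ratio)
print(f"rawdata {rawdata_ratio:.0%}, index {index_ratio:.0%}, "
      f"estimated total {estimate:.2f} TB")
```

The estimate is only as good as the sample, so pick data that is representative of what you will actually index.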
martin_mueller,
Thanks for your post!
Measuring the size of a data sample is a really good idea, but it's hard for me to get one right now. So I would like to estimate the maximum size of the index files.